Z-Image ComfyUI Workflow: Complete Custom Nodes Guide | Z Image Tutorial

Master Z-Image (Z Image) custom ComfyUI nodes. Complete reference for ZImageTextEncoder, ZImageTurnBuilder, templates, think blocks, and advanced Z-Image multi-turn workflows.

Z-Image Team · 8 min read

Z-Image Turbo runs in a standard ComfyUI setup, but its custom nodes give you direct control over the Qwen3-4B chat template format that the text encoder uses. This guide covers installation, the node reference, the template system, and multi-turn workflows.

Installation

Required Files

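Download the model files from the Comfy-Org/z_image_turbo repository on Hugging Face and place them as shown below. The custom nodes package is installed separately under custom_nodes/: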

ComfyUI/
├── models/
│   ├── text_encoders/
│   │   └── qwen_3_4b.safetensors
│   ├── diffusion_models/
│   │   └── z_image_turbo_bf16.safetensors
│   └── vae/
│       └── ae.safetensors  (Flux 1 VAE)
└── custom_nodes/
    └── comfyui-z-image/  (Custom nodes package)
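
A quick way to confirm this layout is to check each path programmatically. The sketch below is a hypothetical helper (the name `find_missing` is an illustration and not part of any package). It only tests that the paths from the tree above exist under the ComfyUI root you pass in.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
REQUIRED_PATHS = [
    "models/text_encoders/qwen_3_4b.safetensors",
    "models/diffusion_models/z_image_turbo_bf16.safetensors",
    "models/vae/ae.safetensors",
    "custom_nodes/comfyui-z-image",
]


def find_missing(comfy_root):
    """Return the required paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]
```

Calling `find_missing("ComfyUI")` returns an empty list when everything is in place, or the list of paths that still need attention.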

Basic Settings

Setting     Value
Steps       8-10 (recommend 9)
CFG         1.0-2.0 (recommend 1.0)
Sampler     DPM++ 2M Karras
CLIP Type   Lumina 2

Node Reference

ZImageTextEncoder

The main encoder node. Handles complete first-turn conversations.

Inputs

Input              Type      Description
clip               CLIP      From CLIPLoader (Lumina 2)
user_prompt        STRING    Your generation request (required)
template_preset    DROPDOWN  Quick style selection (140+ options)
system_prompt      STRING    Custom instructions (auto-filled by template)
add_think_block    BOOLEAN   Add <think> tags to template
thinking_content   STRING    Text inside think tags
assistant_content  STRING    Text after think block
raw_prompt         STRING    Bypass everything, write own tokens
strip_key_quotes   BOOLEAN   Remove JSON quotes from LLM output
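
The table does not spell out what strip_key_quotes does to the text. One plausible reading, shown here purely as an assumption, is that it removes the double quotes around JSON keys while leaving values untouched. The function below is a hypothetical illustration of that idea and not the node's actual implementation.

```python
import re

# Matches a double-quoted identifier that is immediately followed by a colon,
# which is how a JSON object key appears in serialized output.
_KEY_PATTERN = re.compile(r'"([A-Za-z_][A-Za-z0-9_]*)"\s*:')


def strip_key_quotes(text):
    """Drop the quotes around JSON-style keys; quoted values are left as-is."""
    return _KEY_PATTERN.sub(r"\1:", text)


print(strip_key_quotes('{"subject": "a cat", "style": "oil"}'))
# prints: {subject: "a cat", style: "oil"}
```

Check the formatted_prompt output of the real node to see how it behaves in practice.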

Outputs

Output            Type          Description
conditioning      CONDITIONING  Connect to KSampler positive
formatted_prompt  STRING        See exactly what was encoded
conversation      CONVERSATION  Chain to TurnBuilder

Example Configuration

template_preset: photorealistic
user_prompt: "A 30-year-old Japanese woman in business attire"
add_think_block: true
thinking_content: "Professional headshot, soft studio lighting, confident expression"
assistant_content: "Here's your professional portrait."

ZImageTurnBuilder

Adds conversation turns after the initial encoder.

Inputs

Input              Type             Description
previous           CONVERSATION     From encoder or another TurnBuilder
clip               CLIP (optional)  When connected, the node outputs conditioning directly
user_prompt        STRING           Next user message
thinking_content   STRING           Text for think block
assistant_content  STRING           Assistant response
is_final           BOOLEAN          Leave last message open

Two Workflow Options

Option A: Direct Encoding (clip connected)

ZImageTextEncoder → TurnBuilder (clip) → KSampler

TurnBuilder outputs conditioning directly when clip is connected.

Option B: Chain Back (no clip)

ZImageTextEncoder → TurnBuilder → ZImageTextEncoder (conversation_override)

Use the conversation_override input on the second encoder.

ZImageTextEncoderSimple

Simplified encoder for quick use — ideal for negative prompts.

Inputs

Same inputs as ZImageTextEncoder. The node omits the conversation output.

Use Case

Positive: ZImageTextEncoder → KSampler (positive)
Negative: ZImageTextEncoderSimple → KSampler (negative)
          user_prompt: "bad anatomy, blurry, watermark"

The Chat Template

Structure

Z-Image expects this format:

<|im_start|>system
[System instructions]<|im_end|>
<|im_start|>user
[User prompt]<|im_end|>
<|im_start|>assistant
<think>
[Reasoning/planning]
</think>

[Response]<|im_end|>
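
To make the layout concrete, here is a minimal sketch that assembles a single-turn prompt in this format. The function name `format_prompt` and the `close_last` flag are hypothetical. The parameter names mirror the ZImageTextEncoder inputs, but this is not the package's own code.

```python
def format_prompt(system_prompt, user_prompt, thinking_content="",
                  assistant_content="", add_think_block=False,
                  close_last=True):
    """Build a single-turn chat-template string in the format shown above."""
    parts = []
    if system_prompt:
        parts.append(f"<|im_start|>system\n{system_prompt}<|im_end|>")
    parts.append(f"<|im_start|>user\n{user_prompt}<|im_end|>")

    assistant = "<|im_start|>assistant\n"
    if add_think_block:
        # The think block is followed by a blank line before the response.
        assistant += f"<think>\n{thinking_content}\n</think>\n\n"
    assistant += assistant_content
    if close_last:
        assistant += "<|im_end|>"
    parts.append(assistant)
    return "\n".join(parts)


print(format_prompt(
    "Generate photorealistic images with natural lighting",
    "A cat sleeping on a windowsill",
    thinking_content="Warm afternoon light, shallow depth of field",
    assistant_content="Here's a cozy scene with soft lighting",
    add_think_block=True,
))
```

Comparing this output with the node's formatted_prompt output is a simple way to check that a hand-written raw prompt matches what the encoder would have produced.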

Special Tokens

Token         Purpose
<|im_start|>  Start of a message
<|im_end|>    End of a message
system        Role: instructions
user          Role: human request
assistant     Role: model response
<think>       Start reasoning
</think>      End reasoning

What Each Part Does

System Prompt

  • Sets style and constraints
  • Persists across conversation
  • Example: "Generate photorealistic images with natural lighting"

User Prompt

  • Your actual request
  • The main content
  • Example: "A cat sleeping on a windowsill"

Think Block

  • Reasoning about the image
  • You write this, not an LLM
  • Example: "Warm afternoon light, shallow depth of field"

Assistant Content

  • Response text before generation
  • Can prime the model
  • Example: "Here's a cozy scene with soft lighting"

Template System

Available Templates (140+)

Style Templates

Template                Description
photorealistic          Natural lighting, realistic details
comic_american          Bold outlines, flat colors, dynamic
anime_ghibli            Studio Ghibli watercolor style
anime_shinkai           Makoto Shinkai dramatic lighting
neon_cyberpunk          Neon lights, rain-slicked streets
oil_painting_classical  Renaissance master technique
pixel_art               Retro 8/16-bit aesthetic
character_design        Turnaround sheets, model references
watercolor_soft         Soft watercolor painting
noir_cinematic          High contrast black and white

Structured Prompt Templates

Template             Description
json_structured      Parse JSON-formatted prompts
yaml_structured      Parse YAML hierarchical prompts
markdown_structured  Parse Markdown-formatted prompts

Extended Template Format

Templates can pre-fill multiple fields:

# Template file structure
system_prompt: "Your style instructions here"
add_think_block: true
thinking_content: "Pre-filled reasoning"
assistant_content: "Pre-filled response"

When selected, all configured fields auto-populate.
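
The auto-populate step can be pictured as a dictionary merge. The sketch below is a hypothetical illustration: the function name `apply_template` is invented, and the rule that non-empty user values override template values is an assumption rather than documented behavior.

```python
def apply_template(template, user_fields):
    """Merge template defaults with user-supplied fields.

    Assumption: a non-empty user value wins over the template value.
    Keys outside the four known fields (for example 'name') are ignored.
    """
    fields = {
        "system_prompt": "",
        "add_think_block": False,
        "thinking_content": "",
        "assistant_content": "",
    }
    # First pass: the template pre-fills whatever it configures.
    fields.update({k: v for k, v in template.items() if k in fields})
    # Second pass: explicit user input overrides the pre-filled values.
    fields.update({
        k: v for k, v in user_fields.items()
        if k in fields and v not in ("", None)
    })
    return fields


template = {
    "system_prompt": "Your style instructions here",
    "add_think_block": True,
    "thinking_content": "Pre-filled reasoning",
    "assistant_content": "Pre-filled response",
}
print(apply_template(template, {"thinking_content": "Warm afternoon light"}))
```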

Custom Templates

Create custom templates in:

custom_nodes/comfyui-z-image/nodes/templates/z_image/

Template file format (YAML):

name: my_custom_style
system_prompt: |
  Generate images in my custom style.
  Focus on specific aesthetic elements.
add_think_block: true
thinking_content: |
  Default reasoning for this style.

Workflow Examples

Basic Single Image

[CLIPLoader]          [Load VAE]
    ↓                      ↓
[ZImageTextEncoder]   [Load Diffusion Model]
    ↓                      ↓
    └──────→ [KSampler] ←──┘
                 ↓
            [VAEDecode]
                 ↓
            [SaveImage]

Settings:

  • CLIPLoader: type = Lumina 2
  • ZImageTextEncoder: template_preset = photorealistic
  • KSampler: steps = 9, cfg = 1.0

With Think Block

[ZImageTextEncoder]
├── template_preset: photorealistic
├── user_prompt: "Portrait of an elderly fisherman"
├── add_think_block: ✓
├── thinking_content: "Weathered face, sea salt in beard,
│                      warm golden hour light, authenticity"
└── assistant_content: "A life shaped by the sea."
         ↓
    [KSampler]

Multi-Turn Character

[ZImageTextEncoder]
├── system_prompt: "Generate consistent character..."
├── user_prompt: [Full character sheet]
├── thinking_content: "Key features: blue eyes, scar..."
└── assistant_content: "Character established."
         ↓
         ↓ conversation
         ↓
[ZImageTurnBuilder]
├── previous: [conversation from above]
├── user_prompt: "Change outfit to red dress"
├── thinking_content: "Preserve face, change only outfit"
├── clip: [from CLIPLoader]
└── is_final: ✓
         ↓
         ↓ conditioning
         ↓
    [KSampler]
         ↓
    [SaveImage]
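
The text that a chain like this encodes can be approximated in a few lines. The sketch below is a hypothetical rendering function, not the TurnBuilder's source. It assumes that is_final means the last assistant message is left without a closing <|im_end|> token, following the "Leave last message open" description in the inputs table.

```python
def render_conversation(system_prompt, turns, is_final=True):
    """Render a multi-turn conversation in the chat-template format.

    turns is a list of dicts with a 'user' key and optional 'thinking'
    and 'assistant' keys. Assumption: when is_final is True, the last
    assistant message is left open (no closing <|im_end|>).
    """
    out = [f"<|im_start|>system\n{system_prompt}<|im_end|>"]
    for index, turn in enumerate(turns):
        out.append(f"<|im_start|>user\n{turn['user']}<|im_end|>")
        message = "<|im_start|>assistant\n"
        if turn.get("thinking"):
            message += f"<think>\n{turn['thinking']}\n</think>\n\n"
        message += turn.get("assistant", "")
        is_last = index == len(turns) - 1
        if not (is_last and is_final):
            message += "<|im_end|>"
        out.append(message)
    return "\n".join(out)


turns = [
    {"user": "Full character sheet goes here",
     "thinking": "Key features: blue eyes, scar...",
     "assistant": "Character established."},
    {"user": "Change outfit to red dress",
     "thinking": "Preserve face, change only outfit"},
]
print(render_conversation("Generate consistent character...", turns))
```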

Positive + Negative

[CLIPLoader] ─────────────────────────┐
      ↓                               ↓
[ZImageTextEncoder]    [ZImageTextEncoderSimple]
(positive prompt)      (negative prompt)
      ↓                               ↓
      └────────→ [KSampler] ←─────────┘
                     ↓
                [VAEDecode]

Negative prompt example:

user_prompt: "bad anatomy, extra limbs, blurry, watermark, text"
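
One caveat when combining a negative prompt with the recommended CFG of 1.0: assuming the sampler applies the standard classifier-free guidance formula, the negative term cancels out at a scale of exactly 1.0, so the negative prompt only influences the result when CFG is above 1.0. The toy vectors below are made up and stand in for model predictions.

```python
def apply_cfg(cond, uncond, cfg):
    """Standard classifier-free guidance: uncond + cfg * (cond - uncond)."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


# Made-up numbers standing in for the positive and negative predictions.
cond = [1.0, -0.5, 0.25]
uncond = [0.5, 0.5, -0.25]

print(apply_cfg(cond, uncond, 1.0))  # prints [1.0, -0.5, 0.25], identical to cond
print(apply_cfg(cond, uncond, 2.0))  # prints [1.5, -1.5, 0.75], pushed away from uncond
```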

Raw Mode

For complete control, use raw_prompt to bypass all formatting:

raw_prompt: |
  <|im_start|>system
  You are a surrealist painter specializing in dreamscapes.<|im_end|>
  <|im_start|>user
  A melting clock draped over a tree branch in a desert<|im_end|>
  <|im_start|>assistant
  <think>
  Dali-esque composition, impossible physics, soft warm desert light,
  meticulous detail on the clock face, vast empty background
  </think>

  A dreamscape emerges from the unconscious mind.

When raw_prompt is set, all other fields are ignored. You're responsible for correct token formatting.


Debugging

Check Formatted Output

Connect formatted_prompt to a Preview Text node to see exactly what's being encoded.

Console Logs

Server console shows:

[Z-Image] Formatted prompt: <|im_start|>system...
[Z-Image] Mode: direct
[Z-Image] Character counts: system=45, user=234, think=67
[Z-Image] Token estimate: ~350

Common Issues

Issue                 Cause            Solution
Blank output          Wrong CLIP type  Use Lumina 2
Poor quality          CFG too high     Set CFG to 1.0
Slow generation       Too many steps   Use 8-10 steps
Template not working  Wrong file path  Check templates folder

Performance Optimization

VRAM Management

Precision  VRAM   Quality
bf16       ~16GB  Best
fp8        ~10GB  Good
GGUF Q4    ~6GB   Acceptable
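
As a rough decision aid, the table can be turned into a lookup. This is only a sketch: the thresholds are the approximate figures from the table, and the function name `pick_precision` is hypothetical.

```python
# (variant, approximate VRAM in GB) ordered from best quality downward,
# using the figures from the table above.
VARIANTS = [("bf16", 16), ("fp8", 10), ("GGUF Q4", 6)]


def pick_precision(vram_gb):
    """Return the highest-quality variant that fits, or None if none does."""
    for name, needed_gb in VARIANTS:
        if vram_gb >= needed_gb:
            return name
    return None


print(pick_precision(12))  # prints fp8
```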

Speed Settings

PriorityStepsSampler
Speed8UniPC
Balanced9DPM++ 2M Karras
Quality12DPM++ SDE Karras

Batch Processing

For batch generation:

  • Use a deterministic sampler (DPM++ 2M Karras)
  • Lock seed for reproducibility
  • Keep CFG at 1.0
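
The checklist above can be applied to a workflow exported in ComfyUI's API format, where each node is a dictionary entry with class_type and inputs. The sketch below only builds the job payloads and does not send them. The node IDs "1" and "2", the function name `build_batch`, and the ZImageTextEncoder class name are assumptions for illustration, and `dpmpp_2m` with the `karras` scheduler is how DPM++ 2M Karras is expressed in KSampler inputs.

```python
import copy


def build_batch(workflow, encoder_id, sampler_id, prompts, seed=42):
    """Return one payload per prompt with a locked seed and fixed sampler settings."""
    jobs = []
    for text in prompts:
        wf = copy.deepcopy(workflow)  # never mutate the template workflow
        wf[encoder_id]["inputs"]["user_prompt"] = text
        wf[sampler_id]["inputs"].update({
            "seed": seed,              # locked for reproducibility
            "cfg": 1.0,
            "steps": 9,
            "sampler_name": "dpmpp_2m",
            "scheduler": "karras",
        })
        jobs.append({"prompt": wf})
    return jobs


# Hypothetical two-node excerpt of an exported workflow.
workflow = {
    "1": {"class_type": "ZImageTextEncoder", "inputs": {"user_prompt": ""}},
    "2": {"class_type": "KSampler", "inputs": {"seed": 0}},
}
jobs = build_batch(workflow, "1", "2", ["A cat on a windowsill", "A dog in the snow"])
print(len(jobs))  # prints 2
```

Each payload has the shape that ComfyUI's /prompt HTTP endpoint accepts, so it could be posted to a running server one at a time.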

Integration Tips

With ControlNet

Z-Image supports standard ControlNet workflows:

[ZImageTextEncoder] + [ControlNet Apply] → [KSampler]

With LoRA

LoRA support is growing. Apply before text encoding:

[LoRA Loader] → [ZImageTextEncoder] → [KSampler]

With IPAdapter

For style transfer:

[IPAdapter] + [ZImageTextEncoder] → [KSampler]

Quick Start Checklist

  • Download model files from Hugging Face
  • Place in correct ComfyUI folders
  • Install custom nodes package
  • Set CLIP type to Lumina 2
  • Set steps to 9, CFG to 1.0
  • Test with simple prompt
  • Enable think block for more control
  • Try different templates

Resources


Try Z-Image online at z-image.vip — no installation required.

