Z-Image ComfyUI Workflow: Complete Custom Nodes Guide

Z-Image Turbo runs in a standard ComfyUI setup, but its custom nodes give you direct control over the Qwen3-4B chat template that the text encoder expects. This guide covers installation, the node reference, the chat template, the template system, and example workflows.
Installation
Required Files
Download the model files from Comfy-Org/z_image_turbo on Hugging Face and place them in this layout:
ComfyUI/
├── models/
│   ├── text_encoders/
│   │   └── qwen_3_4b.safetensors
│   ├── diffusion_models/
│   │   └── z_image_turbo_bf16.safetensors
│   └── vae/
│       └── ae.safetensors (Flux 1 VAE)
└── custom_nodes/
    └── comfyui-z-image/ (Custom nodes package)
Basic Settings
| Setting | Value |
|---|---|
| Steps | 8-10 (recommend 9) |
| CFG | 1.0-2.0 (recommend 1.0) |
| Sampler | DPM++ 2M Karras |
| CLIP Type | Lumina 2 |
Node Reference
ZImageTextEncoder
The main encoder node. Handles complete first-turn conversations.
Inputs
| Input | Type | Description |
|---|---|---|
| clip | CLIP | From CLIPLoader (Lumina 2) |
| user_prompt | STRING | Your generation request (required) |
| template_preset | DROPDOWN | Quick style selection (140+ options) |
| system_prompt | STRING | Custom instructions (auto-filled by template) |
| add_think_block | BOOLEAN | Add <think> tags to template |
| thinking_content | STRING | Text inside think tags |
| assistant_content | STRING | Text after think block |
| raw_prompt | STRING | Bypass all formatting and write the tokens yourself |
| strip_key_quotes | BOOLEAN | Remove JSON quotes from LLM output |
Outputs
| Output | Type | Description |
|---|---|---|
| conditioning | CONDITIONING | Connect to KSampler positive |
| formatted_prompt | STRING | See exactly what was encoded |
| conversation | CONVERSATION | Chain to TurnBuilder |
Example Configuration
template_preset: photorealistic
user_prompt: "A 30-year-old Japanese woman in business attire"
add_think_block: true
thinking_content: "Professional headshot, soft studio lighting, confident expression"
assistant_content: "Here's your professional portrait."
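The strip_key_quotes input is described only as removing JSON quotes from LLM output, so its exact behavior is not documented here. One plausible reading, sketched below in Python as an assumption rather than the node's actual implementation, is that quotes around JSON object keys are dropped while string values keep theirs. The function name and the regular expression are illustrative.

```python
import re

# Matches a double-quoted identifier that is immediately followed by a colon,
# i.e. something that looks like a JSON object key.
KEY_RE = re.compile(r'"([A-Za-z_][A-Za-z0-9_]*)"\s*:')


def strip_key_quotes(text: str) -> str:
    """Remove the quotes around JSON-style keys; leave values untouched."""
    return KEY_RE.sub(r"\1:", text)


cleaned = strip_key_quotes('{"subject": "a cat", "style": "oil painting"}')
print(cleaned)
```

Connecting formatted_prompt to a text preview is the reliable way to see what the real node does with this option.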
ZImageTurnBuilder
Adds conversation turns after the initial encoder.
Inputs
| Input | Type | Description |
|---|---|---|
| previous | CONVERSATION | From encoder or another TurnBuilder |
| clip | CLIP (optional) | Connect to have this node output conditioning directly |
| user_prompt | STRING | Next user message |
| thinking_content | STRING | Text for think block |
| assistant_content | STRING | Assistant response |
| is_final | BOOLEAN | Leave last message open |
Two Workflow Options
Option A: Direct Encoding (clip connected)
ZImageTextEncoder → TurnBuilder (clip) → KSampler
TurnBuilder outputs conditioning directly when clip is connected.
Option B: Chain Back (no clip)
ZImageTextEncoder → TurnBuilder → ZImageTextEncoder (conversation_override)
Use conversation_override input on second encoder.
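To make the chaining idea concrete, here is a minimal Python sketch of how a conversation could be extended turn by turn and then rendered. It is not the package's code: the Conversation and Turn classes are hypothetical, and it assumes that is_final ("leave last message open") means the last assistant message is written without a closing <|im_end|>.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    user_prompt: str
    thinking_content: Optional[str] = None
    assistant_content: str = ""


@dataclass
class Conversation:
    system_prompt: str = ""
    turns: List[Turn] = field(default_factory=list)

    def add_turn(self, turn: Turn) -> "Conversation":
        # Return a new object so an earlier conversation can feed several branches.
        return Conversation(self.system_prompt, self.turns + [turn])

    def render(self, is_final: bool = True) -> str:
        out = []
        if self.system_prompt:
            out.append(f"<|im_start|>system\n{self.system_prompt}<|im_end|>")
        for i, turn in enumerate(self.turns):
            out.append(f"<|im_start|>user\n{turn.user_prompt}<|im_end|>")
            body = ""
            if turn.thinking_content is not None:
                body += f"<think>\n{turn.thinking_content}\n</think>\n"
            body += turn.assistant_content
            is_last = i == len(self.turns) - 1
            # Assumption: an "open" final message simply omits <|im_end|>.
            closing = "" if (is_last and is_final) else "<|im_end|>"
            out.append(f"<|im_start|>assistant\n{body}{closing}")
        return "\n".join(out)


first = Conversation("Generate consistent character...").add_turn(
    Turn("[Full character sheet]", "Key features: blue eyes, scar...", "Character established.")
)
second = first.add_turn(Turn("Change outfit to red dress", "Preserve face, change only outfit"))
text = second.render(is_final=True)
print(text)
```

Because add_turn returns a new object, the first conversation stays intact and can seed several variations, which mirrors feeding one encoder's conversation output into more than one TurnBuilder.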
ZImageTextEncoderSimple
Simplified encoder for quick use — ideal for negative prompts.
Inputs
Same inputs as ZImageTextEncoder. The node has no conversation output.
Use Case
Positive: ZImageTextEncoder → KSampler (positive)
Negative: ZImageTextEncoderSimple → KSampler (negative)
user_prompt: "bad anatomy, blurry, watermark"
The Chat Template
Structure
Z-Image expects this format:
<|im_start|>system
[System instructions]<|im_end|>
<|im_start|>user
[User prompt]<|im_end|>
<|im_start|>assistant
<think>
[Reasoning/planning]
</think>
[Response]<|im_end|>
Special Tokens
| Token | Purpose |
|---|---|
| `<|im_start|>` | Start of a message |
| `<|im_end|>` | End of a message |
| `system` | Role: instructions |
| `user` | Role: human request |
| `assistant` | Role: model response |
| `<think>` | Start reasoning |
| `</think>` | End reasoning |
What Each Part Does
System Prompt
- Sets style and constraints
- Persists across conversation
- Example: "Generate photorealistic images with natural lighting"
User Prompt
- Your actual request
- The main content
- Example: "A cat sleeping on a windowsill"
Think Block
- Reasoning about the image
- You write this, not an LLM
- Example: "Warm afternoon light, shallow depth of field"
Assistant Content
- Response text before generation
- Can prime the model
- Example: "Here's a cozy scene with soft lighting"
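The four parts above map directly onto the chat format. The following Python sketch assembles them into one string using the examples from this section. The function name and parameters are illustrative and are not the node's internal API; the exact whitespace the real encoder emits may differ.

```python
from typing import Optional

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_first_turn(
    user_prompt: str,
    system_prompt: str = "",
    thinking_content: Optional[str] = None,
    assistant_content: str = "",
    close_assistant: bool = True,
) -> str:
    """Assemble one system/user/assistant exchange in the documented layout."""
    parts = []
    if system_prompt:
        parts.append(f"{IM_START}system\n{system_prompt}{IM_END}")
    parts.append(f"{IM_START}user\n{user_prompt}{IM_END}")

    assistant = f"{IM_START}assistant\n"
    if thinking_content is not None:
        # The think block sits between the role line and the response text.
        assistant += f"<think>\n{thinking_content}\n</think>\n"
    assistant += assistant_content
    if close_assistant:
        assistant += IM_END
    parts.append(assistant)
    return "\n".join(parts)


prompt = format_first_turn(
    user_prompt="A cat sleeping on a windowsill",
    system_prompt="Generate photorealistic images with natural lighting",
    thinking_content="Warm afternoon light, shallow depth of field",
    assistant_content="Here's a cozy scene with soft lighting",
)
print(prompt)
```

Comparing this string with the node's formatted_prompt output is a quick way to check that a workflow encodes what you intended.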
Template System
Available Templates (140+)
Style Templates
| Template | Description |
|---|---|
| photorealistic | Natural lighting, realistic details |
| comic_american | Bold outlines, flat colors, dynamic |
| anime_ghibli | Studio Ghibli watercolor style |
| anime_shinkai | Makoto Shinkai dramatic lighting |
| neon_cyberpunk | Neon lights, rain-slicked streets |
| oil_painting_classical | Renaissance master technique |
| pixel_art | Retro 8/16-bit aesthetic |
| character_design | Turnaround sheets, model references |
| watercolor_soft | Soft watercolor painting |
| noir_cinematic | High contrast black and white |
Structured Prompt Templates
| Template | Description |
|---|---|
| json_structured | Parse JSON-formatted prompts |
| yaml_structured | Parse YAML hierarchical prompts |
| markdown_structured | Parse Markdown-formatted prompts |
Extended Template Format
Templates can pre-fill multiple fields:
# Template file structure
system_prompt: "Your style instructions here"
add_think_block: true
thinking_content: "Pre-filled reasoning"
assistant_content: "Pre-filled response"
When selected, all configured fields auto-populate.
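Auto-population can be pictured as a merge of template values into the node's fields. The Python sketch below illustrates that idea with plain dictionaries. The precedence rule (a non-empty value typed into the node wins over the template) is an assumption, and apply_template is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict

TEMPLATE_FIELDS = ("system_prompt", "add_think_block", "thinking_content", "assistant_content")


def apply_template(template: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Start from empty fields, fill from the template, then apply user overrides."""
    fields: Dict[str, Any] = {
        "system_prompt": "",
        "add_think_block": False,
        "thinking_content": "",
        "assistant_content": "",
    }
    fields.update({k: v for k, v in template.items() if k in TEMPLATE_FIELDS})
    # Assumption: empty strings mean "not set", so they do not erase template text.
    fields.update(
        {k: v for k, v in overrides.items() if k in TEMPLATE_FIELDS and v not in ("", None)}
    )
    return fields


template = {
    "system_prompt": "Your style instructions here",
    "add_think_block": True,
    "thinking_content": "Pre-filled reasoning",
    "assistant_content": "Pre-filled response",
}
result = apply_template(template, {"thinking_content": "Warm light", "assistant_content": ""})
print(result)
```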
Custom Templates
Create custom templates in:
custom_nodes/comfyui-z-image/nodes/templates/z_image/
Template file format (YAML):
name: my_custom_style
system_prompt: |
  Generate images in my custom style.
  Focus on specific aesthetic elements.
add_think_block: true
thinking_content: |
  Default reasoning for this style.
Workflow Examples
Basic Single Image
[CLIPLoader]                  [Load VAE]
     ↓                             ↓
[ZImageTextEncoder]      [Load Diffusion Model]
     ↓                             ↓
     └────────→ [KSampler] ←───────┘
                    ↓
               [VAEDecode]
                    ↓
               [SaveImage]
Settings:
- CLIPLoader: type = Lumina 2
- ZImageTextEncoder: template_preset = photorealistic
- KSampler: steps = 9, cfg = 1.0
With Think Block
[ZImageTextEncoder]
├── template_preset: photorealistic
├── user_prompt: "Portrait of an elderly fisherman"
├── add_think_block: ✓
├── thinking_content: "Weathered face, sea salt in beard,
│ warm golden hour light, authenticity"
└── assistant_content: "A life shaped by the sea."
↓
[KSampler]
Multi-Turn Character
[ZImageTextEncoder] ←─────────────────────────────────────┐
├── system_prompt: "Generate consistent character..." │
├── user_prompt: [Full character sheet] │
├── thinking_content: "Key features: blue eyes, scar..." │
└── assistant_content: "Character established." │
↓ │
↓ conversation │
↓ │
[ZImageTurnBuilder] │
├── previous: [conversation from above] │
├── user_prompt: "Change outfit to red dress" │
├── thinking_content: "Preserve face, change only outfit" │
├── clip: [from CLIPLoader] │
└── is_final: ✓ │
↓ │
↓ conditioning │
↓ │
[KSampler] │
↓ │
[SaveImage] │
Positive + Negative
[CLIPLoader] ─────────────────────────┐
     ↓                                ↓
[ZImageTextEncoder]      [ZImageTextEncoderSimple]
 (positive prompt)           (negative prompt)
     ↓                                ↓
     └────────→ [KSampler] ←──────────┘
                    ↓
               [VAEDecode]
Negative prompt example:
user_prompt: "bad anatomy, extra limbs, blurry, watermark, text"
Raw Mode
For complete control, use raw_prompt to bypass all formatting:
raw_prompt: |
  <|im_start|>system
  You are a surrealist painter specializing in dreamscapes.<|im_end|>
  <|im_start|>user
  A melting clock draped over a tree branch in a desert<|im_end|>
  <|im_start|>assistant
  <think>
  Dali-esque composition, impossible physics, soft warm desert light,
  meticulous detail on the clock face, vast empty background
  </think>
  A dreamscape emerges from the unconscious mind.
When raw_prompt is set, all other fields are ignored. You're responsible for correct token formatting.
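Since raw mode skips all formatting, a quick sanity check before pasting a prompt can catch the most common token mistakes. The Python sketch below is a hypothetical helper, not something the package provides. It assumes that the final assistant message may legitimately be left without <|im_end|>, as in the example above.

```python
import re
from typing import List

ROLE_RE = re.compile(r"<\|im_start\|>(\w*)")
KNOWN_ROLES = {"system", "user", "assistant"}


def check_raw_prompt(raw: str) -> List[str]:
    """Return a list of formatting problems; an empty list means none were found."""
    problems = []
    starts = raw.count("<|im_start|>")
    ends = raw.count("<|im_end|>")
    # One missing <|im_end|> is tolerated for an open final message.
    if ends not in (starts, starts - 1):
        problems.append(f"{starts} <|im_start|> vs {ends} <|im_end|>")
    for role in ROLE_RE.findall(raw):
        if role not in KNOWN_ROLES:
            problems.append(f"unknown role: {role!r}")
    if raw.count("<think>") != raw.count("</think>"):
        problems.append("unbalanced <think> tags")
    if "<|im_start|>user" not in raw:
        problems.append("no user message")
    return problems


raw = """<|im_start|>system
You are a surrealist painter specializing in dreamscapes.<|im_end|>
<|im_start|>user
A melting clock draped over a tree branch in a desert<|im_end|>
<|im_start|>assistant
<think>
Dali-esque composition, impossible physics
</think>
A dreamscape emerges from the unconscious mind."""
print(check_raw_prompt(raw))  # prints []
```

The check is purely textual. It cannot tell whether the model responds well to a prompt, only whether the tokens are balanced and the roles are spelled correctly.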
Debugging
Check Formatted Output
Connect formatted_prompt to a Preview Text node to see exactly what's being encoded.
Console Logs
The server console shows output such as:
[Z-Image] Formatted prompt: <|im_start|>system...
[Z-Image] Mode: direct
[Z-Image] Character counts: system=45, user=234, think=67
[Z-Image] Token estimate: ~350
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Blank output | Wrong CLIP type | Use Lumina 2 |
| Poor quality | CFG too high | Set CFG to 1.0 |
| Slow generation | Too many steps | Use 8-10 steps |
| Template not working | Wrong file path | Check templates folder |
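The first three rows of the table can be turned into a small pre-flight check. This Python sketch is only an illustration: the function is hypothetical, and the thresholds (CFG above 2.0, more than 10 steps) are taken from the Basic Settings ranges in this guide rather than from any documented limit of the model.

```python
from typing import List


def check_settings(clip_type: str, steps: int, cfg: float) -> List[str]:
    """Return hints for settings that match a row in the Common Issues table."""
    hints = []
    # Accept "Lumina 2", "lumina2" and "lumina_2" as the same choice.
    if clip_type.replace(" ", "").replace("_", "").lower() != "lumina2":
        hints.append("Blank output likely: set the CLIP type to Lumina 2")
    if cfg > 2.0:
        hints.append("Poor quality likely: lower CFG toward 1.0")
    if steps > 10:
        hints.append("Slow generation likely: use 8-10 steps")
    return hints


print(check_settings("Lumina 2", steps=9, cfg=1.0))
print(check_settings("sd3", steps=30, cfg=7.0))
```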
Performance Optimization
VRAM Management
| Precision | VRAM | Quality |
|---|---|---|
| bf16 | ~16GB | Best |
| fp8 | ~10GB | Good |
| GGUF Q4 | ~6GB | Acceptable |
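As a rough illustration of how to read the table, this Python sketch picks the highest-quality precision that fits a given amount of VRAM. The figures are the approximate values from the table, and pick_precision is a hypothetical helper. Actual usage also depends on resolution and on what else is loaded.

```python
from typing import Optional

# (precision, approximate VRAM in GB), ordered from best quality to lowest.
VRAM_REQUIREMENTS_GB = [("bf16", 16), ("fp8", 10), ("GGUF Q4", 6)]


def pick_precision(available_vram_gb: float) -> Optional[str]:
    """Return the first precision whose approximate requirement fits, else None."""
    for name, needed_gb in VRAM_REQUIREMENTS_GB:
        if available_vram_gb >= needed_gb:
            return name
    return None


for vram in (24, 12, 8, 4):
    print(vram, "GB ->", pick_precision(vram))
```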
Speed Settings
| Priority | Steps | Sampler |
|---|---|---|
| Speed | 8 | UniPC |
| Balanced | 9 | DPM++ 2M Karras |
| Quality | 12 | DPM++ SDE Karras |
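When driving ComfyUI from a script, the table translates into KSampler input values. The Python sketch below maps each priority to the identifiers ComfyUI uses for these samplers (uni_pc, dpmpp_2m, dpmpp_sde) with the karras scheduler. The scheduler for the UniPC row is not given in the table, so "simple" is an assumption, and it is worth confirming all names against the dropdowns in your own ComfyUI install.

```python
from typing import Any, Dict, NamedTuple


class SamplerPreset(NamedTuple):
    steps: int
    sampler_name: str
    scheduler: str


PRESETS: Dict[str, SamplerPreset] = {
    "speed": SamplerPreset(8, "uni_pc", "simple"),  # scheduler is an assumption
    "balanced": SamplerPreset(9, "dpmpp_2m", "karras"),
    "quality": SamplerPreset(12, "dpmpp_sde", "karras"),
}


def ksampler_inputs(priority: str, seed: int, cfg: float = 1.0) -> Dict[str, Any]:
    """Build the scalar inputs of a KSampler node for the chosen priority."""
    preset = PRESETS[priority]
    return {
        "seed": seed,
        "steps": preset.steps,
        "cfg": cfg,
        "sampler_name": preset.sampler_name,
        "scheduler": preset.scheduler,
        "denoise": 1.0,
    }


settings = ksampler_inputs("balanced", seed=42)
print(settings)
```

The model, positive, negative, and latent_image links are left out because they depend on the rest of the graph.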
Batch Processing
For batch generation:
- Use a deterministic sampler (DPM++ 2M Karras)
- Lock the seed for reproducibility
- Keep CFG at 1.0
Integration Tips
With ControlNet
Z-Image supports standard ControlNet workflows:
[ZImageTextEncoder] + [ControlNet Apply] → [KSampler]
With LoRA
LoRA support is still maturing. Apply the LoRA before text encoding:
[LoRA Loader] → [ZImageTextEncoder] → [KSampler]
With IPAdapter
For style transfer:
[IPAdapter] + [ZImageTextEncoder] → [KSampler]
Quick Start Checklist
- Download model files from Hugging Face
- Place in correct ComfyUI folders
- Install custom nodes package
- Set CLIP type to Lumina 2
- Set steps to 9, CFG to 1.0
- Test with simple prompt
- Enable think block for more control
- Try different templates
Resources
Try Z-Image online at z-image.vip — no installation required.