How do I install Z-Image in ComfyUI?

Download the model files from Hugging Face (Comfy-Org/z_image_turbo) and place them in the correct folders: text encoder in models/text_encoders, diffusion model in models/diffusion_models, and VAE in models/vae.

What are Z-Image's custom ComfyUI nodes?

Z-Image provides ZImageTextEncoder (main encoder with template support), ZImageTurnBuilder (for multi-turn conversations), and ZImageTextEncoderSimple (for negative prompts). These replace ComfyUI's stock text encoder for Z-Image.

What's the difference between ZImageTextEncoder and the stock encoder?

ZImageTextEncoder formats prompts using Qwen3-4B's chat template with system/user/assistant roles and special tokens, while the stock encoder sends plain text. ZImageTextEncoder also gives you control over system prompts, think blocks, and templates.

How do I use think blocks in Z-Image ComfyUI?

Enable add_think_block in ZImageTextEncoder, then fill thinking_content with reasoning about the image. This text goes inside `<think>` tags in the prompt and can guide the generation.

What templates are available for Z-Image?

Z-Image includes 140+ templates covering styles like photorealistic, comic_american, anime_ghibli, neon_cyberpunk, oil_painting_classical, pixel_art, and structured formats like json_structured and yaml_structured.

Z-Image ComfyUI Workflow: Complete Custom Nodes Guide

Z-Image Turbo works with any ComfyUI setup, but its custom nodes unlock the full power of the Qwen3-4B chat template format. This guide covers everything you need to build advanced workflows.

Installation

Required Files

Download from Comfy-Org/z_image_turbo:

ComfyUI/
├── models/
│   ├── text_encoders/
│   │   └── qwen_3_4b.safetensors
│   ├── diffusion_models/
│   │   └── z_image_turbo_bf16.safetensors
│   └── vae/
│       └── ae.safetensors  (Flux 1 VAE)
└── custom_nodes/
    └── comfyui-z-image/  (Custom nodes package)
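Before launching ComfyUI, it can help to confirm that every file landed in the right folder. The script below is a minimal sketch of such a check: the file names come from the tree above, while the `find_missing_files` helper and the `ComfyUI` root path are assumptions made for illustration.

```python
from pathlib import Path

# Relative paths taken from the folder tree above.
EXPECTED_FILES = [
    "models/text_encoders/qwen_3_4b.safetensors",
    "models/diffusion_models/z_image_turbo_bf16.safetensors",
    "models/vae/ae.safetensors",
]
EXPECTED_DIRS = ["custom_nodes/comfyui-z-image"]


def find_missing_files(comfy_root):
    """Return the expected paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    for path in find_missing_files("ComfyUI"):
        print(f"missing: {path}")
```

An empty result means all three model files and the custom nodes folder were found.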

Basic Settings

Setting	Value
Steps	8-10 (recommended: 9)
CFG	1.0-2.0 (recommended: 1.0)
Sampler	DPM++ 2M Karras
CLIP Type	Lumina 2

Node Reference

ZImageTextEncoder

The main encoder node. Handles complete first-turn conversations.

Inputs

Input	Type	Description
clip	CLIP	From CLIPLoader (Lumina 2)
user_prompt	STRING	Your generation request (required)
template_preset	DROPDOWN	Quick style selection (140+ options)
system_prompt	STRING	Custom instructions (auto-filled by template)
add_think_block	BOOLEAN	Add `<think>` tags to template
thinking_content	STRING	Text inside think tags
assistant_content	STRING	Text after think block
raw_prompt	STRING	Bypass all formatting and write the tokens yourself
strip_key_quotes	BOOLEAN	Remove the quotes around JSON keys in LLM-generated prompts

Outputs

Output	Type	Description
conditioning	CONDITIONING	Connect to KSampler positive
formatted_prompt	STRING	See exactly what was encoded
conversation	CONVERSATION	Chain to TurnBuilder

Example Configuration

template_preset: photorealistic
user_prompt: "A 30-year-old Japanese woman in business attire"
add_think_block: true
thinking_content: "Professional headshot, soft studio lighting, confident expression"
assistant_content: "Here's your professional portrait."

ZImageTurnBuilder

Adds conversation turns after the initial encoder.

Inputs

Input	Type	Description
previous	CONVERSATION	From encoder or another TurnBuilder
clip	CLIP (optional)	Connect to output conditioning directly
user_prompt	STRING	Next user message
thinking_content	STRING	Text for think block
assistant_content	STRING	Assistant response
is_final	BOOLEAN	Leave last message open

Two Workflow Options

Option A: Direct Encoding (clip connected)

ZImageTextEncoder → TurnBuilder (clip) → KSampler

TurnBuilder outputs conditioning directly when clip is connected.

Option B: Chain Back (no clip)

ZImageTextEncoder → TurnBuilder → ZImageTextEncoder (conversation_override)

Use the conversation_override input on the second encoder.

ZImageTextEncoderSimple

Simplified encoder for quick use — ideal for negative prompts.

Inputs

Same as ZImageTextEncoder but without conversation output.

Use Case

Positive: ZImageTextEncoder → KSampler (positive)
Negative: ZImageTextEncoderSimple → KSampler (negative)
          user_prompt: "bad anatomy, blurry, watermark"

The Chat Template

Structure

Z-Image expects this format:

<|im_start|>system
[System instructions]<|im_end|>
<|im_start|>user
[User prompt]<|im_end|>
<|im_start|>assistant
<think>
[Reasoning/planning]
</think>

[Response]<|im_end|>

Special Tokens

Token	Purpose
`<|im_start|>`	Start of a message
`<|im_end|>`	End of a message
`system`	Role: instructions
`user`	Role: human request
`assistant`	Role: model response
`<think>`	Start reasoning
`</think>`	End reasoning

What Each Part Does

System Prompt

Sets style and constraints
Persists across conversation
Example: "Generate photorealistic images with natural lighting"

User Prompt

Your actual request
The main content
Example: "A cat sleeping on a windowsill"

Think Block

Reasoning about the image
You write this, not an LLM
Example: "Warm afternoon light, shallow depth of field"

Assistant Content

Response text before generation
Can prime the model
Example: "Here's a cozy scene with soft lighting"
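To see how the four parts combine, the sketch below assembles them into the chat template shown under Structure. It is an illustration, not the node's actual implementation: `format_prompt` is a hypothetical helper, and details such as whether the node closes the final message with `<|im_end|>` are assumptions.

```python
def format_prompt(user_prompt, system_prompt="", add_think_block=False,
                  thinking_content="", assistant_content=""):
    """Assemble the four parts into one chat-template string."""
    messages = []
    if system_prompt:
        messages.append(f"<|im_start|>system\n{system_prompt}<|im_end|>")
    messages.append(f"<|im_start|>user\n{user_prompt}<|im_end|>")

    assistant = "<|im_start|>assistant\n"
    if add_think_block:
        # The think block sits between the role line and the response text.
        assistant += f"<think>\n{thinking_content}\n</think>\n\n"
    assistant += f"{assistant_content}<|im_end|>"
    messages.append(assistant)
    return "\n".join(messages)


formatted = format_prompt(
    user_prompt="A cat sleeping on a windowsill",
    system_prompt="Generate photorealistic images with natural lighting",
    add_think_block=True,
    thinking_content="Warm afternoon light, shallow depth of field",
    assistant_content="Here's a cozy scene with soft lighting",
)
print(formatted)
```

Comparing this string with the node's formatted_prompt output is a quick way to find out where the real formatting differs from the sketch.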

Template System

Available Templates (140+)

Style Templates

Template	Description
photorealistic	Natural lighting, realistic details
comic_american	Bold outlines, flat colors, dynamic
anime_ghibli	Studio Ghibli watercolor style
anime_shinkai	Makoto Shinkai dramatic lighting
neon_cyberpunk	Neon lights, rain-slicked streets
oil_painting_classical	Renaissance master technique
pixel_art	Retro 8/16-bit aesthetic
character_design	Turnaround sheets, model references
watercolor_soft	Soft watercolor painting
noir_cinematic	High contrast black and white

Structured Prompt Templates

Template	Description
json_structured	Parse JSON-formatted prompts
yaml_structured	Parse YAML hierarchical prompts
markdown_structured	Parse Markdown-formatted prompts

Extended Template Format

Templates can pre-fill multiple fields:

# Template file structure
system_prompt: "Your style instructions here"
add_think_block: true
thinking_content: "Pre-filled reasoning"
assistant_content: "Pre-filled response"

When selected, all configured fields auto-populate.
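The auto-population step can be pictured as a merge between template defaults and whatever is typed into the node. The sketch below is hypothetical: `apply_template` is not part of the package, and the rule that an explicitly set value overrides the template default is an assumption the guide does not confirm.

```python
# Field names follow the template file structure shown above.
TEMPLATE_FIELDS = ("system_prompt", "add_think_block",
                   "thinking_content", "assistant_content")


def apply_template(template, user_values):
    """Merge template defaults with values entered on the node.

    Assumption: a value the user sets explicitly (anything other than
    None) wins over the template default.
    """
    merged = {}
    for field in TEMPLATE_FIELDS:
        user_value = user_values.get(field)
        merged[field] = user_value if user_value is not None else template.get(field)
    return merged


template = {
    "system_prompt": "Your style instructions here",
    "add_think_block": True,
    "thinking_content": "Pre-filled reasoning",
    "assistant_content": "Pre-filled response",
}
fields = apply_template(template, {"thinking_content": "Warm afternoon light"})
print(fields)
```

Here only thinking_content is overridden; the other three fields keep the template's values.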

Custom Templates

Create custom templates in:

custom_nodes/comfyui-z-image/nodes/templates/z_image/

Template file format (YAML):

name: my_custom_style
system_prompt: |
  Generate images in my custom style.
  Focus on specific aesthetic elements.
add_think_block: true
thinking_content: |
  Default reasoning for this style.

Workflow Examples

Basic Single Image

[CLIPLoader]          [Load VAE]
    ↓                      ↓
[ZImageTextEncoder]   [Load Diffusion Model]
    ↓                      ↓
    └──────→ [KSampler] ←──┘
                 ↓
            [VAEDecode]
                 ↓
            [SaveImage]

Settings:

CLIPLoader: type = Lumina 2
ZImageTextEncoder: template_preset = photorealistic
KSampler: steps = 9, cfg = 1.0
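The same graph can be written in ComfyUI's API format, where each node is an entry with a class_type and inputs, and a link is a `[node_id, output_index]` pair. Treat the block below as a sketch under several assumptions: the custom node class names and input names are taken from the tables in this guide rather than from the package source, the `lumina2` type string, the `EmptySD3LatentImage` latent node, the 1024x1024 size, and the empty negative prompt are guesses, and the stock KSampler needs latent and negative inputs that the diagram leaves out.

```python
import json

# ComfyUI API-format graph: node id -> class_type + inputs.
# A link is written as [source_node_id, output_index].
workflow = {
    "1": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "qwen_3_4b.safetensors", "type": "lumina2"}},
    "2": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "z_image_turbo_bf16.safetensors",
                     "weight_dtype": "default"}},
    "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    # Custom node: input names taken from the ZImageTextEncoder table (assumed exact).
    "4": {"class_type": "ZImageTextEncoder",
          "inputs": {"clip": ["1", 0],
                     "user_prompt": "A cat sleeping on a windowsill",
                     "template_preset": "photorealistic"}},
    # KSampler requires a negative input; an empty prompt is an assumption.
    "5": {"class_type": "ZImageTextEncoderSimple",
          "inputs": {"clip": ["1", 0], "user_prompt": ""}},
    "6": {"class_type": "EmptySD3LatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "7": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0], "positive": ["4", 0], "negative": ["5", 0],
                     "latent_image": ["6", 0], "seed": 0, "steps": 9, "cfg": 1.0,
                     "sampler_name": "dpmpp_2m", "scheduler": "karras",
                     "denoise": 1.0}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "z_image"}},
}


def dangling_links(graph):
    """Return (node_id, input_name) pairs whose link targets a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and value[0] not in graph:
                bad.append((node_id, name))
    return bad


print("dangling links:", dangling_links(workflow))
payload = json.dumps({"prompt": workflow})  # body for ComfyUI's POST /prompt
```

Exporting a working graph from the ComfyUI interface in API format is the reliable way to get the exact node and input names for your installed version.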

With Think Block

[ZImageTextEncoder]
├── template_preset: photorealistic
├── user_prompt: "Portrait of an elderly fisherman"
├── add_think_block: ✓
├── thinking_content: "Weathered face, sea salt in beard,
│                      warm golden hour light, authenticity"
└── assistant_content: "A life shaped by the sea."
         ↓
    [KSampler]

Multi-Turn Character

[ZImageTextEncoder]
├── system_prompt: "Generate consistent character..."
├── user_prompt: [Full character sheet]
├── thinking_content: "Key features: blue eyes, scar..."
└── assistant_content: "Character established."
         ↓
         ↓ conversation
         ↓
[ZImageTurnBuilder]
├── previous: [conversation from above]
├── user_prompt: "Change outfit to red dress"
├── thinking_content: "Preserve face, change only outfit"
├── clip: [from CLIPLoader]
└── is_final: ✓
         ↓
         ↓ conditioning
         ↓
    [KSampler]
         ↓
    [SaveImage]
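The prompt this chain produces can be approximated by modeling the conversation as a list of turns. The sketch below is not the package's code: the `Turn` dataclass and `render_conversation` function are hypothetical, and the reading that `is_final` leaves the last assistant message without a closing `<|im_end|>` is inferred from the input description "Leave last message open".

```python
from dataclasses import dataclass

IM_START, IM_END = "<|im_start|>", "<|im_end|>"


@dataclass
class Turn:
    """One user/assistant exchange (hypothetical structure)."""
    user_prompt: str
    thinking_content: str = ""
    assistant_content: str = ""


def render_conversation(system_prompt, turns, is_final=True):
    """Render all turns into a single prompt string.

    Assumption: with is_final=True the last assistant message is left
    open, i.e. it gets no trailing <|im_end|>.
    """
    parts = []
    if system_prompt:
        parts.append(f"{IM_START}system\n{system_prompt}{IM_END}")
    for index, turn in enumerate(turns):
        parts.append(f"{IM_START}user\n{turn.user_prompt}{IM_END}")
        assistant = f"{IM_START}assistant\n"
        if turn.thinking_content:
            assistant += f"<think>\n{turn.thinking_content}\n</think>\n\n"
        assistant += turn.assistant_content
        is_last = index == len(turns) - 1
        if not (is_last and is_final):
            assistant += IM_END
        parts.append(assistant)
    return "\n".join(parts)


conversation = [
    Turn("[Full character sheet]", "Key features: blue eyes, scar...",
         "Character established."),
    Turn("Change outfit to red dress", "Preserve face, change only outfit"),
]
prompt = render_conversation("Generate consistent character...", conversation)
print(prompt)
```

The first turn carries the full character description, so the second turn only has to state what changes.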

Positive + Negative

[CLIPLoader] ─────────────────────────┐
      ↓                               ↓
[ZImageTextEncoder]    [ZImageTextEncoderSimple]
(positive prompt)      (negative prompt)
      ↓                               ↓
      └────────→ [KSampler] ←─────────┘
                     ↓
                [VAEDecode]

Negative prompt example:

user_prompt: "bad anatomy, extra limbs, blurry, watermark, text"
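One caveat when pairing a negative prompt with the recommended CFG of 1.0: under the standard classifier-free guidance formula, the result at CFG 1.0 equals the positive prediction, so the negative prompt contributes nothing. The toy example below shows the arithmetic. It assumes Z-Image's sampling follows that standard formula, which this guide does not state, and the numbers are made up for illustration.

```python
def cfg_combine(positive, negative, cfg):
    """Standard classifier-free guidance: negative + cfg * (positive - negative)."""
    return [n + cfg * (p - n) for p, n in zip(positive, negative)]


positive_pred = [0.5, -1.0, 2.0]   # toy model output for the positive prompt
negative_pred = [0.25, 0.5, -1.0]  # toy model output for the negative prompt

at_one = cfg_combine(positive_pred, negative_pred, 1.0)
at_two = cfg_combine(positive_pred, negative_pred, 2.0)
print(at_one)  # identical to positive_pred
print(at_two)  # pushed further away from negative_pred
```

If the negative prompt seems to do nothing, raising CFG slightly within the 1.0-2.0 range from Basic Settings is worth testing.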

Raw Mode

For complete control, use raw_prompt to bypass all formatting:

raw_prompt: |
  <|im_start|>system
  You are a surrealist painter specializing in dreamscapes.<|im_end|>
  <|im_start|>user
  A melting clock draped over a tree branch in a desert<|im_end|>
  <|im_start|>assistant
  <think>
  Dali-esque composition, impossible physics, soft warm desert light,
  meticulous detail on the clock face, vast empty background
  </think>

  A dreamscape emerges from the unconscious mind.

When raw_prompt is set, all other fields are ignored. You're responsible for correct token formatting.
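Since raw mode skips all formatting, a quick sanity check before pasting a prompt can catch mistyped tokens. The helper below is a hypothetical sketch, not something the package provides. It only checks that start and end tokens roughly balance (allowing the final message to stay open, as in the example above), that each role is one of system, user, or assistant, and that think tags are paired.

```python
import re

ROLES = ("system", "user", "assistant")


def check_raw_prompt(raw_prompt):
    """Return a list of formatting problems found in a raw prompt."""
    problems = []
    starts = raw_prompt.count("<|im_start|>")
    ends = raw_prompt.count("<|im_end|>")
    # The final assistant message may be left open, so allow one fewer end token.
    if ends not in (starts, starts - 1):
        problems.append(f"{starts} <|im_start|> vs {ends} <|im_end|>")
    # The role name must follow <|im_start|> directly.
    for role in re.findall(r"<\|im_start\|>(\w*)", raw_prompt):
        if role not in ROLES:
            problems.append(f"unknown role: {role!r}")
    if raw_prompt.count("<think>") != raw_prompt.count("</think>"):
        problems.append("unbalanced <think> tags")
    return problems


example = (
    "<|im_start|>system\n"
    "You are a surrealist painter specializing in dreamscapes.<|im_end|>\n"
    "<|im_start|>user\n"
    "A melting clock draped over a tree branch in a desert<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\nDali-esque composition, impossible physics\n</think>\n\n"
    "A dreamscape emerges from the unconscious mind."
)
print(check_raw_prompt(example))
```

An empty list means none of these three checks failed; it does not guarantee the prompt matches what the model expects.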

Debugging

Check Formatted Output

Connect formatted_prompt to a Preview Text node to see exactly what's being encoded.

Console Logs

Server console shows:

[Z-Image] Formatted prompt: <|im_start|>system...
[Z-Image] Mode: direct
[Z-Image] Character counts: system=45, user=234, think=67
[Z-Image] Token estimate: ~350

Common Issues

Issue	Cause	Solution
Blank output	Wrong CLIP type	Use Lumina 2
Poor quality	CFG too high	Set CFG to 1.0
Slow generation	Too many steps	Use 8-10 steps
Template not working	Wrong file path	Check templates folder

Performance Optimization

VRAM Management

Precision	VRAM	Quality
bf16	~16GB	Best
fp8	~10GB	Good
GGUF Q4	~6GB	Acceptable
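The table can be turned into a simple lookup that picks the highest-quality option fitting a given GPU. This is a rough sketch: the thresholds are the approximate figures from the table, `pick_precision` is a hypothetical helper, and real usage also depends on resolution and other loaded models.

```python
# (precision, approximate VRAM in GB) from the table above, best quality first.
PRECISION_OPTIONS = [("bf16", 16), ("fp8", 10), ("GGUF Q4", 6)]


def pick_precision(available_vram_gb):
    """Return the highest-quality precision that fits, or None if none does."""
    for name, required_gb in PRECISION_OPTIONS:
        if available_vram_gb >= required_gb:
            return name
    return None


for vram in (24, 12, 8, 4):
    print(vram, "GB ->", pick_precision(vram))
```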

Speed Settings

Priority	Steps	Sampler
Speed	8	UniPC
Balanced	9	DPM++ 2M Karras
Quality	12	DPM++ SDE Karras

Batch Processing

For batch generation:

Use deterministic sampler (DPM++ 2M Karras)
Lock seed for reproducibility
Keep CFG at 1.0

Integration Tips

With ControlNet

Z-Image supports standard ControlNet workflows:

[ZImageTextEncoder] + [ControlNet Apply] → [KSampler]

With LoRA

LoRA support is still growing. Apply the LoRA before text encoding:

[LoRA Loader] → [ZImageTextEncoder] → [KSampler]

With IPAdapter

For style transfer:

[IPAdapter] + [ZImageTextEncoder] → [KSampler]

Quick Start Checklist

Download model files from Hugging Face
Place in correct ComfyUI folders
Install custom nodes package
Set CLIP type to Lumina 2
Set steps to 9, CFG to 1.0
Test with simple prompt
Enable think block for more control
Try different templates

Resources

Try Z-Image online at z-image.vip — no installation required.
