Can Z-Image Turbo maintain character consistency?

Yes, Z-Image Turbo can maintain character consistency using its multi-turn conversation format. Define detailed character profiles in the first turn, then make targeted edits in subsequent turns while preserving core features.

What is multi-turn conversation in Z-Image?

Multi-turn conversation is Z-Image's feature that formats prompts as a chat between user and assistant. Each turn builds context, allowing you to define a character then make iterative modifications while maintaining consistency.

How does Z-Image's chat template work?

Z-Image uses Qwen3-4B's chat template with special tokens like <|im_start|> and <|im_end|>, plus <think> blocks. This structured format helps the model understand what to preserve versus what to modify across turns.

Do I need ComfyUI for character consistency?

The multi-turn conversation features work best with Z-Image's custom ComfyUI nodes (ZImageTextEncoder and ZImageTurnBuilder). Basic prompting works in any interface, but advanced control requires ComfyUI.

Can I use thinking blocks in Z-Image prompts?

Yes, Z-Image supports <think> blocks where you can add reasoning about what to change or preserve. This text becomes part of the encoded prompt and can guide the generation.

Z-Image Character Consistency: Multi-Turn Guide

One of the biggest challenges in AI image generation is maintaining consistency across multiple images. Z-Image Turbo addresses this with its unique multi-turn conversation format.

This guide explains how to define characters once and make precise modifications while preserving their core identity.

Understanding Z-Image's Chat Format

The Qwen3-4B Foundation

Z-Image Turbo uses Qwen3-4B as its text encoder. This model was trained on conversations with a specific structure:

<|im_start|>system
Instructions for the model<|im_end|>
<|im_start|>user
The user's request<|im_end|>
<|im_start|>assistant
<think>
Model's reasoning process
</think>
Response to the user<|im_end|>

When you use Z-Image through its custom ComfyUI nodes, you can access this full structure.
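
To make the layout above concrete, here is a minimal Python sketch that assembles one exchange into that structure. The function name `format_turn` and its parameters are hypothetical and chosen for illustration; the actual ComfyUI nodes may assemble the string differently (for example, in how they handle an empty assistant response).

```python
# Hypothetical helper, not part of Z-Image or ComfyUI: it only mirrors the
# token layout shown in the template above.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_turn(user: str, think: str = "", assistant: str = "", system: str = "") -> str:
    """Render one user/assistant exchange, with an optional system message."""
    parts = []
    if system:
        parts.append(f"{IM_START}system\n{system}{IM_END}")
    parts.append(f"{IM_START}user\n{user}{IM_END}")

    reply = f"{IM_START}assistant\n"
    if think:
        # The reasoning goes inside <think> tags, before the visible response.
        reply += f"<think>\n{think}\n</think>\n"
    reply += f"{assistant}{IM_END}"
    parts.append(reply)
    return "\n".join(parts)


prompt = format_turn(
    system="Generate a photorealistic portrait.",
    user="Professional headshot of a woman in a studio, soft lighting",
    think="Keep the background a simple grey gradient.",
    assistant="Here is the headshot.",
)
print(prompt)
```

Printing the result is a quick way to compare your own prompt against the formatted_prompt output of the encoder node.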

Why This Matters for Consistency

The model was trained to:

Follow system instructions throughout the conversation
Remember context from previous turns
Use <think> blocks for reasoning about changes
Maintain consistency while making requested modifications

By structuring your prompts as multi-turn conversations, you give Z-Image explicit context about what should stay the same.

The Multi-Turn Workflow

Step 1: Define Your Character (First Turn)

Create a comprehensive character profile with every detail you want preserved:

# Character Profile: Sarah Chen

## Core Identity
- Name: Sarah Chen
- Age: 28
- Ethnicity: Chinese-American
- Build: Slim, 5'6"

## Face & Features
- Face Shape: Oval with high cheekbones
- Skin: Fair with warm undertones, light freckles across nose
- Eyes: Dark brown, almond-shaped, slight upturn at corners
- Eyebrows: Natural, slightly thick, well-groomed
- Nose: Small, slightly upturned
- Lips: Full, natural rose color
- Expression: Default confident half-smile

## Hair
- Color: Black with subtle warm brown highlights
- Length: Mid-back
- Style: Usually worn down, slight natural wave
- Texture: Thick, healthy, glossy

## Distinguishing Features
- Small beauty mark below right eye
- Delicate gold hoop earrings (always wears)
- Silver necklace with crescent moon pendant

## Default Attire
- Style: Modern professional, minimalist
- Colors: Often black, white, navy, burgundy
- Preference: Clean lines, quality fabrics
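
If you maintain several characters, it can help to keep each profile as structured data and render the sheet from it, so every turn uses identical wording. The sketch below is one possible approach; `render_character_sheet` is a hypothetical helper, and the heading and bullet layout simply copies the profile above.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


# Hypothetical helper: turns structured data into the profile layout used above.
def render_character_sheet(
    name: str,
    sections: Mapping[str, Mapping[str, str] | Sequence[str]],
) -> str:
    """Render a profile with one '##' heading per section and one bullet per detail."""
    lines = [f"# Character Profile: {name}"]
    for title, body in sections.items():
        lines += ["", f"## {title}"]
        if isinstance(body, Mapping):
            # Labelled details, e.g. "- Age: 28"
            lines += [f"- {key}: {value}" for key, value in body.items()]
        else:
            # Free-form details, e.g. "- Small beauty mark below right eye"
            lines += [f"- {item}" for item in body]
    return "\n".join(lines)


sheet = render_character_sheet(
    "Sarah Chen",
    {
        "Core Identity": {"Name": "Sarah Chen", "Age": "28"},
        "Distinguishing Features": [
            "Small beauty mark below right eye",
            "Delicate gold hoop earrings (always wears)",
        ],
    },
)
print(sheet)
```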

Step 2: First Image Generation

Use the full character sheet in your first prompt:

[System Prompt]
Generate a photorealistic portrait following the character sheet exactly.
Maintain all specified features and distinguishing marks.

[User Prompt]
# Character Profile: Sarah Chen
[Full character sheet from above]

Current scene: Professional headshot, studio setting, soft lighting

[Think Block]
Subject is Sarah Chen as defined in the character sheet. Key features
to ensure: high cheekbones, beauty mark below right eye, gold hoop
earrings, crescent moon pendant. Setting is studio, keep background
simple grey gradient. Lighting should be soft and flattering.

[Assistant Response]
Here's Sarah Chen's professional headshot as specified.

Step 3: Make Modifications (Second Turn)

Now request a specific change while referencing the established character:

[User Prompt]
Change the setting to a Tokyo street at night. Keep Sarah exactly
the same - her face, hair, earrings, necklace - but put her in
a casual outfit: black turtleneck and jeans.

[Think Block]
Preserve: Face shape, skin tone, freckles, eye shape, beauty mark,
earrings, necklace. Change: Setting to Tokyo night street, clothing
to casual black turtleneck and jeans. Add environmental lighting
from neon signs while maintaining face recognizability.

[Assistant Response]
Here's Sarah on a Tokyo street at night in casual wear.
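
The two turns above can be modeled as a small conversation object that accumulates turns and renders them in order. This is a sketch under assumptions: the `Turn` and `Conversation` classes are hypothetical, and the rendering follows the chat layout described earlier rather than the exact output of the ComfyUI nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    user: str
    think: str = ""
    assistant: str = ""


@dataclass
class Conversation:
    """Hypothetical container: one system prompt plus an ordered list of turns."""

    system: str
    turns: list = field(default_factory=list)

    def add_turn(self, user: str, think: str = "", assistant: str = "") -> "Conversation":
        self.turns.append(Turn(user, think, assistant))
        return self

    def render(self) -> str:
        blocks = [f"<|im_start|>system\n{self.system}<|im_end|>"]
        for turn in self.turns:
            blocks.append(f"<|im_start|>user\n{turn.user}<|im_end|>")
            think = f"<think>\n{turn.think}\n</think>\n" if turn.think else ""
            blocks.append(f"<|im_start|>assistant\n{think}{turn.assistant}<|im_end|>")
        return "\n".join(blocks)


convo = Conversation(system="Generate a photorealistic portrait following the character sheet exactly.")
convo.add_turn(
    user="Character: Sarah Chen. Current scene: professional headshot, studio setting, soft lighting",
    think="Key features: high cheekbones, beauty mark below right eye, gold hoop earrings.",
    assistant="Here's Sarah Chen's professional headshot as specified.",
)
convo.add_turn(
    user="Change the setting to a Tokyo street at night. Keep Sarah exactly the same.",
    think="Preserve: face, hair, earrings, necklace. Change: setting and clothing.",
    assistant="Here's Sarah on a Tokyo street at night in casual wear.",
)
text = convo.render()
print(text)
```

Because later turns are appended after earlier ones, the character definition always precedes the modification in the encoded text.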

ComfyUI Implementation

Required Nodes

ZImageTextEncoder

Main node for first turn
Inputs: system_prompt, user_prompt, thinking_content, assistant_content
Outputs: conditioning, formatted_prompt, conversation

ZImageTurnBuilder

Adds subsequent turns to conversation
Inputs: previous (conversation), user_prompt, thinking_content
Outputs: conditioning (when clip connected), conversation

Basic Workflow

CLIPLoader (Lumina 2)
       ↓
ZImageTextEncoder ──────────────────────→ KSampler
├── system_prompt: "Generate photorealistic..."
├── user_prompt: [Character sheet]
├── thinking_content: [Reasoning about features]
└── assistant_content: [Brief response]

Multi-Turn Workflow

ZImageTextEncoder (Turn 1: Character Definition)
       ↓ conversation
ZImageTurnBuilder (Turn 2: First Modification)
       ↓ conversation
ZImageTurnBuilder (Turn 3: Second Modification)
       ↓ conditioning
KSampler

Settings

Parameter	Recommended Value
add_think_block	true
is_final (TurnBuilder)	true (on last turn)
template_preset	photorealistic or custom
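
The is_final rule is easy to break when turns are added or reordered. The following sketch models the follow-up turns as plain dictionaries and sets the flag only on the last one. It is not ComfyUI's API: the function `build_turn_settings` and the dictionary layout are assumptions for illustration, and only the key names (user_prompt, add_think_block, is_final) come from the node inputs and settings listed above.

```python
# Hypothetical planning helper: works on plain dicts, not on ComfyUI nodes.
def build_turn_settings(follow_up_turns: list) -> list:
    """Return copies of the turn inputs with is_final set only on the last turn."""
    last = len(follow_up_turns) - 1
    return [
        {**turn, "add_think_block": True, "is_final": index == last}
        for index, turn in enumerate(follow_up_turns)
    ]


settings = build_turn_settings(
    [
        {"user_prompt": "Change his beard to have some red mixed in with the white."},
        {"user_prompt": "Put a cute baby flying sloth hovering above his head."},
    ]
)
for entry in settings:
    print(entry["user_prompt"], "| is_final =", entry["is_final"])
```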

Complete Example: Walter Finch

Let's walk through creating a consistent character across multiple images.

Turn 1: Define Walter

System Prompt:

Generate an image in classic American comic book style.
Bold outlines, flat colors with halftone shading.

User Prompt:

# Character Profile: Walter Finch (Wally)

## Core Identity
- Name: Walter Finch (Nickname: Wally)
- Gender: Male
- Age: 72
- Ethnicity: Caucasian (British descent)
- Skin Tone: Fair, warm sun-kissed glow on cheeks and nose

## Head & Face
- Face Shape: Oval, prominent jawline, receding hairline
- Eye Color: Ice-blue with subtle gold flecks around the iris
- Hair: Pure white, side-parted, full beard and mustache
- Glasses: Thin, gold-rimmed reading glasses
- Expression: Warm, gentle smile with a twinkle in his eye

## Attire
- Light gray and blue checkered button-down shirt
- Dark grey wool trousers
- Brown leather loafers

## Props
- Pale lavender ceramic mug of coffee
- Gold pocket watch chain visible in breast pocket

Think Block:

Creating Walter "Wally" Finch: 72 years old, British descent,
warm grandfather type. Key features: ice-blue eyes with gold
flecks (distinctive), gold-rimmed glasses, full white beard,
side-parted hair. Must include: lavender coffee mug and gold
pocket watch chain. Style: American comic book with halftone.

Assistant Response:

Here's Walter Finch, the kindly British gentleman.

Turn 2: Modify Beard Color

User Prompt:

Let's change his beard to have some red mixed in with the white.
Keep everything else exactly the same.

Think Block:

Modifying beard only: change from pure white to red and white
mixed. Preserve: ice-blue eyes with gold flecks, gold-rimmed
glasses, facial structure, lavender mug, pocket watch chain,
checkered shirt, warm expression.

Turn 3: Add New Element

User Prompt:

Let's put a cute baby flying sloth hovering above his head too.

Think Block:

Adding element: baby flying sloth above Walter's head. Preserve
all of Walter's features including the red-white beard from
previous turn. The sloth should be small and cute, floating
or hovering position.

Result

The final image contains Walter with all his defined features, the red-white beard modification from turn 2, and the flying sloth addition from turn 3.

Why Using Qwen3 for Prompt Generation Helps

Z-Image's encoder is Qwen3-4B. All Qwen3 models share the same tokenizer.

When you use a Qwen3 model to generate your character descriptions:

Same vocabulary means same token IDs
Semantic nuances transfer directly
Think block reasoning primes the encoder

For best results, consider using a larger Qwen3 model, such as Qwen3-32B or Qwen3-235B-A22B, to generate detailed character sheets, then feed them directly to Z-Image.

Example Qwen3 System Prompt

You are a visual prompt engineer for Z-Image Turbo.

Generate detailed, visually-specific character descriptions.
Focus on concrete visual details - colors, textures, specific
features. Avoid abstract concepts.

Structure your output as:
1. <think> block with visual planning
2. Hierarchical character profile with sections

The output will be used directly as a Z-Image prompt.
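
One way to use a system prompt like this is to wrap it in the role/content message list that chat models commonly accept. The sketch below only builds that payload and does not call any model; `build_messages` is a hypothetical helper, and how you send the messages (local inference, a hosted endpoint) is left open.

```python
# Hypothetical helper: builds a chat payload only, with no network or model calls.
SYSTEM_PROMPT = """You are a visual prompt engineer for Z-Image Turbo.

Generate detailed, visually-specific character descriptions.
Focus on concrete visual details - colors, textures, specific
features. Avoid abstract concepts."""


def build_messages(brief: str) -> list:
    """Pair the system prompt with a short character brief from the user."""
    brief = brief.strip()
    if not brief:
        raise ValueError("brief must not be empty")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": brief},
    ]


messages = build_messages("A kindly 72-year-old British grandfather in comic book style")
for message in messages:
    print(message["role"], "->", message["content"][:40])
```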

Tips for Better Consistency

Be Exhaustively Specific

Don't leave anything to chance. If you want a specific eye color, say exactly what it is. "Blue eyes" is vague. "Ice-blue with subtle gold flecks around the iris" is specific.

Use the Think Block

The think block lets you explicitly state what to preserve:

<think>
Changing: outfit to summer dress
Preserving: face shape, eye color (hazel with amber ring),
beauty mark on left cheek, ear piercings (two in left ear),
nose shape, lip fullness
</think>
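
A small helper can keep think blocks in this shape, stating the change first and then the features to preserve. This is a minimal sketch; `build_think_block` is a hypothetical name, and the "Changing / Preserving" wording simply follows the example above.

```python
# Hypothetical helper that reproduces the "Changing / Preserving" layout above.
def build_think_block(changing: str, preserving: list) -> str:
    """State the change first, then list every feature that must stay the same."""
    if not preserving:
        raise ValueError("list at least one feature to preserve")
    return (
        "<think>\n"
        f"Changing: {changing}\n"
        f"Preserving: {', '.join(preserving)}\n"
        "</think>"
    )


block = build_think_block(
    "outfit to summer dress",
    ["face shape", "eye color (hazel with amber ring)", "beauty mark on left cheek"],
)
print(block)
```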

One Change Per Turn

Don't overload modifications. Make one targeted change per turn:

Good:

Turn 2: Change outfit
Turn 3: Change background
Turn 4: Add accessory

Risky:

Turn 2: Change outfit AND background AND add accessory AND alter lighting
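
If you start from a list of desired changes, splitting it into one change per turn is straightforward. The sketch below assumes turn 1 is reserved for the character definition, so numbering starts at 2; `plan_turns` is a hypothetical helper.

```python
# Hypothetical helper: assigns each change to its own turn, starting after turn 1.
def plan_turns(changes: list, first_turn: int = 2) -> list:
    """Return one labelled turn per change, in the order given."""
    return [f"Turn {number}: {change}" for number, change in enumerate(changes, start=first_turn)]


plan = plan_turns(["Change outfit", "Change background", "Add accessory"])
for line in plan:
    print(line)
```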

Reference Previous Content

In later turns, briefly reference what should stay:

Keep Sarah's face, hair, and jewelry exactly as before.
Only change her outfit to a red evening gown.

Consistent Style Markers

Keep style keywords consistent across all turns:

[Every turn ends with]
photorealistic, 8k, professional photography, shot on Canon EOS R5
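
To avoid retyping the style markers, a helper can append them to each turn's text and skip prompts that already end with them. This is a sketch, assuming the markers go on their own final line; `with_style` and `STYLE_SUFFIX` are hypothetical names.

```python
# Hypothetical helper: appends the shared style markers exactly once.
STYLE_SUFFIX = "photorealistic, 8k, professional photography, shot on Canon EOS R5"


def with_style(prompt: str, suffix: str = STYLE_SUFFIX) -> str:
    """Append the style suffix on its own line unless the prompt already ends with it."""
    text = prompt.rstrip()
    if text.endswith(suffix):
        return text
    return f"{text}\n{suffix}"


styled = with_style("Sarah on a Tokyo street at night, casual outfit")
print(styled)
```

Calling the helper on an already styled prompt returns it unchanged, so it is safe to apply to every turn.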

Limitations & Expectations

What Works Well

Preserving distinctive features (scars, beauty marks, eye color)
Maintaining clothing style across scenes
Keeping accessories consistent
Changing backgrounds while preserving subject

What's Challenging

Exact face reproduction (this isn't face-swap)
Perfect consistency across wildly different poses
Maintaining consistency across different art styles

Realistic Expectations

Multi-turn conversation improves consistency but doesn't guarantee perfection. Expect:

80-90% feature preservation with good prompting
Occasional need for regeneration
Better results with distinctive characters

Troubleshooting

Features Drifting

Problem: Character looks slightly different each generation.

Solution: Add more specific distinguishing features. Instead of "brown hair," use "chocolate brown hair with copper highlights, falling to mid-back, slight wave, side-parted."

Modifications Not Applying

Problem: Requested changes don't appear.

Solution: Be explicit in the think block about what changes. State the change first, then list what stays the same.

Style Inconsistency

Problem: Art style changes between turns.

Solution: Include style keywords in the system prompt and repeat them in each turn's assistant response.

Template Files

Z-Image includes 140+ templates in ComfyUI. For character work, try:

Template	Best For
photorealistic	Realistic characters
character_design	Reference sheets
comic_american	Comic book style
anime_ghibli	Ghibli-style characters
portrait_studio	Studio portraits

Access via template_preset in ZImageTextEncoder.

Get Started

Download ComfyUI nodes: Comfy-Org/z_image_turbo
Create a character sheet using the template above
Start with Turn 1 - define everything
Iterate with Turn 2+ - make targeted changes

Or practice basic prompting at z-image.vip.
