AgentSkillsCN

comfyui-workflow

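Use this skill when creating, modifying, or debugging ComfyUI image generation workflows. Triggers automatically on: 'ComfyUI workflow', 'ComfyUI pipeline', 'txt2img', 'img2img', 'inpainting', 'ControlNet', 'KSampler', 'Flux workflow', 'SDXL workflow', 'SD3.5 workflow', 'LoRA', 'IPAdapter', or any request to build an image generation pipeline in ComfyUI.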

SKILL.md
---
name: comfyui-workflow
description: "Use this skill when creating, modifying, or debugging ComfyUI image generation workflows. Triggers on: 'ComfyUI workflow', 'ComfyUI pipeline', 'txt2img', 'img2img', 'inpainting', 'ControlNet', 'KSampler', 'Flux workflow', 'SDXL workflow', 'SD3.5 workflow', 'LoRA', 'IPAdapter', or any request to build an image generation pipeline in ComfyUI."
license: MIT
metadata:
  author: marduk191
  version: "2.0.0"
---

ComfyUI Workflow Guide

Comprehensive guide to building image generation workflows in ComfyUI. Covers SD1.5, SDXL, SD3.5, and Flux model families with all common pipeline patterns.

When to Use This Skill

  • Creating text-to-image, image-to-image, or inpainting workflows
  • Setting up Flux, SDXL, or SD3.5 pipelines
  • Adding ControlNet, LoRA, IPAdapter, or other conditioning
  • Optimizing sampler/scheduler settings
  • Building complex multi-stage generation pipelines

Model Families & Loading

Different model families require different loader nodes. Picking the wrong loader is a common source of errors.

SD1.5 / SDXL

Uses a single checkpoint file containing UNet + CLIP + VAE.

code
Node: CheckpointLoaderSimple
  ckpt_name: "sd_xl_base_1.0.safetensors"
Outputs: MODEL, CLIP, VAE

File location: ComfyUI/models/checkpoints/

SD3.5

Uses a checkpoint with triple CLIP (clip_g + clip_l + t5xxl).

code
Node: CheckpointLoaderSimple
  ckpt_name: "sd3.5_large.safetensors"
Outputs: MODEL, CLIP, VAE

Or load components separately:

code
Node: UNETLoader
  unet_name: "sd3.5_large.safetensors"
  weight_dtype: "default"
Outputs: MODEL

Node: TripleCLIPLoader
  clip_name1: "clip_g.safetensors"
  clip_name2: "clip_l.safetensors"
  clip_name3: "t5xxl_fp16.safetensors"
Outputs: CLIP

Node: VAELoader
  vae_name: "sd3.5_vae.safetensors"
Outputs: VAE

SD3.5 Variants:

| Model | Params | Speed | Quality |
|---|---|---|---|
| SD3.5 Large | 8B | Slow | Best |
| SD3.5 Large Turbo | 8B distilled | Fast | Good |
| SD3.5 Medium | 2.5B | Medium | Good |

Flux

Flux does NOT use a single checkpoint. Load components separately:

code
Node: UNETLoader (or "Load Diffusion Model")
  unet_name: "flux1-dev.safetensors"
  weight_dtype: "default"
Outputs: MODEL

Node: DualCLIPLoader
  clip_name1: "t5xxl_fp16.safetensors"    (or fp8)
  clip_name2: "clip_l.safetensors"
  type: "flux"
Outputs: CLIP

Node: VAELoader
  vae_name: "ae.safetensors"
Outputs: VAE

Exception: all-in-one FP8 checkpoints (diffusion model, text encoders, and VAE bundled in one file) load with CheckpointLoaderSimple. Keep CFG at 1.0 when sampling with them.

Flux Model Variants:

| Model | Steps | Quality | License |
|---|---|---|---|
| Flux.1 Dev | 20-50 | Best | Non-commercial |
| Flux.1 Schnell | 4 | Good | Apache 2.0 |
| Flux.1 Pro | N/A | Best | API only |

Flux File Locations:

  • Diffusion models: ComfyUI/models/diffusion_models/
  • Text encoders: ComfyUI/models/text_encoders/ (or models/clip/)
  • VAE: ComfyUI/models/vae/

Important: Flux does NOT use negative prompts. KSampler still requires a negative input, so connect an empty CLIPTextEncode to it; at CFG 1.0 the negative conditioning has no effect on the result.

Flux Kontext (Image Editing)

Flux Kontext edits an existing image from a text instruction, conditioning on both the input image and the prompt.

code
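Node: UNETLoader
  unet_name: "flux1-kontext-dev.safetensors"
Outputs: MODEL

Node: FluxKontextImageScale
  image: (input image)
Outputs: IMAGE → VAEEncode → LATENT

Node: ReferenceLatent
  conditioning: (from CLIPTextEncode)
  latent: (encoded input image)
Outputs: CONDITIONING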

Core Workflow Patterns

Text-to-Image (txt2img)

The most basic workflow:

code
CheckpointLoaderSimple ──→ MODEL ──→ KSampler ──→ LATENT ──→ VAEDecode ──→ IMAGE ──→ SaveImage
                      ├──→ CLIP ──→ CLIPTextEncode (positive) ──→ CONDITIONING ──→ KSampler
                      ├──→ CLIP ──→ CLIPTextEncode (negative) ──→ CONDITIONING ──→ KSampler
                      └──→ VAE ──→ VAEDecode
EmptyLatentImage ──→ LATENT ──→ KSampler

Node connections:

  1. CheckpointLoaderSimple → loads MODEL, CLIP, VAE
  2. CLIPTextEncode (positive) → connect CLIP, enter prompt
  3. CLIPTextEncode (negative) → connect CLIP, enter negative prompt
  4. EmptyLatentImage → set width, height, batch_size
  5. KSampler → connect MODEL, positive, negative, latent_image
  6. VAEDecode → connect samples (from KSampler), vae
  7. SaveImage → connect images
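
The node connections above can also be written as data. The sketch below assumes ComfyUI's API ("prompt") JSON layout, in which each node is keyed by an id and every link is a `[source_node_id, output_index]` pair. The function name `build_txt2img`, the node ids, and the `filename_prefix` value are illustrative choices, not part of the original guide.

```python
import json

def build_txt2img(ckpt_name, positive, negative, width=1024, height=1024,
                  seed=0, steps=30, cfg=7.0):
    """Return a txt2img graph in ComfyUI's API (prompt) JSON layout."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        # CheckpointLoaderSimple outputs: 0 = MODEL, 1 = CLIP, 2 = VAE
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "txt2img"}},
    }

workflow = build_txt2img("sd_xl_base_1.0.safetensors",
                         "a majestic castle on a hill, sunset, detailed",
                         "worst quality, blurry, deformed")
print(json.dumps(workflow, indent=2))
```

Keeping the graph as a plain dictionary makes it easy to diff, template, and check before it is loaded into ComfyUI.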

Image-to-Image (img2img)

Same as txt2img but encode an input image instead of using EmptyLatentImage:

code
LoadImage ──→ IMAGE ──→ VAEEncode ──→ LATENT ──→ KSampler (denoise < 1.0)

Key difference: Set KSampler denoise to 0.3-0.8 (lower = closer to original).
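
To make the effect of denoise concrete, here is a minimal sketch of how the value maps onto the noise schedule. It rests on an assumption about the core KSampler: for denoise below 1.0 the schedule is stretched to `steps / denoise` entries and only the last `steps` of them are sampled. The helper `denoise_window` is hypothetical and only does the arithmetic.

```python
def denoise_window(steps: int, denoise: float) -> tuple[int, int]:
    """Return (schedule_length, first_step_run) for a given denoise value.

    Assumed behavior: the schedule is stretched to steps / denoise entries
    and only the final `steps` entries are actually sampled.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    if denoise > 0.9999:
        return steps, 0
    total = int(steps / denoise)
    return total, total - steps

for d in (1.0, 0.5, 0.25):
    total, start = denoise_window(20, d)
    print(f"denoise={d}: samples steps {start}..{total} of a {total}-step schedule")
```

Under this assumption a lower denoise starts later in the schedule, where less noise is added, which is why the result stays closer to the input image.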

Inpainting

code
LoadImage (image) ──→ VAEEncodeForInpaint (pixels) ──→ LATENT ──→ KSampler
LoadImage (mask)  ──→ VAEEncodeForInpaint (mask)

Or use Flux Fill for native inpainting:

code
Node: UNETLoader
  unet_name: "flux1-fill-dev.safetensors"

Node: InpaintModelConditioning
  positive, negative, vae, pixels (input image), mask
Outputs: positive, negative, latent

Upscaling (Hires Fix)

Two-pass generation for high resolution:

code
Pass 1: txt2img at 512x512 (SD1.5) or 1024x1024 (SDXL/Flux)
Pass 2: Upscale latent → KSampler at low denoise (0.3-0.5)

code
KSampler (pass1) → LatentUpscale → KSampler (pass2, denoise=0.4) → VAEDecode
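
As a sketch of the latent two-pass chain above, the helper below appends a LatentUpscaleBy node and a second KSampler to an API-format graph. The API layout, the helper name `add_hires_pass`, and the numeric-id convention are assumptions made for illustration; the stand-in workflow omits its upstream nodes.

```python
def add_hires_pass(workflow, first_sampler_id, scale_by=1.5, denoise=0.4):
    """Append LatentUpscaleBy + a second KSampler; return the new sampler's id."""
    next_id = max(int(k) for k in workflow) + 1
    upscale_id, sampler_id = str(next_id), str(next_id + 1)
    workflow[upscale_id] = {
        "class_type": "LatentUpscaleBy",
        "inputs": {"samples": [first_sampler_id, 0],
                   "upscale_method": "bilinear", "scale_by": scale_by},
    }
    # Reuse the first pass's model, prompts, and sampler settings.
    second = dict(workflow[first_sampler_id]["inputs"])
    second.update({"latent_image": [upscale_id, 0], "denoise": denoise})
    workflow[sampler_id] = {"class_type": "KSampler", "inputs": second}
    return sampler_id

# Minimal stand-in for a first pass (nodes "1" to "4" omitted for brevity).
workflow = {"5": {"class_type": "KSampler",
                  "inputs": {"model": ["1", 0], "positive": ["2", 0],
                             "negative": ["3", 0], "latent_image": ["4", 0],
                             "seed": 0, "steps": 30, "cfg": 7.0,
                             "sampler_name": "dpmpp_2m", "scheduler": "karras",
                             "denoise": 1.0}}}
pass2 = add_hires_pass(workflow, "5")
print(pass2, workflow[pass2]["inputs"]["latent_image"])
```

The VAEDecode node should then read its samples from the returned sampler id rather than from the first pass.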

Or use a model upscaler:

code
VAEDecode → ImageUpscaleWithModel → VAEEncode → KSampler (denoise=0.3) → VAEDecode

Sampler & Scheduler Reference

KSampler Parameters

| Parameter | Description | Typical Values |
|---|---|---|
| seed | Random seed | Any non-negative integer; set control_after_generate to "randomize" for a new seed each run |
| steps | Denoising steps | 20-50 (model dependent) |
| cfg | Classifier-free guidance | 1.0-15.0 (model dependent) |
| sampler_name | Sampling algorithm | See table below |
| scheduler | Noise schedule | See table below |
| denoise | Denoising strength | 1.0 (txt2img), 0.3-0.8 (img2img) |

Samplers

| Sampler | Speed | Quality | Best For |
|---|---|---|---|
| euler | Fast | Good | Quick previews |
| euler_ancestral | Fast | Good | More variation |
| dpmpp_2m | Medium | Great | General use |
| dpmpp_2m_sde | Slow | Excellent | High quality |
| dpmpp_3m_sde | Slow | Excellent | Complex scenes |
| dpmpp_sde | Medium | Great | Good balance |
| ddim | Fast | Good | Consistent results |
| uni_pc | Fast | Good | Speed priority |
| heun | Slow | Excellent | Maximum quality |
| dpm_2 | Medium | Good | SD1.5 |
| lms | Fast | Decent | Legacy |

Schedulers

| Scheduler | Description | Best For |
|---|---|---|
| normal | Linear schedule | Default, safe choice |
| karras | Karras noise schedule | Better detail, most popular |
| exponential | Exponential decay | Smooth transitions |
| sgm_uniform | Uniform sigma spacing | SD3.5, some Flux |
| simple | Simple linear | Flux Schnell |
| ddim_uniform | DDIM spacing | DDIM sampler |
| beta | Beta schedule | Experimental |

Recommended Settings by Model

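| Model | Steps | CFG | Sampler | Scheduler |
|---|---|---|---|---|
| SD 1.5 | 20-30 | 7-9 | dpmpp_2m | karras |
| SDXL | 25-40 | 5-8 | dpmpp_2m | karras |
| SDXL Turbo | 4-8 | 1-2 | euler_ancestral | normal |
| SD3.5 Large | 28-50 | 4-7 | dpmpp_2m | sgm_uniform |
| SD3.5 Medium | 28-40 | 4-5 | euler | simple |
| Flux Dev | 20-50 | 1.0 | euler | simple |
| Flux Schnell | 4 | 1.0 | euler | simple |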

Note: Flux's guidance is distilled into the model, so set CFG to 1.0 and don't use negative prompts. For Flux Dev, guidance strength is adjusted with the FluxGuidance node rather than with CFG.
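
The recommended settings can be kept as a lookup table so a workflow builder picks consistent defaults. This is a minimal sketch: the values are copied from the table above, while the `RECOMMENDED` name, the `ksampler_settings` helper, and the `quality` interpolation knob are hypothetical conveniences.

```python
# Values copied from the table above; ranges are stored as (min, max).
RECOMMENDED = {
    "SD 1.5":       {"steps": (20, 30), "cfg": (7.0, 9.0), "sampler_name": "dpmpp_2m",        "scheduler": "karras"},
    "SDXL":         {"steps": (25, 40), "cfg": (5.0, 8.0), "sampler_name": "dpmpp_2m",        "scheduler": "karras"},
    "SDXL Turbo":   {"steps": (4, 8),   "cfg": (1.0, 2.0), "sampler_name": "euler_ancestral", "scheduler": "normal"},
    "SD3.5 Large":  {"steps": (28, 50), "cfg": (4.0, 7.0), "sampler_name": "dpmpp_2m",        "scheduler": "sgm_uniform"},
    "SD3.5 Medium": {"steps": (28, 40), "cfg": (4.0, 5.0), "sampler_name": "euler",           "scheduler": "simple"},
    "Flux Dev":     {"steps": (20, 50), "cfg": (1.0, 1.0), "sampler_name": "euler",           "scheduler": "simple"},
    "Flux Schnell": {"steps": (4, 4),   "cfg": (1.0, 1.0), "sampler_name": "euler",           "scheduler": "simple"},
}

def ksampler_settings(model_family, quality=0.5):
    """Interpolate steps and cfg inside the recommended range (quality in 0..1)."""
    rec = RECOMMENDED[model_family]
    lo_steps, hi_steps = rec["steps"]
    lo_cfg, hi_cfg = rec["cfg"]
    return {
        "steps": round(lo_steps + quality * (hi_steps - lo_steps)),
        "cfg": round(lo_cfg + quality * (hi_cfg - lo_cfg), 2),
        "sampler_name": rec["sampler_name"],
        "scheduler": rec["scheduler"],
    }

print(ksampler_settings("Flux Schnell"))
print(ksampler_settings("SDXL", quality=1.0))
```

The returned dictionary uses KSampler's own input names, so it can be merged directly into a KSampler node's inputs.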


Resolution Guide

| Model | Optimal Base | Aspect Ratios |
|---|---|---|
| SD 1.5 | 512x512 | 512x768, 768x512 |
| SDXL | 1024x1024 | 1024x1024, 896x1152, 1152x896, 768x1344, 1344x768 |
| SD3.5 | 1024x1024 | Same as SDXL |
| Flux | 1024x1024 | 1024x1024, 768x1360, 1360x768, 832x1216, 1216x832 |

Use multiples of 64 for SD1.5, SDXL, and SD3.5 dimensions. Flux accepts multiples of 16, which is why 1360 appears in its list. SDXL and Flux perform best at ~1 megapixel total.
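
The sizing rule can be turned into a small calculation: pick the width and height that hit a pixel budget for a given aspect ratio, then snap both to the required multiple. The helper `dims_for_aspect` and its defaults are illustrative assumptions, and snapping can land on a slightly different size than the hand-picked entries in the table.

```python
import math

def dims_for_aspect(aspect_w, aspect_h, megapixels=1.0, multiple=64):
    """Pick (width, height) near the pixel budget, snapped to `multiple`."""
    target = megapixels * 1024 * 1024
    ratio = aspect_w / aspect_h
    width = math.sqrt(target * ratio)
    height = math.sqrt(target / ratio)

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)

print(dims_for_aspect(1, 1))                   # square at ~1 MP
print(dims_for_aspect(16, 9))                  # widescreen at ~1 MP
print(dims_for_aspect(1, 1, megapixels=0.25))  # SD 1.5 base size
```

Passing `multiple=16` gives the finer grid that Flux accepts.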


LoRA (Low-Rank Adaptation)

Basic LoRA Loading

code
Node: LoraLoader
  model: (from checkpoint)
  clip: (from checkpoint)
  lora_name: "my_lora.safetensors"
  strength_model: 0.8
  strength_clip: 0.8
Outputs: MODEL, CLIP

Chain between CheckpointLoader and KSampler:

code
CheckpointLoaderSimple → LoraLoader ──→ MODEL ──→ KSampler
                                    └──→ CLIP ──→ CLIPTextEncode (positive/negative)

Stacking Multiple LoRAs

Chain LoraLoader nodes:

code
CheckpointLoader → LoraLoader (style) → LoraLoader (character) → LoraLoader (detail) → KSampler

LoRA Tips

  • strength_model controls image effect (0.5-1.0 typical)
  • strength_clip controls text understanding
  • Start at 0.7-0.8 and adjust
  • Too high causes artifacts; too low has no effect
  • File location: ComfyUI/models/loras/
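
Stacking can be sketched as a loop that threads the MODEL and CLIP links through one LoraLoader per LoRA. The sketch assumes the API-format graph layout (links as `[node_id, output_index]`) and that LoraLoader exposes MODEL at output 0 and CLIP at output 1; `chain_loras` and the LoRA file names are hypothetical.

```python
def chain_loras(workflow, model_ref, clip_ref, loras):
    """Append one LoraLoader per (name, strength) pair; return final (model, clip) refs."""
    next_id = max((int(k) for k in workflow), default=0) + 1
    for lora_name, strength in loras:
        node_id = str(next_id)
        workflow[node_id] = {
            "class_type": "LoraLoader",
            "inputs": {"model": model_ref, "clip": clip_ref,
                       "lora_name": lora_name,
                       "strength_model": strength, "strength_clip": strength},
        }
        # Assumed outputs: 0 = MODEL, 1 = CLIP; feed them to the next link.
        model_ref, clip_ref = [node_id, 0], [node_id, 1]
        next_id += 1
    return model_ref, clip_ref

workflow = {"1": {"class_type": "CheckpointLoaderSimple",
                  "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}}}
model, clip = chain_loras(workflow, ["1", 0], ["1", 1],
                          [("style.safetensors", 0.8), ("character.safetensors", 0.7)])
print(model, clip)
```

The returned `model` reference goes to KSampler and the `clip` reference goes to both CLIPTextEncode nodes, so the prompts are encoded with the LoRA-patched text encoder.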

ControlNet

Basic ControlNet

code
Node: ControlNetLoader
  control_net_name: "control_v11p_sd15_canny.safetensors"
Outputs: CONTROL_NET

Node: ControlNetApplyAdvanced
  positive: (from CLIPTextEncode)
  negative: (from CLIPTextEncode)
  control_net: (from ControlNetLoader)
  image: (preprocessed control image)
  strength: 0.8
  start_percent: 0.0
  end_percent: 1.0
Outputs: positive, negative
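
ControlNetApplyAdvanced sits between the text encoders and the sampler, so adding it means rewiring the sampler's conditioning inputs. The sketch below does this on an API-format graph; the layout, the helper name `apply_controlnet`, and the stand-in workflow are assumptions for illustration.

```python
def apply_controlnet(workflow, sampler_id, control_net_name, image_ref,
                     strength=0.8, start_percent=0.0, end_percent=1.0):
    """Insert ControlNetLoader + ControlNetApplyAdvanced ahead of a KSampler."""
    next_id = max(int(k) for k in workflow) + 1
    loader_id, apply_id = str(next_id), str(next_id + 1)
    sampler_inputs = workflow[sampler_id]["inputs"]
    workflow[loader_id] = {"class_type": "ControlNetLoader",
                           "inputs": {"control_net_name": control_net_name}}
    workflow[apply_id] = {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {"positive": sampler_inputs["positive"],
                   "negative": sampler_inputs["negative"],
                   "control_net": [loader_id, 0], "image": image_ref,
                   "strength": strength,
                   "start_percent": start_percent, "end_percent": end_percent},
    }
    # Outputs: 0 = positive, 1 = negative. Point the sampler at them.
    sampler_inputs["positive"] = [apply_id, 0]
    sampler_inputs["negative"] = [apply_id, 1]
    return apply_id

# Stand-in graph: only the sampler and the control image loader are shown.
workflow = {
    "5": {"class_type": "KSampler",
          "inputs": {"positive": ["2", 0], "negative": ["3", 0]}},
    "8": {"class_type": "LoadImage", "inputs": {"image": "edges.png"}},
}
apply_id = apply_controlnet(workflow, "5",
                            "control_v11p_sd15_canny.safetensors", ["8", 0])
print(apply_id, workflow["5"]["inputs"]["positive"])
```

In a real graph the `image_ref` would normally point at a preprocessor node's output rather than directly at LoadImage.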

Common ControlNet Types

| Type | Input | Use Case |
|---|---|---|
| Canny | Edge detection image | Precise edges, line art |
| Depth | Depth map | 3D structure, perspective |
| OpenPose | Pose skeleton | Character poses |
| Scribble | Hand-drawn lines | Rough composition |
| Tile | Original image | Upscaling, detail |
| IP2P | Original image | Image editing |
| Softedge | Soft edge detection | Smooth contours |
| Lineart | Line art extraction | Anime/illustration |
| Normal | Normal map | Surface detail |
| Segmentation | Semantic map | Regional control |
| Inpaint | Image + mask | Fill regions |

ControlNet Preprocessors

Most control images need preprocessing:

code
LoadImage → CannyEdgePreprocessor → ControlNetApplyAdvanced
LoadImage → DepthAnythingV2Preprocessor → ControlNetApplyAdvanced
LoadImage → DWPreprocessor (OpenPose) → ControlNetApplyAdvanced

Flux ControlNet

Flux has its own ControlNet models (Canny, Depth). They load as LoRAs or full models:

code
Node: ControlNetLoader
  control_net_name: "flux-canny-controlnet-v3.safetensors"

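Or as control LoRAs, which patch the diffusion model instead of going through ControlNetLoader:

code
Node: LoraLoaderModelOnly
  model: (from UNETLoader)
  lora_name: "flux-dev-canny-lora.safetensors"
  strength_model: 1.0
Outputs: MODEL

With a control LoRA, the preprocessed control image is typically supplied through InstructPixToPixConditioning rather than ControlNetApplyAdvanced.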

IPAdapter (Image Prompt Adapter)

Use images as style/composition references:

code
Node: IPAdapterModelLoader
  ipadapter_file: "ip-adapter-plus_sdxl_vit-h.safetensors"

Node: CLIPVisionLoader
  clip_name: "CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors"

Node: IPAdapterAdvanced
  model: (from checkpoint)
  ipadapter: (from IPAdapterModelLoader)
  clip_vision: (from CLIPVisionLoader)
  image: (reference image)
  weight: 0.8
  weight_type: "linear"
  start_at: 0.0
  end_at: 1.0
Outputs: MODEL

Advanced Conditioning

Regional Prompting

code
Node: ConditioningSetArea
  conditioning: (from CLIPTextEncode)
  x: 0
  y: 0
  width: 512    (in pixels; the node converts to latent units internally)
  height: 1024
  strength: 1.0

Node: ConditioningCombine
  conditioning_1: (region 1)
  conditioning_2: (region 2)
Outputs: CONDITIONING → KSampler positive
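
As an illustration of regional prompting, the sketch below splits the canvas into a left and a right half, assigns one conditioning to each, and combines them. It assumes the API-format graph layout and pixel-unit area values; `split_regions` and the prompts are hypothetical.

```python
def split_regions(workflow, left_cond, right_cond, width, height):
    """Give each conditioning one half of the canvas and combine them."""
    next_id = max(int(k) for k in workflow) + 1
    half = (width // 2) // 8 * 8  # keep the split on the 8-pixel latent grid
    areas = [(left_cond, 0, half), (right_cond, half, width - half)]
    area_ids = []
    for cond_ref, x, area_width in areas:
        node_id = str(next_id)
        workflow[node_id] = {
            "class_type": "ConditioningSetArea",
            "inputs": {"conditioning": cond_ref, "x": x, "y": 0,
                       "width": area_width, "height": height, "strength": 1.0},
        }
        area_ids.append(node_id)
        next_id += 1
    combine_id = str(next_id)
    workflow[combine_id] = {
        "class_type": "ConditioningCombine",
        "inputs": {"conditioning_1": [area_ids[0], 0],
                   "conditioning_2": [area_ids[1], 0]},
    }
    return combine_id

workflow = {
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a red barn", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a pine forest", "clip": ["1", 1]}},
}
combined = split_regions(workflow, ["2", 0], ["3", 0], width=1024, height=1024)
print(combined, workflow[combined]["inputs"])
```

The combined conditioning then feeds the KSampler positive input, as in the node listing above.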

CLIP Text Encode with Weights

Use parentheses for emphasis in prompts:

  • (word): 1.1x weight
  • ((word)): 1.21x weight (1.1 × 1.1)
  • (word:1.5): 1.5x weight
  • (word:0.9): 0.9x weight (de-emphasis); the A1111-style [word] shorthand is not parsed by the core CLIPTextEncode node
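
To show how parenthesized emphasis composes, here is a small, simplified parser written for this guide. It is not ComfyUI's own tokenizer: it handles parentheses only, multiplies by 1.1 per nesting level, and lets an explicit `:number` suffix override the weight.

```python
def split_top_level(text):
    """Split text into chunks; each outermost (...) group becomes one chunk."""
    chunks, current, depth = [], "", 0
    for ch in text:
        if ch == "(":
            if depth == 0 and current:
                chunks.append(current)
                current = ""
            current += ch
            depth += 1
        elif ch == ")" and depth > 0:
            current += ch
            depth -= 1
            if depth == 0:
                chunks.append(current)
                current = ""
        else:
            current += ch
    if current:
        chunks.append(current)
    return chunks

def prompt_weights(text, weight=1.0):
    """Return a list of (text, weight) pairs for a prompt string."""
    out = []
    for chunk in split_top_level(text):
        if len(chunk) >= 2 and chunk[0] == "(" and chunk[-1] == ")":
            inner = chunk[1:-1]
            inner_weight = weight * 1.1          # one nesting level = x1.1
            head, sep, tail = inner.rpartition(":")
            if sep:
                try:
                    inner_weight = float(tail)   # explicit weight overrides
                    inner = head
                except ValueError:
                    pass                         # not a number: keep the text
            out.extend(prompt_weights(inner, inner_weight))
        else:
            out.append((chunk, round(weight, 4)))
    return out

for piece in prompt_weights("a (cat) on ((mat)) (sky:1.5)"):
    print(piece)
```

Nesting multiplies the weight once per level, which is where the 1.21 for double parentheses comes from.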

Conditioning Scheduling

Apply different prompts at different steps:

code
Node: ConditioningSetTimestepRange
  conditioning: (prompt A)
  start: 0.0
  end: 0.5

Node: ConditioningSetTimestepRange
  conditioning: (prompt B)
  start: 0.5
  end: 1.0

Node: ConditioningCombine → KSampler

Advanced Sampling (Custom Sampling)

For more control, replace KSampler with the custom sampling nodes:

code
Node: KSamplerSelect
  sampler_name: "dpmpp_2m"
Outputs: SAMPLER

Node: BasicScheduler (or KarrasScheduler, etc.)
  model: MODEL
  scheduler: "karras"
  steps: 30
Outputs: SIGMAS

Node: SamplerCustom
  model: MODEL
  add_noise: true
  noise_seed: (any non-negative integer)
  cfg: 7.0
  positive: CONDITIONING
  negative: CONDITIONING
  sampler: SAMPLER
  sigmas: SIGMAS
  latent_image: LATENT
Outputs: output (LATENT), denoised_output (LATENT)

Guider Nodes

For advanced guidance strategies:

code
Node: BasicGuider
  model: MODEL
  conditioning: CONDITIONING
Outputs: GUIDER

Node: CFGGuider
  model: MODEL
  positive: CONDITIONING
  negative: CONDITIONING
  cfg: 7.0
Outputs: GUIDER

Node: DualCFGGuider (for SD3.5/Flux)
  model: MODEL
  cond1: CONDITIONING
  cond2: CONDITIONING
  negative: CONDITIONING
  cfg_conds: 7.0
  cfg_cond2_negative: 1.5
Outputs: GUIDER

Node: SamplerCustomAdvanced
  noise: (from RandomNoise)
  guider: GUIDER
  sampler: SAMPLER
  sigmas: SIGMAS
  latent_image: LATENT
Outputs: output, denoised_output
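
The sketch below shows how the five custom-sampling nodes replace a single KSampler in an API-format graph, using BasicGuider (no negative input), which suits Flux at CFG 1.0. The graph layout, the helper name `custom_sampling_nodes`, and the id scheme are illustrative assumptions.

```python
def custom_sampling_nodes(first_id, model_ref, cond_ref, latent_ref,
                          seed=0, steps=30,
                          sampler_name="euler", scheduler="simple"):
    """Return the five nodes that together replace one KSampler."""
    noise, guider, select, sigmas, sample = (
        str(first_id + offset) for offset in range(5))
    return {
        noise:  {"class_type": "RandomNoise",
                 "inputs": {"noise_seed": seed}},
        guider: {"class_type": "BasicGuider",
                 "inputs": {"model": model_ref, "conditioning": cond_ref}},
        select: {"class_type": "KSamplerSelect",
                 "inputs": {"sampler_name": sampler_name}},
        sigmas: {"class_type": "BasicScheduler",
                 "inputs": {"model": model_ref, "scheduler": scheduler,
                            "steps": steps, "denoise": 1.0}},
        sample: {"class_type": "SamplerCustomAdvanced",
                 "inputs": {"noise": [noise, 0], "guider": [guider, 0],
                            "sampler": [select, 0], "sigmas": [sigmas, 0],
                            "latent_image": latent_ref}},
    }

nodes = custom_sampling_nodes(10, model_ref=["1", 0], cond_ref=["4", 0],
                              latent_ref=["6", 0])
for node_id, node in nodes.items():
    print(node_id, node["class_type"])
```

Swapping BasicGuider for CFGGuider brings back a negative input and a cfg value without touching the other four nodes.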

Latent Operations

| Node | Purpose |
|---|---|
| EmptyLatentImage | Create blank latent (txt2img) |
| EmptySD3LatentImage | Blank latent for SD3/SD3.5 |
| VAEEncode | Image → latent |
| VAEDecode | Latent → image |
| VAEEncodeTiled | Encode large images without OOM |
| VAEDecodeTiled | Decode large latents without OOM |
| LatentUpscale | Resize latent (nearest/bilinear) |
| LatentUpscaleBy | Scale latent by factor |
| LatentComposite | Paste one latent onto another |
| LatentCrop | Crop a region from latent |
| LatentBlend | Blend two latents |
| LatentFlip | Flip latent horizontally/vertically |
| LatentRotate | Rotate latent 90/180/270 degrees |
| SetLatentNoiseMask | Set inpainting mask on latent |
| LatentBatch | Combine latents into batch |

Image Operations

| Node | Purpose |
|---|---|
| LoadImage | Load image from disk |
| SaveImage | Save to output/ directory |
| PreviewImage | Preview without saving |
| ImageScale | Resize image |
| ImageScaleBy | Scale image by factor |
| ImageInvert | Invert colors |
| ImageBlend | Blend two images |
| ImageCompositeMasked | Composite with mask |
| ImageCrop | Crop region |
| ImagePadForOutpaint | Pad image for outpainting |
| ImageBatch | Combine images into batch |
| ImageUpscaleWithModel | AI upscale (ESRGAN, etc.) |
| UpscaleModelLoader | Load upscale model |

Mask Operations

| Node | Purpose |
|---|---|
| LoadImageMask | Load mask from image |
| MaskToImage | Convert mask to image |
| ImageToMask | Convert image channel to mask |
| InvertMask | Invert mask |
| MaskComposite | Combine masks (add, subtract, multiply) |
| CropMask | Crop mask region |
| FeatherMask | Soften mask edges |
| GrowMask | Expand mask region |
| SolidMask | Create solid mask |

Workflow Optimization Tips

Memory Saving

  • Use FP8 models and text encoders when VRAM is limited
  • Use VAEDecodeTiled and VAEEncodeTiled for large images
  • Enable --lowvram or --novram CLI flags for < 8GB VRAM
  • Free VRAM between stages through ComfyUI's model management (for example, the server's /free endpoint). FreeU is an output-quality node, not a memory tool

Speed

  • Flux Schnell at 4 steps for quick iteration
  • Use SDXL Turbo / LCM for fast previews (4-8 steps)
  • Lower resolution for composition, upscale for final
  • dpmpp_2m + karras is a good speed/quality balance for SD1.5 and SDXL

Quality

  • Higher step counts don't always help; returns diminish after 30-50 steps
  • CFG too high causes oversaturation/artifacts
  • Hires fix (two-pass) dramatically improves detail
  • ControlNet for structural consistency
  • Negative prompts matter (except Flux)

Common Negative Prompts (SD1.5/SDXL)

code
worst quality, low quality, blurry, deformed, disfigured, extra limbs,
bad anatomy, bad hands, extra fingers, missing fingers, watermark,
text, signature, jpeg artifacts, low resolution

Complete Workflow Examples

SDXL txt2img

code
CheckpointLoaderSimple (sdxl_base) → MODEL, CLIP, VAE
CLIP → CLIPTextEncode (positive: "a majestic castle on a hill, sunset, detailed")
CLIP → CLIPTextEncode (negative: "worst quality, blurry, deformed")
EmptyLatentImage (1024x1024, batch 1)
KSampler (seed, steps=30, cfg=7, dpmpp_2m, karras, denoise=1.0)
  → connect: model, positive, negative, latent_image
VAEDecode (samples from KSampler, vae)
SaveImage

Flux Dev txt2img

code
UNETLoader (flux1-dev) → MODEL
DualCLIPLoader (t5xxl_fp16, clip_l, type=flux) → CLIP
VAELoader (ae) → VAE
CLIP → CLIPTextEncode (positive: "a majestic castle on a hill at sunset")
EmptyLatentImage (1024x1024, batch 1)
KSampler (seed, steps=30, cfg=1.0, euler, simple, denoise=1.0)
  → connect: model, positive, negative (from an empty CLIPTextEncode), latent_image
VAEDecode (samples, vae)
SaveImage
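
The Flux Dev listing above maps onto an API-format graph as follows. This is a sketch under the same assumptions as the listing (file names, euler/simple, CFG 1.0); the function name `build_flux_dev`, the node ids, and the `filename_prefix` are illustrative. The negative input is wired to an empty CLIPTextEncode because KSampler requires one.

```python
def build_flux_dev(prompt, width=1024, height=1024, seed=0, steps=30):
    """Return a Flux Dev txt2img graph in ComfyUI's API (prompt) JSON layout."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "t5xxl_fp16.safetensors",
                         "clip_name2": "clip_l.safetensors", "type": "flux"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        # Empty negative: required by KSampler, without effect at cfg 1.0.
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0],
                         "seed": seed, "steps": steps, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "flux_dev"}},
    }

flux = build_flux_dev("a majestic castle on a hill at sunset")
print(len(flux), "nodes; sampler cfg =", flux["7"]["inputs"]["cfg"])
```

Compared with the SDXL graph, the only structural change is that three separate loaders replace CheckpointLoaderSimple.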

SDXL + LoRA + ControlNet

code
CheckpointLoaderSimple (sdxl_base) → MODEL, CLIP, VAE
LoraLoader (style_lora, strength=0.8) → MODEL, CLIP
CLIP → CLIPTextEncode (positive)
CLIP → CLIPTextEncode (negative)
LoadImage (reference) → CannyEdgePreprocessor → control_image
ControlNetLoader (canny_sdxl) → CONTROL_NET
ControlNetApplyAdvanced (positive, negative, control_net, control_image, strength=0.7)
EmptyLatentImage (1024x1024)
KSampler (steps=30, cfg=7, dpmpp_2m, karras)
VAEDecode → SaveImage

Resources