Generative Art: Mastering ComfyUI

Dec 29, 2025 • 22 min read

Automatic1111 is the friendly UI for Stable Diffusion beginners. ComfyUI is the power user's tool — a node-based visual programming environment where every step in the image generation pipeline is an explicit, wirable node. It looks like Blender's shader editor and gives you the same kind of compositional power: build complex multi-model, multi-pass workflows visually, save them as JSON, and run them as an API in production.

1. Installation

# Prerequisites: Python 3.11+, NVIDIA GPU with 8GB+ VRAM (or CPU, slowly)
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI

pip install -r requirements.txt

# Download a base model (SDXL recommended for quality)
# Place in ComfyUI/models/checkpoints/
# Models available at: https://civitai.com, huggingface.co

# Launch (GPU)
python main.py --listen 0.0.0.0 --port 8188

# Launch (CPU — slow but works)
python main.py --cpu

# Access at: http://localhost:8188

2. Node Graph Concepts

Image generation in ComfyUI is a directed acyclic graph (DAG). Each node has inputs and outputs — you wire them together to build a pipeline. The minimal txt2img workflow uses seven nodes:

Node | Purpose | Key Setting
Load Checkpoint | Load the base model (SDXL, Pony, etc.) | ckpt_name: your model file
CLIP Text Encode | Tokenize the positive prompt into embeddings | text: "a photo of..."
CLIP Text Encode (neg) | Tokenize the negative prompt (what to avoid) | text: "blurry, ugly..."
Empty Latent Image | Define the output canvas size in latent space | width, height, batch_size
KSampler | The denoising loop that creates the image | steps: 30, cfg: 7.5, sampler: euler_a
VAE Decode | Convert the latent tensor to pixels | samples: the KSampler's LATENT output
Save Image | Write the final image to disk | filename_prefix
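The same minimal graph can be written directly in ComfyUI's API JSON format: a dict mapping node IDs to a class_type plus its inputs, where a link is [source_node_id, output_index]. A sketch, assuming an SDXL checkpoint filename; the node IDs and prompt text are arbitrary examples:

```python
import json

# Load Checkpoint outputs: MODEL (0), CLIP (1), VAE (2)
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a photo of a red fox in snow", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, ugly, watermark", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 30,
                     "cfg": 7.5, "sampler_name": "euler_ancestral",
                     "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "txt2img"}},
}
api_payload = json.dumps({"prompt": workflow})  # what gets POSTed to /prompt
```

This is exactly the JSON you save with "Save (API Format)" in the UI, which is why graphs port cleanly between the editor and the API.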

3. Advanced Workflow: Hires Fix (2-Pass Generation)

Generate at lower resolution for speed, then upscale and re-denoise for quality. The standard approach for getting sharp 1024×1024+ images:

# Hires Fix workflow (as ComfyUI API JSON — simplified pseudocode)
Pass 1: Txt2Img at 512×512 (fast, low VRAM)
  ↓ Latent output
Upscale Latent by 2× (512×512 → 1024×1024 in latent space)
  ↓ Upscaled latent
Pass 2: img2img KSampler (denoise: 0.5, refining the upscaled latent)
  ↓ Refined latent
VAE Decode → 1024×1024 crisp output

# Key knob: denoise on Pass 2
# 0.3 = very faithful to Pass 1 (safe, fast)
# 0.5 = the standard sweet spot
# 0.7 = more creative freedom, adds detail (slower, riskier)
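In API JSON terms, Hires Fix just appends two nodes after the first sampler: a LatentUpscale and a second KSampler wired to the upscaled latent. A sketch with illustrative node IDs, assuming "1" through "5" are the checkpoint, prompts, and first-pass KSampler of a txt2img graph:

```python
# Pass-2 nodes for a Hires Fix graph (node IDs are illustrative, not fixed values)
hires_nodes = {
    "8": {"class_type": "LatentUpscale",
          "inputs": {"samples": ["5", 0],  # Pass 1 KSampler's latent output
                     "upscale_method": "nearest-exact",
                     "width": 1024, "height": 1024, "crop": "disabled"}},
    "9": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["8", 0],  # upscaled latent, not the empty one
                     "seed": 42, "steps": 20, "cfg": 7.5,
                     "sampler_name": "euler_ancestral", "scheduler": "normal",
                     "denoise": 0.5}},  # the key knob: how much Pass 2 re-imagines
}
```

The only structural difference from plain txt2img is that VAE Decode now reads from node "9" instead of node "5".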

4. ControlNet: Guided Generation

ControlNet conditions image generation on a reference image — telling the model to follow a pose, depth map, edge map, or sketch:

# Common ControlNet modes:
# openpose: Match human poses from a reference photo
# depth: Match depth structure (foreground/background layout)
# canny: Match edge maps (preserve object shapes)
# lineart: Follow a sketch

# ComfyUI ControlNet workflow:
[Load ControlNet Model (openpose)]
[AuxPreprocessor (OpenposePreprocessor)]  ← Extracts pose from reference image
    ↓ Pose map image
[Apply ControlNet] ← Takes (conditioning, controlnet model, image, strength)
    ↓ Modified conditioning
[KSampler] ← Uses the ControlNet-guided conditioning

# Strength 0.0-1.0: 0.7 is a common starting point
# Guidance end: limit ControlNet influence to first X% of steps (e.g., 0.7)
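Sketched in API JSON with the Apply ControlNet (Advanced) node, which exposes both strength and the start/end step percentages. Node IDs and the model filename are examples; "2"/"3" stand for the positive and negative CLIP Text Encode nodes, and the pose preprocessor (a custom-node pack) is omitted, so the loaded image is assumed to already be a pose map:

```python
# ControlNet sub-graph in API JSON format (illustrative IDs and filenames)
controlnet_nodes = {
    "10": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "controlnet-openpose-sdxl.safetensors"}},
    "11": {"class_type": "LoadImage",
           "inputs": {"image": "pose_reference.png"}},  # already a pose map
    "12": {"class_type": "ControlNetApplyAdvanced",
           "inputs": {"positive": ["2", 0], "negative": ["3", 0],
                      "control_net": ["10", 0], "image": ["11", 0],
                      "strength": 0.7,        # common starting point
                      "start_percent": 0.0,
                      "end_percent": 0.7}},   # release control for the last 30% of steps
}
```

The KSampler then takes its positive/negative conditioning from node "12" instead of directly from the text encoders.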

5. LoRA Stacking

LoRAs are small fine-tune adapters that add a style, character, or concept. Stack multiple LoRAs in a single workflow:

[Load Checkpoint: sdxl_base.safetensors]
    ↓
[Load LoRA: anime-style-v2.safetensors | strength_model: 0.8 | strength_clip: 0.8]
    ↓
[Load LoRA: studio-ghibli.safetensors | strength_model: 0.4 | strength_clip: 0.4]
    ↓
[Load LoRA: detail-enhancer.safetensors | strength_model: 0.6 | strength_clip: 0.6]
    ↓
[CLIP Text Encode] → [KSampler]

# Tips:
# Keep total LoRA influence below 1.5-2.0 to avoid artifacts
# Use lower strength for style LoRAs (0.3-0.6), higher for character/concept (0.6-0.9)
# Test LoRAs individually before stacking
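Because each Load LoRA node takes the previous node's MODEL and CLIP outputs, the chain above is easy to generate programmatically. A sketch in API JSON format, reusing the example filenames (node IDs are arbitrary):

```python
# (lora_name, strength) pairs; strength_model and strength_clip kept equal here
lora_stack = [
    ("anime-style-v2.safetensors", 0.8),
    ("studio-ghibli.safetensors", 0.4),
    ("detail-enhancer.safetensors", 0.6),
]

nodes = {"1": {"class_type": "CheckpointLoaderSimple",
               "inputs": {"ckpt_name": "sdxl_base.safetensors"}}}
model_link, clip_link = ["1", 0], ["1", 1]  # checkpoint's MODEL and CLIP outputs

for i, (name, strength) in enumerate(lora_stack, start=2):
    node_id = str(i)
    nodes[node_id] = {"class_type": "LoraLoader",
                      "inputs": {"lora_name": name,
                                 "strength_model": strength,
                                 "strength_clip": strength,
                                 "model": model_link, "clip": clip_link}}
    model_link, clip_link = [node_id, 0], [node_id, 1]  # chain to the next LoRA

total = sum(s for _, s in lora_stack)  # 1.8, within the 1.5-2.0 guideline
```

Downstream nodes (CLIP Text Encode, KSampler) then wire to the final model_link and clip_link rather than to the checkpoint directly.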

6. ComfyUI as an API Backend

ComfyUI exposes a REST API. You can programmatically submit workflows and poll for results — making it possible to integrate generative image features into your AI app:

import json
import uuid

import requests
import websocket  # pip install websocket-client

COMFY_URL = "http://localhost:8188"
CLIENT_ID = str(uuid.uuid4())  # one client_id shared by the HTTP and websocket calls

def queue_workflow(workflow_json: dict) -> str:
    """Submit a workflow to ComfyUI and return the prompt_id"""
    response = requests.post(
        f"{COMFY_URL}/prompt",
        json={"prompt": workflow_json, "client_id": CLIENT_ID},
    )
    response.raise_for_status()
    return response.json()["prompt_id"]

def get_image(prompt_id: str) -> bytes:
    """Listen on the websocket until the workflow finishes, then fetch the image"""
    ws = websocket.WebSocket()
    # Connect with the same client_id used to queue the prompt, not the prompt_id
    ws.connect(f"ws://localhost:8188/ws?clientId={CLIENT_ID}")
    try:
        while True:
            raw = ws.recv()
            if isinstance(raw, bytes):
                continue  # skip binary preview frames
            msg = json.loads(raw)
            if msg["type"] == "executed" and msg["data"]["prompt_id"] == prompt_id:
                image = msg["data"]["output"]["images"][0]
                # /view takes filename, subfolder, and type as query params
                return requests.get(f"{COMFY_URL}/view", params=image).content
    finally:
        ws.close()

# Modify workflow JSON nodes to parameterize prompts
with open("my_workflow.json") as f:
    workflow = json.load(f)
workflow["6"]["inputs"]["text"] = "A cyberpunk cityscape at night, neon lights"
workflow["25"]["inputs"]["seed"] = 42  # Reproducible generation

prompt_id = queue_workflow(workflow)
image_bytes = get_image(prompt_id)
with open("output.png", "wb") as f:
    f.write(image_bytes)
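Websockets are one option; a simpler pattern for batch jobs is to poll GET /history/{prompt_id}, which returns each node's outputs once the workflow finishes. A sketch against the same endpoints; the history-entry layout shown mirrors what ComfyUI returns, but treat the exact shape as an assumption to verify against your server version:

```python
import time
import requests

COMFY_URL = "http://localhost:8188"

def first_image_ref(history_entry: dict) -> dict:
    """Pull the first image record out of one /history entry."""
    for node_output in history_entry["outputs"].values():
        for img in node_output.get("images", []):
            return img  # {"filename": ..., "subfolder": ..., "type": ...}
    raise ValueError("no images in history entry")

def wait_for_image(prompt_id: str, timeout: float = 120.0) -> bytes:
    """Poll /history until the prompt shows up, then download via /view."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        history = requests.get(f"{COMFY_URL}/history/{prompt_id}").json()
        if prompt_id in history:  # present only once execution has finished
            img = first_image_ref(history[prompt_id])
            return requests.get(f"{COMFY_URL}/view", params=img).content
        time.sleep(1.0)
    raise TimeoutError(f"workflow {prompt_id} did not finish in {timeout}s")
```

Polling costs a little latency but avoids holding a websocket open per job, which matters when a queue worker is juggling many prompts.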

7. The JSON Workflow Trick

Every ComfyUI-generated image embeds its workflow JSON in metadata. Drag any ComfyUI PNG back into the browser and the full workflow that created it loads automatically. This makes sharing, remixing, and versioning workflows trivial — the image IS the reproducible recipe.
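You can pull that metadata out without opening a browser: the editor graph lives in a PNG tEXt chunk keyed "workflow" (the API-format graph is stored under "prompt"). A stdlib-only sketch that walks the PNG chunk layout directly:

```python
import json
import struct

def embedded_workflow(png_bytes: bytes) -> dict:
    """Walk PNG chunks and return the workflow JSON ComfyUI embeds in a tEXt chunk."""
    assert png_bytes[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    pos = 8
    while pos < len(png_bytes):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        ctype = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            if keyword in (b"workflow", b"prompt"):
                return json.loads(text)
        pos += 12 + length  # advance past length + type + data + CRC
    raise ValueError("no embedded workflow found")
```

The same bytes can be re-posted to /prompt (for the "prompt" key) or dropped into the editor, so an output image doubles as a versionable workflow file.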

Frequently Asked Questions

ComfyUI vs Automatic1111 — which should I use?

A1111 for quick experimentation with a traditional UI. ComfyUI when you need: full control over the pipeline, programmatic API access for production use, multi-model workflows, or reproducible deterministic generation. Once you understand the nodes, ComfyUI is faster and more powerful.

How much VRAM do I need?

SD 1.5: 4 GB minimum (6 GB comfortable). SDXL 1.0: 8 GB minimum (12 GB comfortable). FLUX: 16-24 GB. You can reduce VRAM usage significantly with --fp8_e4m3fn-unet (8-bit UNet weights), the --lowvram flag, or the TAESD VAE for faster, lower-memory decoding.

Conclusion

ComfyUI turns image generation from a one-shot prompt into a programmable pipeline. The node graph makes complex operations like ControlNet-guided generation, multi-pass hires fix, and LoRA stacking transparent and composable. For developers building production AI apps that need image generation — a storybook app, a product design tool, a character creator — ComfyUI's API backend capability makes it the right foundation.

Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.
