visual-creation

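AI image and video generation. Use for: creating artwork, images, illustrations, animations, videos, and visual assets; AI art generation and style guidance; placing text in images; or choosing a suitable image or video model.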

SKILL.md
---
name: visual-creation
description: "AI image and video generation. Use when: creating artwork, images, illustrations, animations, videos, visual assets, AI art generation, style guidance, choosing image or video models, text-in-image."
---

AI Visual Creation

Decision frameworks for AI image and video generation. Not tutorials — corrections, gotchas, and "which tool for which job."


Midjourney: Version Gotchas

V7 Breaking Changes (Critical)

| Feature | V6 | V7 |
|---|---|---|
| Multi-prompt :: weighting | ✅ Works | ⚠️ CHANGED (different behavior) |
| Negative weights ::-0.5 | ✅ Works | ⚠️ Less predictable |
| --cref (Character Ref) | ✅ Works | ❌ DEPRECATED (use --oref) |
| --stylize scale | 0-1000 | 0-1000 (different results!) |
| --no parameter | ✅ Works | ✅ Works |
| --iw range | 0-2 | 0-3 |
| --oref (Omni Reference) | ❌ Not available | ✅ New (2x GPU cost) |
| --draft mode | ❌ Not available | ✅ New (10x faster, half cost) |
| --exp parameter | ❌ Not available | ✅ New (0-100) |

Stylize Scale Migration: V6 --s 100 ≈ V7 --s 300-400 | V6 --s 250 ≈ V7 --s 600-700
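
The two migration points above can be turned into a rough converter. This is a minimal sketch: the function name `v6_to_v7_stylize` is hypothetical, and the straight-line interpolation between the two quoted anchor points is an assumption, not documented Midjourney behavior. Values outside V6 100-250 are extrapolated and should be treated as a starting guess only.

```python
def v6_to_v7_stylize(v6_s: float) -> tuple[int, int]:
    """Estimate a V7 --s range (low, high) for a given V6 --s value.

    Assumption: the mapping is linear through the two anchor points
    quoted in the migration note. Midjourney documents no formula.
    """
    # Anchor points from the note: (V6 value, V7 low, V7 high).
    (x0, lo0, hi0), (x1, lo1, hi1) = (100, 300, 400), (250, 600, 700)
    t = (v6_s - x0) / (x1 - x0)
    low = lo0 + t * (lo1 - lo0)
    high = hi0 + t * (hi1 - hi0)

    # Clamp to the 0-1000 range that --stylize accepts.
    def clamp(value: float) -> int:
        return int(max(0, min(1000, round(value))))

    return clamp(low), clamp(high)


print(v6_to_v7_stylize(100))  # (300, 400)
print(v6_to_v7_stylize(250))  # (600, 700)
```

Returning a range rather than a single number mirrors the note itself, which gives a band for each V6 value.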

V7 workarounds for changed weighting:

  • Word order matters (early = more weight)
  • Use natural language emphasis
  • --no for exclusion
  • Repetition for emphasis

V6 prompt: cyberpunk::2 nature::1 dystopian::-0.5
V7 equivalent: cyberpunk city with nature elements, NOT dystopian --no dystopian, grim, dark
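
A mechanical first pass at this rewrite might look like the sketch below. It applies only two of the workarounds listed above (word order and --no); the natural-language phrasing in the hand-written V7 equivalent still needs a human. The function name `multiprompt_to_v7` is hypothetical, and treating every negative weight as a --no term is an assumption.

```python
import re
from itertools import zip_longest

# Matches a "::" separator with an optional numeric weight after it.
_WEIGHT = re.compile(r"::(-?\d+(?:\.\d+)?)?\s*")


def multiprompt_to_v7(v6_prompt: str) -> str:
    """Reorder a V6 '::'-weighted prompt by weight and move negatives to --no."""
    parts = _WEIGHT.split(v6_prompt.strip())
    terms, weights = parts[0::2], parts[1::2]

    positives, negatives = [], []
    for term, weight in zip_longest(terms, weights):
        term = term.strip()
        if not term:
            continue
        w = float(weight) if weight else 1.0  # a missing weight defaults to 1
        if w == 0:
            continue  # a zero weight contributes nothing
        (positives if w > 0 else negatives).append((w, term))

    # Heaviest terms first: early words carry more weight in V7.
    positives.sort(key=lambda pair: pair[0], reverse=True)
    text = ", ".join(term for _, term in positives)
    if negatives:
        text += " --no " + ", ".join(term for _, term in negatives)
    return text


print(multiprompt_to_v7("cyberpunk::2 nature::1 dystopian::-0.5"))
# cyberpunk, nature --no dystopian
```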


Midjourney: Reference Type Decision

Quick Selector

| I want... | Use | Parameter | Version |
|---|---|---|---|
| Composition inspiration + text | Image Prompt | --iw 1-2 | All |
| Same aesthetic, different subject | --sref | --sw 100-300 | All |
| Same character, new pose/outfit | --cref | --cw 0-50 | V6 only |
| Same character, keep everything | --cref | --cw 100 | V6 only |
| Exact object/character preservation | --oref | --ow 100-400 | V7 only |

⚠️ V7 Migration: --cref deprecated in V7. Use --oref instead (works for characters AND objects).
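
The selector and the migration note can be encoded as a small lookup. This is an illustrative sketch: the goal keys, `RefChoice`, and `pick_reference` are hypothetical names, and falling back from --cref to --oref on V7 is one way to read the migration note.

```python
from typing import NamedTuple, Optional


class RefChoice(NamedTuple):
    use: str
    weight_hint: str
    only_version: Optional[int]  # None means the option works on all versions


# Rows of the Quick Selector table, keyed by made-up goal names.
QUICK_SELECTOR = {
    "composition": RefChoice("image prompt", "--iw 1-2", None),
    "same_style": RefChoice("--sref", "--sw 100-300", None),
    "same_character_new_outfit": RefChoice("--cref", "--cw 0-50", 6),
    "same_character_keep_all": RefChoice("--cref", "--cw 100", 6),
    "exact_preservation": RefChoice("--oref", "--ow 100-400", 7),
}


def pick_reference(goal: str, version: int) -> RefChoice:
    """Return the reference type for a goal, applying the V7 migration rule."""
    choice = QUICK_SELECTOR[goal]
    if choice.only_version is None or choice.only_version == version:
        return choice
    if choice.use == "--cref" and version == 7:
        # --cref is deprecated in V7; --oref covers characters and objects.
        return QUICK_SELECTOR["exact_preservation"]
    raise ValueError(f"{choice.use} is not available in V{version}")


print(pick_reference("same_character_keep_all", 7).use)  # --oref
```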

Reference Type Deep Dive

Image Prompt (--iw)

  • Mental model: Addition (image + text = result)
  • Preserves: Composition, layout
  • Changes: Details, style via text
  • Range: 0-3 (V7), 0-2 (V6)

Style Reference (--sref)

  • Mental model: Multiplication (style × subject = result)
  • Preserves: Color palette, mood, rendering
  • Changes: Subject, composition entirely
  • Range: --sw 0-1000

Character Reference (--cref) — V6 ONLY

  • ⚠️ Deprecated in V7 — use --oref instead
  • CRITICAL: Works best with Midjourney-generated images, NOT real photos
  • --cw 0 = face only (max outfit flexibility)
  • --cw 100 = everything (face, hair, clothing)
  • Cannot preserve: fine freckles, small logos, detailed tattoos

Omni Reference (--oref) — V7 ONLY

  • 2x GPU cost
  • Only ONE reference allowed
  • NOT compatible with inpainting/outpainting or Draft mode
  • Competing params: high --stylize needs higher --ow to balance

Common Failures

| Problem | Cause | Fix |
|---|---|---|
| Reference ignored | --iw too low | Increase to 2.0+ |
| Shape lost, got mandala | Symmetry bias | Add "asymmetrical", use --no symmetric, mandala |
| Character looks different | Using real photo | Use Midjourney-generated source |
| Style overwhelms shape | High --sw, low --iw | Lower --sw OR increase --iw |
| --oref not working | V6 or Draft mode | Switch to V7 standard mode |
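
Some of these failures can be caught before a prompt is submitted. The sketch below checks only the version and Draft-mode conflicts described above. `lint_reference_flags` is a hypothetical helper, and the whitespace tokenizer is a simplification that assumes the version is passed explicitly as `--v` or `--version`.

```python
def lint_reference_flags(prompt: str) -> list[str]:
    """Return warnings for flag combinations that the notes above rule out."""
    tokens = prompt.split()
    flags = {token for token in tokens if token.startswith("--")}

    # Read the model version from --v / --version, if present.
    version = None
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("--v", "--version"):
            version = value

    warnings = []
    if "--oref" in flags:
        if "--draft" in flags:
            warnings.append("--oref does not work in Draft mode; use V7 standard mode")
        if version is not None and version.startswith("6"):
            warnings.append("--oref is V7 only")
    if "--cref" in flags and version is not None and version.startswith("7"):
        warnings.append("--cref is deprecated in V7; use --oref")
    return warnings


print(lint_reference_flags("portrait --cref https://example.com/c.png --v 7"))
# ['--cref is deprecated in V7; use --oref']
```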

Model Selection: Images

Decision Matrix

| Need | Best Choice | Why | Backup |
|---|---|---|---|
| Photorealism | Flux 2 / Imagen 4 | Best benchmark quality | Midjourney V7 |
| Artistic/stylized | Midjourney V7 | Color harmony, mood, abstract | Leonardo.ai |
| Text in images | Ideogram 3.0 | 85-90% accuracy (best) | GPT Image 1.5 |
| Character consistency | Leonardo.ai | Custom LoRA training | Flux Kontext |
| Technical diagrams | Flux 2 | Text + spatial control | Recraft V3 |
| Speed priority | SDXL / SD4 Turbo | 13 sec/image | Ideogram Turbo |
| Quality priority | Flux 2 Pro | Best 2026 benchmarks | GPT Image 1.5 |
| Commercial safety | Adobe Firefly | Licensed training only | DALL-E 3 |
| Budget (API) | Flux Schnell | $0.003/image | SDXL |
| Open source | Stable Diffusion | 80% market share | HunyuanImage |

Text Rendering Hierarchy

Best → Worst: Ideogram 3.0 (85-90%) >> GPT Image 1.5 >> Recraft V3 >> Flux 2 (~60%) >> Imagen 4 >> DALL-E 3 >> Midjourney V7 (~15% better than V6, still poor)

Rule: If you need readable text, don't use Midjourney. Use Ideogram, GPT Image, or Flux 2.
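
When only some of these models are available, the hierarchy above works as a ranked fallback list. A minimal sketch, with `best_text_model` as a hypothetical name; the ordering is copied from the hierarchy and nothing else is assumed about the models:

```python
from typing import Iterable, Optional

# Best to worst for readable text, as ranked in the hierarchy above.
TEXT_RENDERING_RANK = [
    "Ideogram 3.0",
    "GPT Image 1.5",
    "Recraft V3",
    "Flux 2",
    "Imagen 4",
    "DALL-E 3",
    "Midjourney V7",
]


def best_text_model(available: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked text-rendering model among those available."""
    have = set(available)
    for model in TEXT_RENDERING_RANK:
        if model in have:
            return model
    return None  # none of the ranked models is available


print(best_text_model(["Midjourney V7", "Flux 2"]))  # Flux 2
```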


Model Selection: Video

Decision Matrix

| Need | Best Choice | Why | Backup |
|---|---|---|---|
| Highest quality | Runway Gen-4.5 | Benchmark leader (1,247 ELO) | Veo 3.1 |
| With audio sync | Kling 2.6 | Only simultaneous audio-visual | |
| Longest duration | Kling 2.6 | 3 minutes native | Runway |
| Character consistency | Kling O1 | Unified multimodal | Kling 2.6 |
| Professional color | Luma Ray3 | Only native HDR, 16-bit EXR | Runway |
| Budget | Hailuo 2.3 | Best cost-effectiveness | Kling 2.3 |
| Free/open source | HunyuanVideo | Beats Gen-3 quality | Stable Video |

Key Insight

Audio-visual sync is now a competitive differentiator. Only Kling 2.6 generates video + voiceover + sound effects + ambient audio in a single pass.
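
The video matrix can be expressed as a lookup with a fallback, which also makes the audio-sync point concrete: that row has no backup. This is a sketch under assumptions; the need keys and `pick_video_model` are hypothetical names, and the model strings are copied from the matrix as written.

```python
from typing import Optional

# need -> (best choice, backup); None where the matrix lists no backup.
VIDEO_MATRIX = {
    "highest_quality": ("Runway Gen-4.5", "Veo 3.1"),
    "audio_sync": ("Kling 2.6", None),
    "longest_duration": ("Kling 2.6", "Runway"),
    "character_consistency": ("Kling O1", "Kling 2.6"),
    "professional_color": ("Luma Ray3", "Runway"),
    "budget": ("Hailuo 2.3", "Kling 2.3"),
    "open_source": ("HunyuanVideo", "Stable Video"),
}


def pick_video_model(need: str, unavailable: frozenset = frozenset()) -> Optional[str]:
    """Return the best choice for a need, or its backup if the best is unavailable."""
    best, backup = VIDEO_MATRIX[need]
    if best not in unavailable:
        return best
    if backup is not None and backup not in unavailable:
        return backup
    return None


print(pick_video_model("audio_sync", frozenset({"Kling 2.6"})))  # None
```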


Troubleshooting Patterns

"It won't preserve the shape"

  1. Use Image Prompt with high --iw (2.0+)
  2. Match aspect ratio (input 1:1 → output --ar 1:1)
  3. Add --style raw for tighter adherence
  4. Lower --stylize (30-50) for more literal interpretation
  5. If still failing: Try Imagen 4 or Flux 2 — they preserve shapes more literally than Midjourney
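
Steps 1-4 can be combined into one prompt string. The sketch below is illustrative: `shape_preserving_prompt` and the example URL are hypothetical, and the defaults (--iw 2, --stylize 40) are picked from within the ranges given in the steps.

```python
def shape_preserving_prompt(
    image_url: str,
    text: str,
    aspect_ratio: str = "1:1",
    image_weight: float = 2.0,
    stylize: int = 40,
) -> str:
    """Build a Midjourney prompt that favors the reference image's shape."""
    # The image URL goes first so that it acts as an image prompt.
    return (
        f"{image_url} {text} "
        f"--iw {image_weight:g} --ar {aspect_ratio} --style raw --stylize {stylize}"
    )


print(shape_preserving_prompt("https://example.com/logo.png", "minimalist fox logo"))
# https://example.com/logo.png minimalist fox logo --iw 2 --ar 1:1 --style raw --stylize 40
```

Set `aspect_ratio` to match the input image, as in step 2.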

"It keeps making it symmetric"

Midjourney defaults to symmetry. Fight it:

  1. Add "asymmetrical" keyword explicitly
  2. Use --no symmetric, mandala, radial, mirrored, balanced, centered
  3. Add --chaos 6-10
  4. Use directional language ("positioned to the left", "stepping diagonally")
  5. Material words help ("weathered metal", "carved stone" resist perfect symmetry)
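
The first three steps are mechanical enough to script; directional and material wording (steps 4 and 5) still has to come from the subject text. A minimal sketch, with `fight_symmetry` as a hypothetical name and a default --chaos of 8 chosen from the 6-10 range above:

```python
ANTI_SYMMETRY_TERMS = "symmetric, mandala, radial, mirrored, balanced, centered"


def fight_symmetry(subject: str, chaos: int = 8) -> str:
    """Prefix the subject with 'asymmetrical' and append --no and --chaos."""
    if not 6 <= chaos <= 10:
        raise ValueError("this sketch keeps --chaos within the suggested 6-10 range")
    return f"asymmetrical {subject} --no {ANTI_SYMMETRY_TERMS} --chaos {chaos}"


print(fight_symmetry("weathered metal fox statue stepping diagonally"))
# asymmetrical weathered metal fox statue stepping diagonally --no symmetric, mandala, radial, mirrored, balanced, centered --chaos 8
```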

"Style overwhelms subject"

Balance the competing forces:

  • Lower --sw (style weight)
  • Increase --iw (image weight) if using reference
  • Use --style raw
  • Simplify text prompt

"Character keeps changing"

V7 (recommended):

  1. Use --oref with Midjourney-generated source (2x GPU cost)
  2. Start at --ow 100, increase to 200-400 for facial accuracy
  3. For many images: Leonardo.ai with custom LoRA

V6 (legacy):

  1. Use --cref with Midjourney-generated source (not real photos)
  2. --cw 0 for face only, --cw 100 for everything

References

| Need | Load |
|---|---|
| Midjourney reference types detail | midjourney/reference-types.md |
| Midjourney V7 full guide | midjourney/v7-guide.md |
| Midjourney parameters | midjourney/parameters.md |
| Midjourney animation/video | midjourney/animation.md |
| Image model comparison | image-models.md |
| Video model comparison | video-models.md |

Sources: All claims cite official documentation (docs.midjourney.com, vendor APIs) and benchmarks (Artificial Analysis, LM Arena). Full URLs in reference files.