Attack Mutator - v1.0.0

A test-case mutation and variation generator inspired by the ZeroLeaks Mutator agent.

It creates variations of test cases, prompts, and inputs using semantic, encoding, and visual transformations, both to broaden test coverage and to check whether detection filters can be bypassed.

Core Concept

When a test case or approach partially succeeds, generate variations that:

  • Preserve the core intent
  • Vary the surface form
  • Bypass different filters/checks
  • Maximize coverage

Usage

bash
/mutate "test input or prompt"
/mutate --types semantic,encoding "test case"
/mutate --best-of-n 10 "critical test"

Mutation Types

Semantic Mutations

| Type | Description | Example |
| --- | --- | --- |
| paraphrase | Rephrase with same meaning | "check if valid" -> "verify validity" |
| synonym_swap | Replace with synonyms | "analyze" -> "examine" |
| formality_shift | Change formality level | "please check" -> "verify" |
| perspective_shift | Change viewpoint | "you should" -> "one should" |
| question_to_command | Convert question | "can you?" -> "do this" |
| command_to_question | Convert command | "do this" -> "can you?" |

Encoding Mutations

| Type | Description | Output |
| --- | --- | --- |
| base64_wrap | Base64 encode | Decode: dGVzdA== |
| rot13_wrap | ROT13 cipher | [ROT13] grfg |
| unicode_homoglyph | Cyrillic substitution | tеst (Cyrillic 'е') |
| zero_width_inject | Invisible characters | te\u200Bst (ZWSP between 'e' and 's') |
| leetspeak | Letter/number swap | t3st |
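
The two reversible wrappers in the table can be reproduced with the Python standard library. This is a minimal sketch: the function names mirror the type names, but the signatures and the `Decode: ` and `[ROT13] ` prefixes are assumptions read off the Output column, not the actual Mutator implementation. The unwrap helpers are hypothetical additions for checking that a mutation still carries the original text.

```python
import base64
import codecs

BASE64_PREFIX = "Decode: "   # assumed from the table's Output column
ROT13_PREFIX = "[ROT13] "    # assumed from the table's Output column


def base64_wrap(text: str) -> str:
    """Base64-encode the UTF-8 bytes of text and prepend a decode hint."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return BASE64_PREFIX + encoded


def base64_unwrap(wrapped: str) -> str:
    """Inverse of base64_wrap (hypothetical helper)."""
    return base64.b64decode(wrapped[len(BASE64_PREFIX):]).decode("utf-8")


def rot13_wrap(text: str) -> str:
    """Rotate ASCII letters by 13 places and tag the result."""
    return ROT13_PREFIX + codecs.encode(text, "rot_13")


def rot13_unwrap(wrapped: str) -> str:
    """Inverse of rot13_wrap; ROT13 is its own inverse (hypothetical helper)."""
    return codecs.encode(wrapped[len(ROT13_PREFIX):], "rot_13")


print(base64_wrap("test"))  # Decode: dGVzdA==
print(rot13_wrap("test"))   # [ROT13] grfg
```

Because both wrappers are reversible, a round trip through the unwrap helper is a cheap way to confirm that the core intent survived the mutation.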

Visual Mutations

| Type | Description | Output |
| --- | --- | --- |
| ascii_art_embed | ASCII art keywords | ╔══╗ TEST ╚══╝ |
| case_variation | Random case | tEsT |
| word_split | Add spaces | t e s t |
| reverse_embed | Reverse text | (read: tset) |
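
Three of the visual mutations are simple string transforms. The sketch below is one possible reading of the table, not the Mutator's own code: the function signatures and the explicit `rng` parameter (used so that `case_variation` is reproducible in tests) are assumptions.

```python
import random


def word_split(text: str) -> str:
    """Insert a space between every character: 'test' -> 't e s t'."""
    return " ".join(text)


def reverse_embed(text: str) -> str:
    """Reverse the text and wrap it in a reading hint: 'test' -> '(read: tset)'."""
    return f"(read: {text[::-1]})"


def case_variation(text: str, rng: random.Random) -> str:
    """Upper- or lower-case each character at random, driven by the given RNG."""
    return "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower() for ch in text
    )


rng = random.Random(42)  # fixed seed so repeated runs give the same variation
print(word_split("test"))
print(reverse_embed("test"))
print(case_variation("test", rng))
```

Passing a seeded `random.Random` instead of using the module-level functions keeps a failing test case reproducible.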

Mutation Algorithm

python
def generate_mutations(original, count=5, types=None):
    """
    Generate variations of input.

    Args:
        original: Original input/test case
        count: Number of variations
        types: Specific mutation types (optional)

    Returns:
        MutatorOutput with scored mutations
    """
    mutations = []

    # Select mutation types
    types_to_use = types or select_mutation_types(original)

    # Generate programmatic mutations
    programmatic = generate_programmatic_mutations(original, types_to_use)
    mutations.extend(programmatic)

    # Generate semantic mutations (via LLM)
    if count > len(programmatic):
        semantic = generate_semantic_mutations(original, count - len(programmatic))
        mutations.extend(semantic)

    # Score and rank; score_mutations returns dicts sorted best-first
    scored = score_mutations(mutations, original)
    best = scored[0]["prompt"] if scored else original

    return {
        "original": original,
        "mutations": scored,
        "best_mutation": best,
        "reasoning": f"Generated {len(mutations)} mutations"
    }
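
The algorithm above calls `generate_programmatic_mutations` without showing it. One way such a helper could work is a registry that maps type names to pure string functions. Everything in this sketch is hypothetical: the `MUTATORS` registry, the two sample entries, the placeholder effectiveness value, and the choice to skip unknown type names silently.

```python
from typing import Callable

# Hypothetical registry of programmatic mutators, keyed by mutation type name.
MUTATORS: dict[str, Callable[[str], str]] = {
    "word_split": lambda s: " ".join(s),
    "reverse_embed": lambda s: f"(read: {s[::-1]})",
}

PLACEHOLDER_EFFECTIVENESS = 0.5  # assumed starting score, not from the source


def generate_programmatic_mutations(original: str, types_to_use: list[str]) -> list[dict]:
    """Apply every registered mutator named in types_to_use to the original."""
    mutations = []
    for name in types_to_use:
        mutator = MUTATORS.get(name)
        if mutator is None:
            # Types without a programmatic mutator (e.g. LLM-based ones) are skipped.
            continue
        mutations.append({
            "prompt": mutator(original),
            "mutation_type": name,
            "expected_effectiveness": PLACEHOLDER_EFFECTIVENESS,
        })
    return mutations


for m in generate_programmatic_mutations("test", ["word_split", "paraphrase"]):
    print(m["mutation_type"], "->", m["prompt"])
```

Returning plain dicts keeps the result compatible with the scoring step, which unpacks each mutation with `**`.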

Encoding Utilities

typescript
const encodingUtils = {
  // Base64 encoding
  toBase64: (text: string): string => {
    return Buffer.from(text).toString("base64");
  },

  // ROT13 cipher
  toRot13: (text: string): string => {
    return text.replace(/[a-zA-Z]/g, char => {
      const base = char <= "Z" ? 65 : 97;
      return String.fromCharCode(
        ((char.charCodeAt(0) - base + 13) % 26) + base
      );
    });
  },

  // Leetspeak conversion
  toLeetspeak: (text: string): string => {
    const leetMap: Record<string, string[]> = {
      a: ["4", "@"], e: ["3"], i: ["1", "!"],
      o: ["0"], s: ["5", "$"], t: ["7", "+"]
    };
    return text.split("").map(char => {
      const variants = leetMap[char.toLowerCase()];
      return variants ? variants[Math.floor(Math.random() * variants.length)] : char;
    }).join("");
  },

  // Unicode homoglyphs
  toUnicodeHomoglyphs: (text: string): string => {
    const homoglyphs: Record<string, string> = {
      a: "а", c: "с", e: "е", o: "о",  // Cyrillic
      p: "р", x: "х", y: "у"
    };
    return text.split("").map(char =>
      Math.random() > 0.5 && homoglyphs[char] ? homoglyphs[char] : char
    ).join("");
  },

  // Zero-width character injection
  insertZeroWidth: (text: string): string => {
    const zeroWidth = ["\u200B", "\u200C", "\u200D", "\uFEFF"];
    return text.split("").map(char =>
      Math.random() > 0.7
        ? char + zeroWidth[Math.floor(Math.random() * zeroWidth.length)]
        : char
    ).join("");
  }
};

Scoring Algorithm

python
def score_mutations(mutations, original):
    """
    Score mutations by effectiveness potential.

    Factors:
    - Semantic preservation (must keep meaning)
    - Surface novelty (should be different)
    - Length appropriateness (not too long/short)
    """
    scored = []

    for mutation in mutations:
        # Mutations are dicts (they are unpacked with ** below)
        score = mutation["expected_effectiveness"]

        # Calculate similarity to original
        similarity = jaccard_similarity(mutation["prompt"], original)

        # Bonus for novelty
        score += (1 - similarity) * 0.2

        # Penalty for too different (lost meaning)
        if similarity < 0.3:
            score *= 0.8

        # Length reasonableness (max() guards against an empty original)
        length_ratio = len(mutation["prompt"]) / max(len(original), 1)
        if 0.5 < length_ratio < 3:
            score += 0.1

        scored.append({
            **mutation,
            "final_score": min(1, max(0, score))
        })

    return sorted(scored, key=lambda x: x["final_score"], reverse=True)
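
The scoring function relies on a `jaccard_similarity` helper that is not defined in this document. A minimal sketch follows, assuming the similarity is taken over lower-cased, whitespace-separated tokens; the source does not say whether tokens, characters, or n-grams are used, so the tokenization is an assumption.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index |A & B| / |A | B| over lower-cased whitespace tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty inputs are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(jaccard_similarity("test input validation", "test input handling"))
```

With this helper, a mutation seeded at 0.60 whose similarity is 0.5 and whose length ratio is in range scores 0.60 + 0.10 + 0.10 = 0.80, while one with similarity 0.0 scores (0.60 + 0.20) × 0.8 + 0.10 = 0.74. Note that token-level Jaccard gives encoded mutations such as Base64 a similarity near zero, so they always take the 0.8 penalty.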

Best-of-N Generation

Generate multiple variations and select the best:

python
def best_of_n(base_prompt, n=10):
    """
    Generate N variations, return the best.

    This is the Best-of-N jailbreaking technique adapted
    for test case variation.
    """
    result = generate_mutations(base_prompt, n)

    return {
        "variations": [m["prompt"] for m in result["mutations"]],
        "scores": [m["final_score"] for m in result["mutations"]],
        "best": result["best_mutation"]
    }

Semantic Mutation via LLM

yaml
semantic_mutation_prompt: |
  Create variations of this input that preserve the core intent
  but vary in:
  - Phrasing and vocabulary
  - Tone and formality
  - Sentence structure
  - Framing (question vs command)
  - Perspective (first/second/third person)

  Original: "{original}"

  Generate {count} natural-sounding variations.
  Rate each variation's expected effectiveness (0-1).

Integration with Ralph Loop

Mutation is used during validation and testing:

yaml
Step 6: EXECUTE-WITH-SYNC
  └── 6b. IMPLEMENT
      └── Generate test cases
      └── MUTATE test cases for coverage

Step 7: VALIDATE
  └── 7c. ADVERSARIAL-CODE
      └── Use MUTATE to vary attack vectors

Invocation

yaml
Task:
  subagent_type: "attack-mutator"
  model: "sonnet"
  prompt: |
    ORIGINAL: "Test authentication with invalid token"
    COUNT: 10
    TYPES: ["semantic", "encoding", "visual"]
    BEST_OF_N: true

    Generate mutations for broader test coverage.

Output Format

json
{
  "original": "Test authentication with invalid token",
  "mutations": [
    {
      "prompt": "Verify auth fails with malformed JWT",
      "mutation_type": "paraphrase",
      "final_score": 0.85
    },
    {
      "prompt": "VGVzdCBhdXRoIHdpdGggaW52YWxpZA==",
      "mutation_type": "base64_wrap",
      "final_score": 0.72
    },
    {
      "prompt": "Tеst аuthеnticаtion with invаlid tokеn",
      "mutation_type": "unicode_homoglyph",
      "final_score": 0.68
    }
  ],
  "best_mutation": "Verify auth fails with malformed JWT",
  "reasoning": "Paraphrase preserves intent with fresh approach"
}
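
A consumer of this output may want to confirm that it is internally consistent before using it. The checker below is a hedged sketch: `validate_mutator_output` is a hypothetical helper, and the rules it enforces (scores in [0, 1], mutations sorted best-first, `best_mutation` equal to the top-ranked prompt) are inferred from the scoring and generation code in this document rather than from a published schema.

```python
import json


def validate_mutator_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload looks consistent."""
    data = json.loads(raw)
    problems = []

    for key in ("original", "mutations", "best_mutation", "reasoning"):
        if key not in data:
            problems.append(f"missing key: {key}")

    mutations = data.get("mutations", [])
    # A missing final_score is mapped to -1.0 so it is reported as out of range.
    scores = [m.get("final_score", -1.0) for m in mutations]
    if any(not 0 <= s <= 1 for s in scores):
        problems.append("final_score outside [0, 1]")
    if scores != sorted(scores, reverse=True):
        problems.append("mutations not sorted by final_score")
    if mutations and data.get("best_mutation") != mutations[0].get("prompt"):
        problems.append("best_mutation is not the top-ranked prompt")

    return problems
```

Applied to the example above, the checker reports no problems: the scores 0.85, 0.72, and 0.68 are in descending order and `best_mutation` matches the first prompt.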

Effectiveness Estimates

Default effectiveness ratings by mutation type:

| Type | Effectiveness | Use Case |
| --- | --- | --- |
| paraphrase | 0.60 | General purpose |
| unicode_homoglyph | 0.65 | Bypass text filters |
| ascii_art_embed | 0.70 | Visual bypass |
| zero_width_inject | 0.60 | Break string matching |
| base64_wrap | 0.55 | Encoding bypass |
| rot13_wrap | 0.50 | Simple obfuscation |
| synonym_swap | 0.50 | Subtle variation |
| leetspeak | 0.45 | Character variation |
| case_variation | 0.30 | Minimal change |
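
These defaults could seed the `expected_effectiveness` field that the scoring step reads. The sketch below copies the table into a lookup; the `FALLBACK_EFFECTIVENESS` value is an assumption, needed because the table omits several types listed earlier (for example `formality_shift` and `word_split`).

```python
# Default ratings copied from the table above.
DEFAULT_EFFECTIVENESS = {
    "paraphrase": 0.60,
    "unicode_homoglyph": 0.65,
    "ascii_art_embed": 0.70,
    "zero_width_inject": 0.60,
    "base64_wrap": 0.55,
    "rot13_wrap": 0.50,
    "synonym_swap": 0.50,
    "leetspeak": 0.45,
    "case_variation": 0.30,
}

FALLBACK_EFFECTIVENESS = 0.50  # assumed value for types the table does not list


def default_effectiveness(mutation_type: str) -> float:
    """Look up the default rating, falling back for unlisted types."""
    return DEFAULT_EFFECTIVENESS.get(mutation_type, FALLBACK_EFFECTIVENESS)


# Types ordered from highest to lowest default rating.
ranked = sorted(DEFAULT_EFFECTIVENESS, key=DEFAULT_EFFECTIVENESS.get, reverse=True)
print(ranked[0], default_effectiveness(ranked[0]))
```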

CLI Commands

bash
# Basic mutation
ralph mutate "Test input validation"

# Specify types
ralph mutate --types semantic,encoding "Test case"

# Best-of-N
ralph mutate --best-of-n 15 "Critical security test"

# Output to file
ralph mutate "Input" --output mutations.json

# Batch mutation
ralph mutate --batch test-cases.txt --output mutations/

Use Cases

1. Test Coverage Expansion

bash
# Original test
"User can login with valid credentials"

# Mutations for broader coverage
"Authentication succeeds with correct password"
"Valid user credentials grant access"
"Login endpoint accepts legitimate auth"

2. Edge Case Discovery

bash
# Original input
"email@domain.com"

# Encoding mutations to test validation
"ZW1haWxAZG9tYWluLmNvbQ=="  # Base64
"еmail@dоmain.cоm"          # Unicode homoglyphs
"email​@domain​.com"         # Zero-width spaces
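
Mutations like these are most useful when paired with an explicit expectation for each one. The sketch below runs them against a deliberately strict, ASCII-only validator; `is_valid_email` and its pattern are hypothetical stand-ins for whatever validator is actually under test.

```python
import re

# Hypothetical validator under test: ASCII-only local part and domain labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_valid_email(value: str) -> bool:
    """Accept only strings that match the ASCII pattern in full."""
    return EMAIL_RE.fullmatch(value) is not None


# Each mutated input is paired with the outcome a strict validator should give.
cases = {
    "email@domain.com": True,                   # original
    "ZW1haWxAZG9tYWluLmNvbQ==": False,          # Base64
    "\u0435mail@d\u043emain.c\u043em": False,   # Cyrillic homoglyphs
    "email\u200b@domain\u200b.com": False,      # zero-width spaces
}

for value, expected in cases.items():
    status = "ok" if is_valid_email(value) == expected else "MISMATCH"
    print(f"{status}: {value!r}")
```

Writing the invisible and look-alike characters as `\u` escapes keeps the test source readable and makes it obvious which code points are being exercised.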

3. Bypass Testing

bash
# Original blocked input
"<script>alert(1)</script>"

# Mutations to test filter
"<scr​ipt>alert(1)</scr​ipt>"  # Zero-width
"<scrіpt>alert(1)</scrіpt>"   # Cyrillic 'і'
"PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg=="  # Base64
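
On the defensive side, the zero-width variant shows why a filter should normalize input before matching. The sketch below contrasts a naive substring filter with one that first strips Unicode format characters (general category `Cf`, which includes U+200B, U+200C, U+200D, and U+FEFF). Both filters are hypothetical illustrations, not part of the Mutator, and real sanitization needs far more than a substring check.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Drop Unicode 'Cf' (format) characters such as zero-width spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def naive_filter(text: str) -> bool:
    """Return True if the text is blocked by a plain substring match."""
    return "<script" in text.lower()


def normalizing_filter(text: str) -> bool:
    """Strip format characters first, then apply the same substring match."""
    return naive_filter(strip_format_chars(text))


mutated = "<scr\u200bipt>alert(1)</scr\u200bipt>"  # zero-width mutation
print("naive blocks:", naive_filter(mutated))
print("normalizing blocks:", normalizing_filter(mutated))
```

A mutation that slips past the naive filter but is caught by the normalizing one points to a missing canonicalization step in the filter under test.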

Best Practices

  1. Preserve Intent: Mutations must keep the original meaning
  2. Vary Surface: Same meaning, different form
  3. Score Wisely: Rank mutations by final_score; not all mutations are equally useful
  4. Test Coverage: Use mutations to expand test cases
  5. Combine Types: Mix semantic and encoding for best results

Attribution

Mutation patterns are adapted from the ZeroLeaks Mutator agent architecture (FSL-1.1-Apache-2.0).