Multi-Model AI Translation

Parallel multi-model translation system with consensus analysis, reverse translation verification, and cryptographic proof generation.

What This Skill Does

Executes translations using three leading AI models simultaneously (Claude Sonnet 4, GPT-4o, Gemma 3 12B), compares outputs to identify consensus, validates accuracy through reverse translation, and optionally generates cryptographic proof of translation provenance.

When to Use

•High-stakes translations - Legal, medical, technical content requiring accuracy
•Quality assurance - Validate translation quality through multi-model consensus
•Auditable translations - Cryptographic proof for compliance and audits
•Multilingual content - UI text, documentation, marketing materials
•Comparison testing - Evaluate different AI translation approaches
•Semantic verification - Use reverse translation to detect meaning drift

How It Works

Translation Modes

1. Fake Mode (Free, Fast)

•Uses Claude runtime to simulate all three models
•Demonstrates workflow without API costs
•No cryptographic proof
•Best for: Development, testing, demos

2. Provable Mode (~$5-10 cost)

•Actual parallel API calls to all three providers
•Generates cryptographic fingerprints (SHA-256)
•Records request IDs for verification
•Auditable via API dashboards
•Best for: Production releases, compliance, audits

Workflow Phases

Phase 1: Multi-Model Translation

•Send same source text to all three models in parallel
•Each model translates independently
•Store outputs separately with metadata
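
The parallel fan-out in Phase 1 can be sketched as follows. This is a minimal illustration, not the skill's actual script: it assumes each provider is wrapped in a function with the hypothetical `Translator` signature, and the names `translateInParallel` and `ModelResult` are made up for the example.

```typescript
// Hypothetical wrapper signature: one function per provider.
type Translator = (text: string, targetLanguage: string) => Promise<string>;

// Output of one model, stored separately with basic metadata.
interface ModelResult {
  model: string;
  output: string;
  startedAt: string; // ISO-8601 UTC timestamp of when the call started
  durationMs: number;
}

async function translateInParallel(
  translators: Record<string, Translator>,
  text: string,
  targetLanguage: string,
): Promise<ModelResult[]> {
  // Start every call before awaiting any of them, so they run concurrently.
  const calls = Object.entries(translators).map(async ([model, translate]) => {
    const started = Date.now();
    const output = await translate(text, targetLanguage);
    return {
      model,
      output,
      startedAt: new Date(started).toISOString(),
      durationMs: Date.now() - started,
    };
  });
  // Promise.all rejects if any call rejects, so no partial result set is returned.
  return Promise.all(calls);
}
```

Each model receives the same source text and never sees another model's output, which keeps the three translations independent.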

Phase 2: Consensus Analysis

•Compare all three translations key-by-key
•Detect agreement levels: 3/3 exact match ✅, 2/3 consensus ⚠️, 0/3 no consensus ❌
•Choose final translation (prefer majority)
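
One way to do the key-by-key comparison is to flatten each model's output into dotted paths and compare values path by path. The sketch below assumes translations are either a plain string or nested objects of strings; `flatten` and `differingPaths` are hypothetical helper names, and the `"root"` path for plain strings is an assumption of this example.

```typescript
// A translation is either a string or a nested object of translations.
type Json = string | { [key: string]: Json };

// Flatten nested objects into dotted paths, e.g. { nav: { login: "x" } } -> { "nav.login": "x" }.
// A plain string is stored under the path "root".
function flatten(
  value: Json,
  prefix = "",
  out: Record<string, string> = {},
): Record<string, string> {
  if (typeof value === "string") {
    out[prefix || "root"] = value;
    return out;
  }
  for (const [key, child] of Object.entries(value)) {
    flatten(child, prefix ? `${prefix}.${key}` : key, out);
  }
  return out;
}

// Return every path where the three models do not all produce the same string.
function differingPaths(claude: Json, gpt: Json, translategemma: Json): string[] {
  const a = flatten(claude);
  const b = flatten(gpt);
  const c = flatten(translategemma);
  return Object.keys(a).filter((path) => !(a[path] === b[path] && b[path] === c[path]));
}
```

Paths listed by `differingPaths` are the ones that need a majority decision or review; all other keys are 3/3 matches.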

Phase 3: Reverse Translation Verification

•Translate final output back to source language
•Compare reverse translation to original
•Calculate semantic similarity
•Flag keys with meaning drift

Phase 4: Proof Generation (Provable mode only)

•Generate SHA-256 fingerprints of all outputs
•Record API request IDs
•Validate timestamps show parallel execution
•Create verification document

Input Schema

typescript

interface MultiModelTranslationInput {
  /** Source text or structured data to translate */
  source: string | object;
  
  /** Target language code (ISO 639-1) */
  targetLanguage: string;
  
  /** Source language code (default: auto-detect) */
  sourceLanguage?: string;
  
  /** Translation mode */
  mode: 'fake' | 'provable';
  
  /** Context to inform translation */
  context?: {
    domain?: string;          // e.g., "medical", "legal", "ui", "marketing"
    tone?: string;            // e.g., "formal", "casual", "technical"
    audience?: string;        // e.g., "general", "experts", "children"
    preservePlaceholders?: boolean;  // Keep {{variables}}, {count}, etc.
  };
  
  /** Enable reverse translation verification */
  verifyReverse?: boolean;  // default: true
  
  /** Consensus threshold (0.0-1.0) */
  consensusThreshold?: number;  // default: 0.67 (2/3 agreement)
}
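
The documented defaults can be applied in one normalization step before any model is called. This is a sketch under assumptions: `normalizeRequest` and the two trimmed-down interfaces are hypothetical, and the language check only tests for the two-lowercase-letter shape of ISO 639-1 codes, not whether the code is actually assigned.

```typescript
// Trimmed-down view of the input: only the fields that have defaults or checks.
interface TranslationRequest {
  targetLanguage: string;
  mode: "fake" | "provable";
  verifyReverse?: boolean;
  consensusThreshold?: number;
}

interface NormalizedRequest {
  targetLanguage: string;
  mode: "fake" | "provable";
  verifyReverse: boolean;
  consensusThreshold: number;
}

function normalizeRequest(input: TranslationRequest): NormalizedRequest {
  // Shape check only: ISO 639-1 codes are two lowercase letters.
  if (!/^[a-z]{2}$/.test(input.targetLanguage)) {
    throw new Error(`targetLanguage must be a two-letter code, got "${input.targetLanguage}"`);
  }
  const consensusThreshold = input.consensusThreshold ?? 0.67; // documented default
  // Written as a negated range test so that NaN is rejected too.
  if (!(consensusThreshold >= 0 && consensusThreshold <= 1)) {
    throw new RangeError("consensusThreshold must be between 0.0 and 1.0");
  }
  return {
    targetLanguage: input.targetLanguage,
    mode: input.mode,
    verifyReverse: input.verifyReverse ?? true, // documented default
    consensusThreshold,
  };
}
```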

Output Schema

typescript

interface MultiModelTranslationOutput {
  /** Final consensus translation */
  translation: string | object;
  
  /** Individual model outputs */
  models: {
    claude: string | object;
    gpt: string | object;
    translategemma: string | object;
  };
  
  /** Consensus analysis */
  consensus: {
    level: 'full' | 'partial' | 'none';
    agreement: number;  // 0.0-1.0
    differences: Array<{
      path: string;
      claude: string;
      gpt: string;
      translategemma: string;
      chosen: string;
    }>;
  };
  
  /** Reverse translation verification (if enabled) */
  verification?: {
    passed: boolean;
    semanticSimilarity: number;  // 0.0-1.0
    reverseTranslations: {
      claude: string;
      gpt: string;
      translategemma: string;
    };
    driftDetected: boolean;
  };
  
  /** Cryptographic proof (provable mode only) */
  proof?: {
    fingerprints: {
      claude: string;  // SHA-256
      gpt: string;
      translategemma: string;
    };
    requestIds: {
      claude: string;
      gpt: string;
      translategemma: string;
    };
    timestamps: {
      claude: string;
      gpt: string;
      translategemma: string;
    };
    verified: boolean;
  };
  
  /** Execution metadata */
  metadata: {
    sourceLanguage: string;
    targetLanguage: string;
    mode: 'fake' | 'provable';
    duration: number;  // milliseconds
    cost?: number;     // USD (provable mode only)
  };
}

Usage Examples

Simple Translation (Fake Mode)

User: "Translate 'Hello, World!' to Japanese using multi-model approach"

Agent Actions:

•Use Claude runtime to simulate all three models
•Generate three translations
•Compare for consensus
•Return result with agreement level

Output:

json

{
  "translation": "こんにちは、世界！",
  "consensus": {
    "level": "full",
    "agreement": 1.0,
    "differences": []
  },
  "metadata": {
    "mode": "fake",
    "duration": 2500
  }
}

Translation with Verification (Provable Mode)

User: "Translate this contract clause to Spanish with proof and verification"

Input:

json

{
  "source": "The parties agree to binding arbitration.",
  "targetLanguage": "es",
  "mode": "provable",
  "context": {
    "domain": "legal",
    "tone": "formal"
  },
  "verifyReverse": true
}

Agent Actions:

•Call Claude API: "Las partes acuerdan un arbitraje vinculante."
•Call GPT-4o API: "Las partes aceptan el arbitraje obligatorio."
•Call Gemma API: "Las partes acuerdan arbitraje vinculante."
•Detect 2/3 consensus (Claude + Gemma)
•Reverse translate consensus back to English
•Verify semantic match
•Generate fingerprints and proof

Output:

json

{
  "translation": "Las partes acuerdan un arbitraje vinculante.",
  "consensus": {
    "level": "partial",
    "agreement": 0.67,
    "differences": [{
      "path": "root",
      "claude": "Las partes acuerdan un arbitraje vinculante.",
      "gpt": "Las partes aceptan el arbitraje obligatorio.",
      "translategemma": "Las partes acuerdan arbitraje vinculante.",
      "chosen": "claude"
    }]
  },
  "verification": {
    "passed": true,
    "semanticSimilarity": 0.95,
    "reverseTranslations": {
      "claude": "The parties agree to binding arbitration.",
      "gpt": "The parties agree to mandatory arbitration.",
      "translategemma": "The parties agree to binding arbitration."
    },
    "driftDetected": false
  },
  "proof": {
    "fingerprints": {
      "claude": "a1b2c3...",
      "gpt": "d4e5f6...",
      "translategemma": "g7h8i9..."
    },
    "requestIds": {
      "claude": "req_abc123",
      "gpt": "chatcmpl-xyz789",
      "translategemma": "gen_123abc"
    },
    "verified": true
  },
  "metadata": {
    "mode": "provable",
    "duration": 4200,
    "cost": 0.15
  }
}

Structured Content Translation (JSON i18n)

User: "Translate UI strings to French with consensus validation"

Input:

json

{
  "source": {
    "welcome": "Welcome to our platform",
    "login": "Log In",
    "forgot_password": "Forgot Password?"
  },
  "targetLanguage": "fr",
  "mode": "fake",
  "context": {
    "domain": "ui",
    "tone": "polite-formal",
    "preservePlaceholders": true
  }
}

Output:

json

{
  "translation": {
    "welcome": "Bienvenue sur notre plateforme",
    "login": "Se connecter",
    "forgot_password": "Mot de passe oublié?"
  },
  "consensus": {
    "level": "full",
    "agreement": 1.0
  }
}

Translation Models

Claude Sonnet 4 (Primary)

•Best for context and nuance
•Excellent instruction following
•Strong multilingual capability
•Cost: ~$3 per million input tokens

GPT-4o

•Strong multilingual performance
•Fast parallel processing
•Good cultural adaptation
•Cost: ~$2.50 per million input tokens

Gemma 3 12B

•Fastest response time
•Excellent for Asian languages
•Good technical accuracy
•Cost: ~$0.075 per million input tokens

Consensus Resolution Strategy

Full Consensus (3/3):

•All models agree exactly
•Use translation with confidence
•No further review needed

Partial Consensus (2/3):

•Two models agree, one differs
•Use majority translation
•Log difference for review
•Consider context to resolve

No Consensus (0/3):

•All three models differ
•Flag for human review
•Default to Claude translation
•Provide all three options
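
The three cases above can be expressed as a single resolution function. The sketch assumes exact string equality counts as agreement; `resolve` and the `Resolution` shape are hypothetical names for illustration, not the skill's real API.

```typescript
type Level = "full" | "partial" | "none";

interface Resolution {
  level: Level;
  chosen: string;
  chosenFrom: "claude" | "gpt" | "translategemma";
  needsHumanReview: boolean;
  options: string[]; // alternatives to log or show to a reviewer
}

function resolve(claude: string, gpt: string, translategemma: string): Resolution {
  const all = [claude, gpt, translategemma];
  // 3/3: every model produced the same string.
  if (claude === gpt && gpt === translategemma) {
    return { level: "full", chosen: claude, chosenFrom: "claude", needsHumanReview: false, options: [] };
  }
  // 2/3 with Claude in the majority.
  if (claude === gpt || claude === translategemma) {
    return { level: "partial", chosen: claude, chosenFrom: "claude", needsHumanReview: false, options: all };
  }
  // 2/3 with Claude as the outlier.
  if (gpt === translategemma) {
    return { level: "partial", chosen: gpt, chosenFrom: "gpt", needsHumanReview: false, options: all };
  }
  // 0/3: default to Claude, but flag for a human and keep all three options.
  return { level: "none", chosen: claude, chosenFrom: "claude", needsHumanReview: true, options: all };
}
```

Exact equality is strict: two outputs that differ by a single article count as disagreement here, so a real implementation may want to normalize whitespace or punctuation first.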

Reverse Translation Verification

Validates translation accuracy by translating back to source:

•Translate consensus output back to source language
•Compare reverse translation to original
•Calculate semantic similarity (not exact match)
•Flag if similarity < threshold (default 0.85)

Semantic Similarity Scoring:

•0.95-1.0: Excellent (meaning preserved)
•0.85-0.94: Good (minor paraphrasing)
•0.70-0.84: Fair (semantic drift detected)
•<0.70: Poor (meaning changed - FAIL)
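
The scoring bands map directly to a small classifier. The document does not say how similarity itself is computed, so `tokenOverlap` below is only a stand-in (Jaccard overlap of lowercase English word sets) to make the example runnable; a real system would more likely use embeddings or a model-based judge. All function names here are hypothetical.

```typescript
type Band = "excellent" | "good" | "fair" | "poor";

// Map a 0.0-1.0 score to the bands listed above.
function classifySimilarity(score: number): Band {
  if (score >= 0.95) return "excellent";
  if (score >= 0.85) return "good";
  if (score >= 0.7) return "fair";
  return "poor";
}

// Stand-in metric (an assumption of this sketch): shared words / all distinct words.
// The pattern only covers ASCII letters and digits, so it suits English source text.
function tokenOverlap(a: string, b: string): number {
  const words = (s: string): string[] => s.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  const setA = new Set(words(a));
  const setB = new Set(words(b));
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}

// Drift is flagged when the score falls below the threshold (default 0.85).
function driftDetected(original: string, reverse: string, threshold = 0.85): boolean {
  return tokenOverlap(original, reverse) < threshold;
}
```

With this stand-in, "binding arbitration" versus "mandatory arbitration" scores 5/7 and is flagged as drift, which shows why a word-overlap metric is stricter than a true semantic one.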

Cryptographic Proof (Provable Mode)

Generates verifiable evidence of translation:

Fingerprints (SHA-256):

•Hash of each model's output
•Proves content hasn't changed
•Enables tamper detection
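
A fingerprint of this kind can be produced with Node's built-in `crypto` module. The sketch assumes outputs are hashed as UTF-8 strings and fingerprints are stored as lowercase hex; `fingerprint` and `isUntampered` are hypothetical names.

```typescript
import { createHash } from "crypto";

// SHA-256 of the model output, encoded as 64 lowercase hex characters.
function fingerprint(output: string): string {
  return createHash("sha256").update(output, "utf8").digest("hex");
}

// Tamper check: recompute the hash and compare it with the stored fingerprint.
function isUntampered(output: string, storedFingerprint: string): boolean {
  return fingerprint(output) === storedFingerprint;
}
```

For structured (object) outputs, the serialization must be deterministic before hashing, for example by sorting keys, otherwise identical content can produce different fingerprints.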

Request IDs:

•Unique identifier from each API
•Traceable in provider dashboards
•Proves actual API usage

Timestamps:

•UTC timestamps of API calls
•Validates parallel execution
•Shows translation timeline
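
Parallel execution can be checked by measuring the spread between the earliest and latest call timestamps. The 1000 ms window and the name `startedInParallel` are assumptions of this sketch; the document does not specify what spread counts as parallel.

```typescript
// True when all timestamps parse and lie within maxSpreadMs of each other.
function startedInParallel(
  timestamps: Record<string, string>,
  maxSpreadMs = 1000, // assumed window, not a documented value
): boolean {
  const times = Object.values(timestamps).map((t) => Date.parse(t));
  // An empty set or an unparseable timestamp cannot prove anything.
  if (times.length === 0 || times.some((t) => Number.isNaN(t))) return false;
  return Math.max(...times) - Math.min(...times) <= maxSpreadMs;
}
```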

Verification:

•Compare stored fingerprint to actual content hash
•Validate request IDs in API logs
•Confirm timestamps within expected range

Implementation

Supporting Scripts (in .github/skills/multi-model-ai-translation/ directory):

translate-with-proof.ts:

•Executes parallel API calls
•Generates proof documents
•Handles rate limiting

verify-translations.ts:

•Validates cryptographic fingerprints
•Checks request IDs
•Confirms timestamps

reverse-translate.ts:

•Performs reverse translation
•Calculates semantic similarity
•Flags drift

generate-translation-report.ts:

•Aggregates all validation data
•Generates comprehensive report
•Includes consensus and verification results

API Requirements (Provable Mode):

bash

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=...

Rate Limiting:

•Claude: 500ms delay between requests
•GPT-4o: 500ms delay between requests
•Gemma: 2s delay (30 req/min limit for Gemma 3)
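
The delays above follow from the limits: 30 requests per minute means one request every 60 s / 30 = 2 s. A per-provider throttle can be sketched as below; the `Throttle` class and the `MIN_DELAY_MS` table are hypothetical, and the class only computes wait times so that the caller decides how to sleep.

```typescript
// Minimum spacing between requests, per provider (values from the list above).
const MIN_DELAY_MS: Record<string, number> = {
  claude: 500,
  gpt: 500,
  translategemma: 2000, // 60_000 ms / 30 requests
};

class Throttle {
  private nextSlot = 0;
  private readonly minDelayMs: number;

  constructor(minDelayMs: number) {
    this.minDelayMs = minDelayMs;
  }

  // Reserve the next send slot and return how long the caller must wait for it.
  reserve(now: number = Date.now()): number {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.minDelayMs;
    return start - now;
  }
}
```

A caller would create one `Throttle` per provider, for example `new Throttle(2000)` for Gemma, and wait for the returned number of milliseconds before each request.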

Error Handling

API Failures:

•Retry a failed call with exponential backoff
•If any model still fails after retries, abort the entire translation
•Never proceed with partial results (1 or 2 models)
•This keeps every result backed by all three models

Quota Limits:

•Detect rate limit errors
•Apply appropriate delays
•Never switch modes to work around limits
•Report quota exhaustion clearly

Network Issues:

•Retry transient failures (3 attempts)
•Timeout after 30 seconds per request
•Log all network errors
•Provide clear error messages
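
The retry and timeout rules above can be combined in one wrapper. This is a sketch under assumptions: `withRetry`, `withTimeout`, and `sleep` are hypothetical helpers, the 500 ms base delay is illustrative, and every error is treated as transient, which a real implementation would want to narrow.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Reject if the promise has not settled within ms milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Try the call up to `attempts` times, doubling the delay after each failure.
async function withRetry<T>(
  call: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500, // assumed base delay
  timeoutMs = 30_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await withTimeout(call(), timeoutMs);
    } catch (error) {
      lastError = error;
      console.error(`attempt ${attempt + 1} of ${attempts} failed:`, error);
      if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // All attempts failed: surface the last error so the whole translation can abort.
  throw lastError;
}
```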

Performance

Fake Mode:

•Cost: Free
•Speed: 2-5 seconds
•Proof: None

Provable Mode:

•Cost: ~$0.05-0.15 per 1000 words
•Speed: 4-8 seconds (parallel execution)
•Proof: Full cryptographic verification

Optimization:

•Parallel API calls (not sequential)
•Batch processing for multiple items
•Caching of common translations
•Rate limit compliance built-in

Use Cases

UI Localization

•Translate interface text to multiple languages
•Validate consistency across similar strings
•Detect cultural adaptation issues
•Maintain placeholder syntax

Legal Documents

•High-accuracy translation requirement
•Cryptographic proof for audits
•Reverse verification critical
•Formal tone preservation

Technical Documentation

•Preserve technical terminology
•Validate code examples unchanged
•Ensure accuracy of instructions
•Multi-language consistency

Marketing Content

•Cultural adaptation important
•Tone matching critical
•Creative freedom allowed
•A/B test different translations

Related Skills

•extract-code-documentation - Extract context for translation metadata
•storybook-validation - Validate translated UI in stories

Best Practices

•Always verify high-stakes translations - Use reverse translation for legal, medical, technical
•Provide rich context - Domain, tone, and audience improve accuracy
•Review no-consensus items - Human judgment needed for 0/3 agreement
•Preserve placeholders - Never translate {{variables}}, {count}, etc.
•Use provable mode for production - Fake mode for development only
•Test translations in context - Load into actual UI to verify
•Batch similar content - More efficient than one-by-one
•Monitor costs - Set budgets for API usage
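
Placeholder preservation is easy to check mechanically after translation. The sketch below assumes only the two syntaxes mentioned above, `{{variable}}` and `{count}`; `placeholders` and `placeholdersPreserved` are hypothetical helper names.

```typescript
// Matches {{variable}} first, then {count} style placeholders.
const PLACEHOLDER = /\{\{[^{}]+\}\}|\{[^{}]+\}/g;

// Extract placeholders in sorted order so word-order changes do not matter.
function placeholders(text: string): string[] {
  return (text.match(PLACEHOLDER) ?? []).slice().sort();
}

// True when source and translation contain exactly the same placeholders.
function placeholdersPreserved(source: string, translated: string): boolean {
  const expected = placeholders(source);
  const actual = placeholders(translated);
  return expected.length === actual.length && expected.every((p, i) => p === actual[i]);
}
```

For example, a French output that turns `{{name}}` into `{{nom}}` fails the check, while one that only moves `{count}` to a different position in the sentence passes.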
