Threat Model
Attack Surfaces
- •Content Injection: Modifying descriptions to trick the LLM.
- •Parameter Injection: Passing executable code strings into safe variables.
- •Choreography Hijack: Malicious agents attempting to escalate privileges.
- •Log Tampering: Altering audit trails to hide traces.
Defense Layers (D1-D5)
D1: Input Parsing (No Eval)
- •Rule: Never use
eval(). - •Action: Use strict parsers (e.g.,
yaml.safe_load,json.parse).
D2: Content Sandboxing
- •Rule: Treat all external content as data, not instructions.
- •Action: Render descriptions as plaintext. Interpolate variables only after sanitization.
D3: Capability Isolation
- •Rule: Principle of Least Privilege.
- •Action: Agents/Skills run with minimal required permissions. A reading agent cannot write.
D4: Signature Verification
- •Rule: Trust but Verify.
- •Action: Cryptographically sign critical skills/configs. Reject unsigned modifications.
D5: Reversibility Gate
- •Rule: No irreversible actions without authorization.
- •Action: Require explicit approval for destructive operations.
Tools
- •
scripts/injection_defense_check.py: Runs the D1-D5 audit suite.