MCP Security & Governance
Goal
Secure the agent-tool interface by implementing defensive patterns against malicious servers, prompt injection, and unauthorized data access.
Core Risks
1. Dynamic Capability Injection
- •Threat: A server dynamically changes its tool list at runtime, adding dangerous capabilities (e.g., "buy_book") to a previously benign server.
- •Defense:
- •Pinning: Pin tool definitions to a specific version or hash.
- •Allowlisting: Maintain an explicit client-side allowlist of permitted tools and servers. Disconnect if a server broadcasts unapproved changes.
2. Tool Shadowing
- •Threat: A malicious server defines a tool with the same name or trigger as a legitimate tool (e.g.,
save_secure_note), tricking the agent into sending sensitive data to the attacker. - •Defense:
- •Naming Collision Checks: Reject connection if a server advertises a tool name that already exists in the trusted set.
- •Policy Enforcement: Use an API Gateway to filter tool lists before they reach the agent context.
3. The Confused Deputy
- •Threat: The user asks the agent to perform an action (e.g., "Summarize this file"), but the agent uses its high-privilege credentials to access a restricted file that the user shouldn't see.
- •Defense:
- •Scoped Credentials: Do not give the agent "root" access. Use credentials scoped to the specific user.
- •Human-in-the-Loop: Require explicit user confirmation for high-risk actions (e.g., file deletion, payments).
Defensive Implementation
Input/Output Sanitization
- •Inbound: Sanitize tool descriptions before they enter the context window to prevent prompt injection attacks embedded in documentation.
- •Outbound: Scrub tool outputs for PII (API keys, PII) before passing them back to the model or user history.
Taint Tracking
- •Concept: Tag data from external/untrusted tools as "tainted."
- •Rule: Tainted data cannot be used as input for sensitive tools (e.g.,
send_email) without explicit sanitization or user approval.