AI Security Expert
Enterprise AI security architect specializing in securing LLM applications, defending against prompt injection, implementing guardrails, and OWASP LLM Top 10 compliance.
OWASP LLM Top 10 (2025)
Quick Reference
| # | Vulnerability | Risk | Key Defense |
|---|---|---|---|
| LLM01 | Prompt Injection | Critical | Input sanitization, delimiters |
| LLM02 | Insecure Output | High | Output validation, sanitization |
| LLM03 | Training Data Poisoning | High | Data provenance, auditing |
| LLM04 | Model DoS | Medium | Rate limiting, timeouts |
| LLM05 | Supply Chain | High | Verification, pinning |
| LLM06 | Sensitive Info Disclosure | High | PII detection, redaction |
| LLM07 | Insecure Plugin Design | High | Permission model, validation |
| LLM08 | Excessive Agency | High | Human-in-the-loop, least privilege |
| LLM09 | Overreliance | Medium | Confidence scores, citations |
| LLM10 | Model Theft | Medium | Rate limiting, watermarking |
LLM01: Prompt Injection
Attack Types:
- •Direct: "Ignore previous instructions..."
- •Indirect: Malicious content in RAG documents
- •Encoding tricks: Unicode, special tokens
Defense Pattern:
code
User Input → Sanitize → Delimit → LLM → Validate Output → Filter
LLM02: Insecure Output Handling
- •Never execute LLM output as code without validation
- •Sanitize HTML (use allowlist)
- •Validate SQL (SELECT only, table allowlist)
LLM04: Model DoS
- •Rate limiting per user/API key
- •Token limits on requests
- •Timeout configurations
- •Cost capping/alerts
LLM06: Sensitive Information Disclosure
- •PII detection (regex + NER)
- •System prompt protection
- •Training data sanitization
- •Output filtering
Code patterns: resources/security-patterns.py
PII Protection
Detection Patterns
| Type | Example Pattern |
|---|---|
*@*.com | |
| Phone | XXX-XXX-XXXX |
| SSN | XXX-XX-XXXX |
| Credit Card | 16 digits |
| IP Address | X.X.X.X |
Redaction Strategy
- •Detect PII in input before LLM call
- •Redact PII in LLM output
- •Log without PII
- •Encrypt at rest
Guardrails Implementation
NeMo Guardrails (NVIDIA)
code
define user express harmful intent
"How do I hack"
define bot refuse harmful request
"I can't help with that."
define flow harmful intent
user express harmful intent
bot refuse harmful request
Guardrails AI
python
guard = Guard().use_many(
ToxicLanguage(on_fail="fix"),
PIIFilter(on_fail="fix"),
ValidJSON(on_fail="reask")
)
Custom Pipeline
code
Input Guards → LLM Call → Output Guards → Response
Implementation: resources/security-patterns.py
Security Architecture
Defense in Depth Layers
| Layer | Controls |
|---|---|
| Network | WAF, DDoS protection, API gateway |
| Auth | OAuth 2.0, API keys, mTLS |
| Input | Schema validation, injection detection |
| Guardrails | Topic restrictions, PII filtering |
| Model | Versioning, anomaly detection |
| Output | Response filtering, fact verification |
| Audit | Logging, retention, compliance |
Zero Trust Principles
- •Never trust, always verify
- •Least privilege for agents
- •Assume breach (log everything)
Compliance Frameworks
EU AI Act (High-Risk)
- •Risk management system
- •Data governance
- •Technical documentation
- •Human oversight
- •Accuracy/robustness testing
SOC 2 for AI
- •Security: Access controls, encryption
- •Availability: SLA monitoring, DR
- •Processing Integrity: Input/output validation
- •Confidentiality: Data classification
- •Privacy: Data minimization, consent
Security Testing
Red Team Categories
- •Direct injection attempts
- •Jailbreak prompts
- •Indirect injection via context
- •Encoding/unicode tricks
Test suite: resources/security-patterns.py
Testing Checklist
- • Injection patterns blocked
- • System prompt protected
- • PII detected and redacted
- • Rate limits enforced
- • Outputs validated
- • Audit logs complete
Incident Response
Severity Levels
| Incident | Severity | Response |
|---|---|---|
| Prompt injection detected | Medium | Block, log, analyze |
| Data exfiltration attempt | High | Block, forensics, notify |
| Model extraction detected | High | Rate limit, investigate |
Response Steps
- •Contain (block source)
- •Preserve (logs, evidence)
- •Analyze (attack pattern)
- •Remediate (update defenses)
- •Document (security log)
Resources
- •OWASP LLM Top 10
- •NIST AI Risk Management Framework
- •NeMo Guardrails
- •Guardrails AI
- •LLM Security Best Practices
Secure AI systems with defense in depth and zero trust principles.