Input Validation & Output Encoding
Overview
Input validation and output encoding form a defense-in-depth strategy against injection attacks. Neither is sufficient on its own:
- •Input validation ensures that data conforms to expected formats before processing. It reduces the attack surface but cannot account for every output context.
- •Output encoding ensures that data is rendered safely in its target context (HTML, SQL, JavaScript, etc.). It neutralizes malicious content but should not be the only line of defense.
Always apply both: validate all input as early as possible, and encode all output as late as possible — immediately before it is rendered or interpreted in its target context.
Input Validation Strategy
Allowlist vs. Denylist
- •Always prefer allowlist (positive) validation. Define what is permitted and reject everything else. Allowlists are explicit, predictable, and far harder to bypass.
- •Denylist (negative) validation is fragile. Blocking known-bad patterns always misses novel attack vectors, encoding tricks, and edge cases.
Principles
- •Validate as early as possible — at the API boundary, form handler, or deserialization layer before data enters business logic.
- •Normalize before validating — decode, trim, and canonicalize input before applying validation rules. Attackers exploit double-encoding and Unicode normalization differences to bypass checks.
- •Use typed validation — parse inputs into their expected types (integers, dates, enums) rather than treating everything as strings. Type coercion eliminates entire classes of injection.
- •Validate on the server side — client-side validation improves UX but provides zero security. All validation must be enforced server-side.
Validation by Data Type
| Data Type | Validation Rule | Example |
|---|---|---|
| Strings | Max length, allowed character set, regex pattern | Username: ^[a-zA-Z0-9_]{3,30}$ |
| Numbers | Type check, min/max range, integer vs. float | Age: integer, 0-150 |
| Emails | RFC 5322 format validation + DNS MX record check | user@example.com — verify domain has MX records |
| URLs | Scheme allowlist (https only), reject internal/private IPs | Allow https:// only; block file://, javascript:, RFC 1918 ranges |
| File uploads | MIME type check (magic bytes, not extension), max size, rename to random, virus scan | Accept image/png up to 5MB; rename to UUID; scan with ClamAV |
| Dates | Strict format parsing (ISO 8601), valid range check | 2024-01-15 — reject 0000-00-00, far-future dates |
Output Encoding
Output encoding must be context-specific. The correct encoding depends entirely on where the data is being rendered.
| Context | Encoding Method | Example |
|---|---|---|
| HTML body | HTML entity encode | <script> becomes <script> |
| HTML attributes | HTML attribute encode | " onclick="alert(1) becomes " onclick="alert(1) |
| JavaScript | JavaScript encode / JSON.stringify() | Embed user data only via JSON.stringify() into a variable assignment |
| URL parameters | URL encode (percent-encoding) | param=hello world becomes param=hello%20world |
| CSS | CSS hex encode | </style> becomes \003c\002fstyle\003e |
| SQL | Parameterized queries ONLY | Never encode manually for SQL — always use prepared statements |
Critical: HTML entity encoding does NOT protect against XSS in JavaScript or URL contexts. Each context requires its own encoding function.
Injection Prevention
| Attack | Root Cause | Prevention |
|---|---|---|
| SQL injection | String concatenation in SQL queries | Parameterized queries / prepared statements |
| XSS (Cross-Site Scripting) | Unencoded user data in HTML output | Context-specific output encoding; Content Security Policy (CSP) |
| Command injection | String concatenation in shell commands | Avoid shell execution; use language APIs with argument arrays |
| Path traversal | Unvalidated file paths from user input | Canonicalize paths; validate against an allowed base directory |
| LDAP injection | Unescaped user input in LDAP queries | LDAP-specific escaping functions; parameterized LDAP queries |
| XML injection | Inline construction of XML from user data | Use XML serialization libraries; disable external entities (XXE) |
Framework-Specific Helpers
Modern frameworks provide built-in protections. Understand what your framework does automatically and where gaps remain.
| Framework | Built-in Protection |
|---|---|
| React | JSX auto-escapes embedded expressions by default; beware dangerouslySetInnerHTML |
| Angular | Template expressions are auto-sanitized; beware bypassSecurityTrust* methods |
| ASP.NET | Razor views auto-encode output by default; beware Html.Raw() |
| Django | Template engine auto-escapes variables by default; beware ` |
| Spring | Thymeleaf auto-encodes with th:text; beware th:utext (unescaped) |
Warning: Every framework that auto-encodes also provides an escape hatch for raw output. Audit all uses of these escape hatches during code review.
Parameterized Queries
Parameterized queries (prepared statements) are the only reliable defense against SQL injection. Never construct SQL by concatenating user input.
SQL — Prepared Statement
-- Correct: parameterized query SELECT * FROM users WHERE email = @Email AND status = @Status; -- WRONG: string concatenation SELECT * FROM users WHERE email = '" + email + "' AND status = '" + status + "';
ORM — Entity Framework (C#)
// Correct: LINQ query — parameterized automatically
var user = context.Users
.Where(u => u.Email == email && u.Status == status)
.FirstOrDefault();
ORM — SQLAlchemy (Python)
# Correct: ORM query — parameterized automatically
user = session.query(User).filter(
User.email == email,
User.status == status
).first()
Note on Stored Procedures
Stored procedures are not inherently safe from SQL injection. If a stored procedure uses dynamic SQL internally (e.g., EXEC or sp_executesql with concatenation), it is still vulnerable. Always use parameterized inputs within stored procedures as well.
Best Practices
- •Validate all input on the server side — never rely on client-side validation for security.
- •Use allowlist validation — define what is permitted and reject everything else; denylist approaches are inherently incomplete.
- •Normalize input before validation — canonicalize Unicode, decode URL encoding, and trim whitespace before applying validation rules.
- •Encode output for its specific context — HTML, JavaScript, URL, CSS, and SQL each require different encoding; there is no universal encoder.
- •Use parameterized queries for all database access — never concatenate user input into SQL, regardless of prior validation.
- •Avoid shell execution with user input — use language-native APIs with argument arrays instead of constructing shell command strings.
- •Deploy Content Security Policy (CSP) headers — CSP provides an additional layer of defense against XSS even when encoding is missed.
- •Audit framework escape hatches — review all uses of
dangerouslySetInnerHTML,Html.Raw(),|safe,th:utext, and similar raw-output functions during code review.