AgentSkillsCN

hygiene

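Use when enforcing defensive coding practices that treat all data as untrusted, regardless of source. Covers sanitization, canonicalization, encoding, and validation of inputs and outputs at every component boundary, including internal databases, caches, message queues, and inter-service calls. Use for: data hygiene, sanitization, canonicalization, normalization, output encoding, trust boundaries, defense in depth, zero trust data handling, internal data validation, component boundary security. Do not use for: specific injection attack patterns (use input-validation or owasp), cryptographic controls (use cryptography), API-level security headers (use api-security).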

SKILL.md
--- frontmatter
name: hygiene
description: |
    Use when enforcing defensive coding practices that treat all data as untrusted — regardless of source. Covers sanitization, canonicalization, encoding, and validation of inputs AND outputs at every component boundary, including internal databases, caches, message queues, and inter-service calls.
    USE FOR: data hygiene, sanitization, canonicalization, normalization, output encoding, trust boundaries, defense in depth, zero trust data handling, internal data validation, component boundary security
    DO NOT USE FOR: specific injection attack patterns (use input-validation or owasp), cryptographic controls (use cryptography), API-level security headers (use api-security)
license: MIT
metadata:
  displayName: "Hygiene"
  author: "Tyler-R-Kendrick"
compatibility: claude, copilot, cursor

Hygiene — Trust No Data

Overview

Security hygiene is the discipline of treating all data as untrusted at every component boundary — not just data from end users. Internal databases, caches, message queues, partner APIs, configuration stores, and even your own microservices can deliver data that is malformed, stale, tampered with, or injected. A compromised internal system, a poisoned cache entry, or a corrupted database row can be just as dangerous as a malicious HTTP request.

The core principle: sanitize, validate, and encode at the boundary of every component, not just at the perimeter.

"The Jenga tower of trust collapses when any single block is compromised. Don't assume the block below you is sound — verify it."

The Trust Boundary Model

Every component has boundaries where data enters and exits. Each boundary is a point where hygiene must be enforced.

code
                    ┌─────────────────────┐
  User Input ──────►│                     │──────► Database
                    │    Your Component   │
  Database ────────►│                     │──────► API Response
                    │   SANITIZE HERE     │
  Message Queue ───►│   ON EVERY EDGE     │──────► Message Queue
                    │                     │
  Cache ───────────►│                     │──────► Logs
                    │                     │
  Partner API ─────►│                     │──────► Downstream Service
                    └─────────────────────┘

Why Internal Sources Are Untrusted

| Source | Why It Can't Be Trusted | Example Threat |
| --- | --- | --- |
| Database | Rows may have been inserted by a compromised service, SQL injection, or legacy code without validation | Stored XSS: HTML in a database field rendered without encoding |
| Cache (Redis, Memcached) | Cache poisoning, stale data, no authentication by default | Attacker writes malicious payload to shared cache key |
| Message Queue | Any producer can publish; messages may be replayed or tampered with | Deserialization attack via crafted message payload |
| Internal API / Microservice | Service may be compromised, misconfigured, or returning unexpected data | Downstream service returns user-controlled data without sanitization |
| Configuration Store | Config values may be modified by operators or injected via environment | Command injection through a config value used in a shell call |
| File System | Files may be written by other processes, users, or symlink attacks | Path traversal via a filename read from a temp directory |
| Logs (when read back) | Log entries may contain injected content | Log injection leading to SIEM manipulation or Log4Shell-style attacks |
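
As an illustration of treating an internal source as untrusted, the sketch below validates a message consumed from a queue before anything else touches it. The `OrderEvent` shape, the allowed statuses, and the size limit are hypothetical values chosen for the example, not part of any real schema.

```python
import json
from dataclasses import dataclass

MAX_MESSAGE_BYTES = 64 * 1024  # assumed size limit for this sketch
ALLOWED_STATUSES = {"pending", "paid", "cancelled"}  # hypothetical enum


@dataclass(frozen=True)
class OrderEvent:
    order_id: int
    status: str


def parse_order_event(raw: bytes) -> OrderEvent:
    """Parse and validate a queue message, rejecting anything unexpected."""
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    try:
        # JSON is a data-only format; unlike pickle, loading it runs no code.
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError, RecursionError) as exc:
        raise ValueError("message is not valid UTF-8 JSON") from exc
    if not isinstance(payload, dict) or set(payload) != {"order_id", "status"}:
        raise ValueError("unexpected message shape")
    order_id, status = payload["order_id"], payload["status"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(order_id, bool) or not isinstance(order_id, int) or order_id <= 0:
        raise ValueError("order_id must be a positive integer")
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        raise ValueError("status is not an allowed value")
    return OrderEvent(order_id=order_id, status=status)
```

The consumer rejects the message outright on any mismatch instead of trying to repair it, so a compromised or buggy producer cannot push unexpected shapes further into the system.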

Sanitization

Sanitization removes or neutralizes dangerous content from data. It should happen on input to your component, before the data is processed or stored.

Strategies

| Strategy | What It Does | When to Use |
| --- | --- | --- |
| Stripping | Removes disallowed characters or tags entirely | HTML tags from plain-text fields, control characters from strings |
| Escaping | Replaces special characters with safe equivalents | HTML entities, SQL parameterization, shell escaping |
| Allowlisting | Accepts only values matching a known-good pattern | Enum fields, expected formats (UUID, ISO date, email) |
| Type coercion | Converts data to the expected type | Parse string to integer, parse to date object |
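
The four strategies can be sketched with the Python standard library as follows. The function names and the port-number example are illustrative assumptions; real fields will need their own patterns and limits.

```python
import html
import re
import unicodedata

# Lowercase canonical UUID text form (8-4-4-4-12 hex digits).
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def strip_control_chars(text: str) -> str:
    """Stripping: drop every character in Unicode category Cc (control)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def escape_for_html(text: str) -> str:
    """Escaping: replace &, <, > and quote characters with HTML entities."""
    return html.escape(text, quote=True)


def require_uuid(text: str) -> str:
    """Allowlisting: accept only strings that match the known-good pattern."""
    candidate = text.lower()
    if not UUID_RE.fullmatch(candidate):
        raise ValueError("not a UUID")
    return candidate


def coerce_port(text: str) -> int:
    """Type coercion: turn a string into an int and enforce its range."""
    # int() also accepts surrounding whitespace and non-ASCII digits,
    # so restrict the input to plain ASCII digits first.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("port must be ASCII digits")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    return port
```

Note that the allowlisting and coercion helpers reject bad input, while stripping and escaping transform it; which behaviour is appropriate depends on the field.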

Sanitize Both Directions

  • Inbound: Validate and sanitize data entering your component from any source.
  • Outbound: Encode data leaving your component for the target context (HTML, SQL, shell, URL, JSON, XML).
code
  Inbound                           Outbound
  ────────                          ────────
  Validate type/format              Encode for HTML context
  Enforce allowlist                 Encode for SQL (parameterize)
  Strip/escape dangerous chars      Encode for shell (avoid shell if possible)
  Normalize/canonicalize            Encode for URL
  Reject if invalid                 Encode for JSON/XML

Canonicalization

Canonicalization (C14N) converts data to its simplest, standard form before validation. Without canonicalization, attackers bypass validation by using alternate representations of the same value.

Common Bypass Techniques Prevented by Canonicalization

| Technique | Example | What Canonicalization Does |
| --- | --- | --- |
| Unicode normalization | `\uFF41` (fullwidth 'a') vs `a` | Normalize to NFKC/NFKD before comparing (NFC/NFD leave fullwidth characters unchanged) |
| URL encoding | `%2e%2e%2f` = `../` | Decode URL encoding before path validation |
| Double encoding | `%252e` → `%2e` → `.` | Decode iteratively until stable |
| Case variation | `SELECT` vs `SeLeCt` | Lowercase before matching keywords |
| Path normalization | `/app/../etc/passwd` | Resolve to canonical absolute path (`/etc/passwd`) |
| Null bytes | `file.php%00.jpg` | Strip null bytes before extension check |
| Homoglyph substitution | `pаypal.com` (Cyrillic 'а') | Normalize to ASCII or punycode |
| Whitespace injection | `admin\t` vs `admin` | Trim and normalize whitespace |

Canonicalization Order

Always canonicalize before validation:

code
Raw Input → Decode → Normalize → Canonicalize → Validate → Process

If you validate before canonicalizing, an attacker can encode their payload to pass validation and then have it decoded into a malicious form later.
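
A small sketch of why the order matters, using a double-encoded traversal payload. The helper names and the `max_rounds` cap are assumptions made for this example.

```python
from urllib.parse import unquote


def naive_is_safe(path: str) -> bool:
    """Validates the raw string, so encoded traversal sequences slip through."""
    return ".." not in path


def decode_fully(value: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    # Many layers of encoding are suspicious in themselves: fail closed.
    raise ValueError("too many layers of encoding")


def canonical_is_safe(path: str) -> bool:
    """Canonicalizes first, then applies the same check."""
    return ".." not in decode_fully(path)


payload = "%252e%252e%252fetc%252fpasswd"
print(naive_is_safe(payload))      # the raw string contains no ".."
print(canonical_is_safe(payload))  # the decoded form is "../etc/passwd"
```

The naive check accepts the payload while the canonicalizing check rejects it, even though both apply the same rule.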

Language Examples

Path Canonicalization

csharp
// C#: resolve to an absolute path and verify it stays within the allowed directory
using System;
using System.IO;
using System.Security;

string basePath = Path.GetFullPath("/app/uploads");
string requested = Path.GetFullPath(Path.Combine(basePath, userInput));
// Ordinal comparison avoids culture-sensitive matching of path strings
if (!requested.StartsWith(basePath + Path.DirectorySeparatorChar, StringComparison.Ordinal))
    throw new SecurityException("Path traversal detected");
python
# Python — Resolve symlinks and relative segments
import os
base = os.path.realpath("/app/uploads")
requested = os.path.realpath(os.path.join(base, user_input))
if not requested.startswith(base + os.sep):
    raise ValueError("Path traversal detected")

Unicode Normalization

python
import unicodedata

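# Normalize to NFC before comparing or validating.
# NFC only unifies canonically equivalent sequences (e.g. "e" + U+0301 becomes "é");
# use "NFKC" when compatibility characters such as fullwidth letters must fold to ASCII.
clean = unicodedata.normalize("NFC", user_input)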
csharp
// C# — Normalize to FormC (NFC)
string clean = userInput.Normalize(NormalizationForm.FormC);
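
Which normalization form is chosen matters: NFC and NFD only handle canonical equivalence, while NFKC and NFKD also fold compatibility characters such as fullwidth letters. The sketch below shows the difference; `canonical_username` is a hypothetical helper, and combining NFKC with `casefold()` is one reasonable choice for comparisons rather than a universal rule.

```python
import unicodedata

fullwidth_admin = "\uff41dmin"  # fullwidth 'a' (U+FF41) followed by ASCII "dmin"
decomposed_cafe = "cafe\u0301"  # 'e' followed by a combining acute accent

# NFC composes the accent but leaves the fullwidth letter untouched.
nfc_cafe = unicodedata.normalize("NFC", decomposed_cafe)
nfc_admin = unicodedata.normalize("NFC", fullwidth_admin)

# NFKC additionally maps the fullwidth letter to its ASCII counterpart.
nfkc_admin = unicodedata.normalize("NFKC", fullwidth_admin)


def canonical_username(raw: str) -> str:
    """Fold compatibility characters and case before comparing usernames."""
    return unicodedata.normalize("NFKC", raw).casefold()


print(nfc_admin == "admin")   # False: NFC does not fold fullwidth characters
print(nfkc_admin == "admin")  # True
```

NFKC is lossy (it also rewrites ligatures, superscripts, and similar characters), so it suits identifiers and comparison keys better than free-form text that must round-trip.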

URL Canonicalization

python
from urllib.parse import unquote

def canonicalize_url(url: str) -> str:
    prev = None
    while prev != url:
        prev = url
        url = unquote(url)
    return url

Output Encoding

Output encoding ensures data is safe for the specific context it is being rendered or transmitted into. This applies to ALL outputs, not just HTTP responses.

Context-Specific Encoding

| Output Context | Encoding Required | Why |
| --- | --- | --- |
| HTML body | HTML entity encoding (`<` → `&lt;`) | Prevents XSS |
| HTML attributes | Attribute encoding (quote special chars) | Prevents attribute injection |
| JavaScript | JS string encoding or `JSON.stringify()` | Prevents script injection |
| SQL | Parameterized queries (never string concat) | Prevents SQL injection |
| Shell commands | Avoid shell entirely; if unavoidable, use library escaping | Prevents command injection |
| URLs | URL-encode query parameters and path segments | Prevents parameter injection |
| XML | XML entity encoding | Prevents XML injection (XXE is a parser concern: disable external entity resolution) |
| JSON | Proper serialization (not string concatenation) | Prevents JSON injection |
| Log entries | Strip newlines and control characters | Prevents log injection |
| Email headers | Strip newlines from header values | Prevents header injection |
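
Several rows of the table map directly onto Python standard-library calls. The sketch below is one possible set of helpers; the function names and the `notes` table are assumptions made for illustration. The log helper escapes control characters instead of stripping them, a variation that keeps the evidence visible.

```python
import html
import json
import sqlite3
import unicodedata
from urllib.parse import quote, urlencode


def for_html(value: str) -> str:
    """HTML body or quoted attribute: escape &, <, > and both quote characters."""
    return html.escape(value, quote=True)


def for_url_query(params: dict) -> str:
    """URL query string: percent-encode each key and value."""
    return urlencode(params)


def for_url_path_segment(value: str) -> str:
    """URL path segment: encode everything reserved, including '/'."""
    return quote(value, safe="")


def for_json(value: object) -> str:
    """JSON: serialize with a real encoder, never by string concatenation.

    The result is not automatically safe inside an inline <script> block,
    because json.dumps does not escape '<' or '/'.
    """
    return json.dumps(value)


def for_log(value: str) -> str:
    """Log entry: render control characters (including newlines) as \\xNN."""
    return "".join(
        ch if unicodedata.category(ch) != "Cc" else f"\\x{ord(ch):02x}"
        for ch in value
    )


def store_note(conn: sqlite3.Connection, body: str) -> None:
    """SQL: pass the value as a bound parameter, never inside the SQL text."""
    conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
```

Each helper targets exactly one destination context; using `for_html` on a value that ends up in a URL or a log line would not provide the intended protection.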

Encoding Data From Internal Sources

code
Database row → render in HTML?    → HTML-encode the value
Cache value  → insert into SQL?   → use parameterized query
Queue msg    → pass to shell?     → use subprocess array (no shell)
Config value → embed in template? → encode for template context
API response → return to client?  → serialize properly, set Content-Type

The encoding must match the destination context, regardless of where the data came from.
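
For the "queue message to shell" case, a minimal sketch of passing an untrusted value as a single argument in an argument list, with no shell involved. Here `sys.executable` stands in for whatever program the component would actually invoke, and `run_without_shell` is a hypothetical name.

```python
import subprocess
import sys


def run_without_shell(untrusted_arg: str) -> str:
    """Invoke a child process with the value as one literal argument."""
    result = subprocess.run(
        # A list of arguments with the default shell=False: no shell parses
        # the value, so ';', '|', '$(...)' and similar have no special meaning.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted_arg],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.rstrip("\n")


# The metacharacters arrive in the child process as plain text.
print(run_without_shell("report.txt; echo INJECTED"))
```

Avoiding the shell removes shell metacharacter injection, but a real target program may still interpret a value that begins with `-` as an option, so the value should also be validated (and separated with `--` where the program supports it).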

Component Boundary Checklist

For every component boundary (inbound or outbound), verify:

| # | Check | Applies To |
| --- | --- | --- |
| 1 | Type validation: Is the data the expected type? | All inputs |
| 2 | Format validation: Does it match the expected pattern? | Strings, IDs, dates, emails |
| 3 | Range validation: Is it within acceptable bounds? | Numbers, dates, string lengths |
| 4 | Canonicalization: Is the data in its simplest canonical form? | Paths, URLs, Unicode strings |
| 5 | Allowlist check: Is the value in the set of permitted values? | Enums, status codes, roles |
| 6 | Sanitization: Have dangerous characters been removed or escaped? | All text inputs |
| 7 | Output encoding: Is data encoded for the destination context? | All outputs |
| 8 | Size limits: Is the data within acceptable size bounds? | Files, payloads, strings |
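
One way the checklist might look when applied to a single field. The "display name" field, its length limit, and its allowed character set are hypothetical; the numbers in the comments refer to the checklist rows.

```python
import html
import re
import unicodedata

MAX_NAME_LEN = 64  # assumed limit for this sketch
NAME_RE = re.compile(r"[A-Za-z0-9 _.-]+")  # assumed allowlist of characters


def clean_display_name(raw: object) -> str:
    """Inbound checks for a display name; raises instead of repairing."""
    if not isinstance(raw, str):                        # 1: type validation
        raise TypeError("display name must be a string")
    if len(raw) > 4 * MAX_NAME_LEN:                     # 8: size limit before any work
        raise ValueError("input too large")
    name = unicodedata.normalize("NFKC", raw).strip()   # 4: canonicalization
    if not 1 <= len(name) <= MAX_NAME_LEN:              # 3: range validation
        raise ValueError("display name has an invalid length")
    if not NAME_RE.fullmatch(name):                     # 2, 5, 6: format and allowlist
        raise ValueError("display name contains disallowed characters")
    return name


def render_greeting(name: str) -> str:
    """Outbound: encode for the HTML context at the point of use (7)."""
    return f"<p>Hello, {html.escape(name)}</p>"
```

Even though the allowlist already excludes HTML metacharacters, the output is still encoded, so the rendering code stays safe if the inbound rules are later relaxed.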

Anti-Patterns

| Anti-Pattern | Why It's Dangerous | Correct Approach |
| --- | --- | --- |
| "It's from our database, so it's safe" | Database can contain injected content from any entry point | Encode database values for the output context |
| "We validated it on the way in, so it's clean" | Output context may differ from input context; data may be modified in transit | Encode for the specific output context at point of use |
| "It's an internal API, so we trust it" | Internal services can be compromised or misconfigured | Validate response schema and sanitize values |
| "We use an ORM, so SQL injection is impossible" | Raw queries, dynamic column names, and ORDER BY bypass ORM protections | Always parameterize; review ORM escape hatches |
| "The cache just stores what we put in" | Cache can be poisoned via shared keys, race conditions, or unauthenticated access | Validate and encode cache values on read |
| "Config values are set by operators" | Environment variables and config stores can be compromised | Validate config values at startup; never pass raw to shell or template |
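
The ORDER BY case deserves a concrete sketch, because identifiers such as column names cannot be bound as query parameters. One common approach, shown here with `sqlite3`, is to map the caller's input through an allowlist and bind only the values. The `users` table, its columns, and `list_users` are hypothetical.

```python
import sqlite3

# Caller-facing sort keys mapped to real column names (hypothetical schema).
SORTABLE_COLUMNS = {"name": "name", "created": "created_at"}
SORT_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def list_users(conn: sqlite3.Connection, sort_key: str, direction: str,
               active: bool) -> list:
    """Return user names, sorted by an allowlisted column and direction."""
    try:
        column = SORTABLE_COLUMNS[sort_key]
        order = SORT_DIRECTIONS[direction.lower()]
    except KeyError:
        # Fail closed: unknown keys are rejected, never interpolated.
        raise ValueError("unsupported sort option") from None
    # Only allowlisted constants reach the SQL text; the value is bound.
    sql = f"SELECT name FROM users WHERE active = ? ORDER BY {column} {order}"
    return [row[0] for row in conn.execute(sql, (int(active),))]
```

A call such as `list_users(conn, "name; DROP TABLE users", "asc", True)` raises `ValueError` before any SQL is built, while `list_users(conn, "name", "desc", True)` runs a query whose structure is fixed by the allowlists.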

Best Practices

  • Treat all data as untrusted regardless of source — user input, databases, caches, queues, APIs, config, and files all deserve the same scrutiny.
  • Canonicalize before validating — decode, normalize, and resolve data to its simplest form before applying security checks.
  • Encode at the point of use, not at the point of storage — the correct encoding depends on where the data is going, not where it came from.
  • Validate on both sides of every boundary — the producer should validate what it sends and the consumer should validate what it receives.
  • Use allowlists over denylists — it is safer to define what is permitted than to try to enumerate everything that is dangerous.
  • Fail closed — if data fails validation, reject it entirely rather than attempting to "fix" it and proceeding.
  • Apply the same rigor to outbound data — log injection, email header injection, and downstream API injection are just as real as XSS.
  • Automate hygiene checks — use linters, SAST rules (Semgrep, CodeQL), and code review checklists to catch missing validation and encoding at component boundaries.
  • Document trust boundaries — make them explicit in architecture diagrams and threat models so every developer knows where hygiene enforcement is required.
  • Review ORM and framework escape hatches: dangerouslySetInnerHTML, raw SQL, HtmlString, @Html.Raw(), mark_safe(), and similar constructs bypass built-in protections and require manual encoding.