Defining Guardrails and Constraints
Quick start
Collect or infer:
- Agent purpose and capability scope
- Risk domains (data access, actions, external calls)
- Organizational safety policies
- User trust level and context sensitivity
Then draft the guardrail specification using TEMPLATES.md and validate it against RUBRIC.md.
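Recording the four inputs in one structure makes it easy to see which ones still need to be collected or inferred before drafting. The sketch below assumes a hypothetical `GuardrailContext` record. Its field names are illustrative and are not the schema defined in TEMPLATES.md.

```python
from dataclasses import dataclass, field


# Hypothetical intake record for the four Quick start inputs.
# Field names are illustrative assumptions, not a schema from TEMPLATES.md.
@dataclass
class GuardrailContext:
    agent_purpose: str = ""                                  # purpose and capability scope
    risk_domains: list[str] = field(default_factory=list)    # e.g. data access, actions, external calls
    safety_policies: list[str] = field(default_factory=list) # organizational policies that apply
    trust_level: str = ""                                    # user trust level / context sensitivity

    def missing_inputs(self) -> list[str]:
        """Return the names of inputs that still need to be collected or inferred."""
        missing = []
        if not self.agent_purpose.strip():
            missing.append("agent_purpose")
        if not self.risk_domains:
            missing.append("risk_domains")
        if not self.safety_policies:
            missing.append("safety_policies")
        if not self.trust_level.strip():
            missing.append("trust_level")
        return missing


ctx = GuardrailContext(
    agent_purpose="Answer billing questions and issue small refunds",
    risk_domains=["data access", "actions"],
)
print(ctx.missing_inputs())  # ['safety_policies', 'trust_level']
```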
Workflow
- Identify the agent's intended action space
- Map risk categories: data exposure, irreversible actions, scope creep, hallucination-sensitive domains
- Define hard boundaries (absolute prohibitions)
- Define soft boundaries (conditional restrictions with escalation paths)
- Specify detection and enforcement mechanisms
- Write constraint language using imperative, unambiguous phrasing
- Run the rubric check. Revise until it passes.
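The difference between hard and soft boundaries shows up most clearly at enforcement time. A hard boundary denies an action outright. A soft boundary escalates it only when its condition holds. The following is a minimal sketch under stated assumptions: `Action`, `HardBoundary`, `SoftBoundary`, and `enforce` are hypothetical names, and the refund threshold and escalation target are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical action shape: a tool/action name plus its parameters."""
    name: str
    params: dict


@dataclass
class HardBoundary:
    """Absolute prohibition: the named action is never allowed."""
    action_name: str
    reason: str


@dataclass
class SoftBoundary:
    """Conditional restriction: escalate the action when the condition holds."""
    action_name: str
    condition: Callable[[Action], bool]
    escalate_to: str  # escalation path; deployment-specific (assumption)


def enforce(action: Action, hard: list[HardBoundary], soft: list[SoftBoundary]) -> tuple[str, str]:
    """Return (decision, detail). Hard boundaries are checked before soft ones."""
    for rule in hard:
        if rule.action_name == action.name:
            return "deny", rule.reason
    for rule in soft:
        if rule.action_name == action.name and rule.condition(action):
            return "escalate", rule.escalate_to
    return "allow", ""


hard = [HardBoundary("delete_account", "Irreversible action; never perform it.")]
soft = [SoftBoundary("issue_refund", lambda a: a.params.get("amount", 0) > 50, "billing supervisor")]

print(enforce(Action("delete_account", {}), hard, soft))             # ('deny', 'Irreversible action; never perform it.')
print(enforce(Action("issue_refund", {"amount": 120}), hard, soft))  # ('escalate', 'billing supervisor')
print(enforce(Action("issue_refund", {"amount": 20}), hard, soft))   # ('allow', '')
```

Checking hard boundaries first ensures that no soft-boundary condition can override an absolute prohibition.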
Degrees of freedom
- Low: Constraint language must use the imperative mood and state its scope explicitly
- Allowed variation: The ordering of constraint categories and the specific escalation mechanisms, which may vary by deployment context
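The low-freedom requirement can be partly checked mechanically. The sketch below is a heuristic, not a grammar parser. It uses a directive opener as a proxy for imperative mood and treats an empty scope field as a missing scope. The opener list and the separate `scope` argument are assumptions for illustration.

```python
# Heuristic lint: a directive opener stands in for imperative mood, and an
# explicit scope must be supplied. The opener list is a hypothetical start.
DIRECTIVE_OPENERS = (
    "never ", "do not ", "always ", "only ", "require ",
    "escalate ", "refuse ", "stop ",
)


def lint_constraint(text: str, scope: str) -> list[str]:
    """Return a list of problems; an empty list means the constraint passes."""
    problems = []
    # str.startswith accepts a tuple and matches any of its prefixes.
    if not text.strip().lower().startswith(DIRECTIVE_OPENERS):
        problems.append("does not open with a recognized directive (e.g. 'Never', 'Do not')")
    if not scope.strip():
        problems.append("missing scope: state which tools, data, or users the rule covers")
    return problems


print(lint_constraint("Never write to production databases.", "db.write tool"))  # []
print(len(lint_constraint("Sensitive data should be handled carefully.", "")))    # 2
```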
Failure modes to avoid
- Vague prohibitions ("be careful with sensitive data")
- Constraints that conflict with core functionality
- Missing enforcement mechanisms
- Overly broad restrictions that block valid use cases
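Several of these failure modes can be caught by a simple review pass before the full rubric check. The following is a minimal sketch, assuming a hypothetical `Constraint` record that lists the actions it blocks (with `"*"` meaning everything) and names its enforcement mechanism. The vague-term list is an illustrative assumption and will not catch every vague phrasing.

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    """Hypothetical constraint record; field names are assumptions."""
    text: str
    blocked_actions: set[str]  # actions the constraint forbids; "*" means all actions
    enforcement: str = ""      # e.g. "tool allowlist", "output filter"; empty means none defined


# Illustrative hedge words that usually signal a vague prohibition.
VAGUE_TERMS = ("careful", "appropriate", "reasonable", "as needed", "when possible")


def review(c: Constraint, core_actions: set[str]) -> list[str]:
    """Flag the failure modes from 'Failure modes to avoid' using simple heuristics."""
    issues = []
    if any(term in c.text.lower() for term in VAGUE_TERMS):
        issues.append("vague prohibition")
    if "*" in c.blocked_actions:
        issues.append("overly broad restriction")
    else:
        conflicts = c.blocked_actions & core_actions
        if conflicts:
            issues.append("conflicts with core functionality: " + ", ".join(sorted(conflicts)))
    if not c.enforcement.strip():
        issues.append("missing enforcement mechanism")
    return issues


core = {"read_ticket", "issue_refund"}
print(review(Constraint("Be careful with sensitive data.", set()), core))
# ['vague prohibition', 'missing enforcement mechanism']
print(review(Constraint("Never call issue_refund.", {"issue_refund"}, "tool allowlist"), core))
# ['conflicts with core functionality: issue_refund']
print(review(Constraint("Do not call any tools.", {"*"}, "tool allowlist"), core))
# ['overly broad restriction']
```

A wildcard block is reported as overly broad rather than as a conflict, because it would trivially conflict with every core action.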
References
- Templates: TEMPLATES.md
- Rubric: RUBRIC.md
- Examples: EXAMPLES.md
- Constraint taxonomy: reference/constraint-taxonomy.md
- Enforcement patterns: reference/enforcement-patterns.md