
defining-guardrails-and-constraints

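Sets behavioral boundaries and safety constraints for AI agents. Use when specifying areas an agent must not touch, delimiting task scope, or implementing safety policies.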

SKILL.md
--- frontmatter
name: defining-guardrails-and-constraints
description: Defines behavioral limits and safety constraints for AI agents. Use when specifying what an agent must not do, setting scope boundaries, or implementing safety policies.

Defining Guardrails and Constraints

Quick start

Collect or infer:

  • Agent purpose and capability scope
  • Risk domains (data access, actions, external calls)
  • Organizational safety policies
  • User trust level and context sensitivity

Then produce the constraint specification using TEMPLATES.md and validate it against RUBRIC.md.

Workflow

  1. Identify the agent's intended action space
  2. Map risk categories: data exposure, irreversible actions, scope creep, hallucination-sensitive domains
  3. Define hard boundaries (absolute prohibitions)
  4. Define soft boundaries (conditional restrictions with escalation paths)
  5. Specify detection and enforcement mechanisms
  6. Write constraint language using imperative, unambiguous phrasing
  7. Run the rubric check. Revise until it passes.
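As an illustration of steps 3 to 6, the sketch below models hard and soft boundaries as data and routes a proposed action to allow, escalate, or deny. It is a minimal sketch under stated assumptions, not the format prescribed by TEMPLATES.md: the `Constraint` fields, the action names (`db.delete`, `email.send_external`), and the `check_action` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # soft boundary: hand off via the escalation path
    DENY = "deny"          # hard boundary: absolute prohibition


@dataclass(frozen=True)
class Constraint:
    rule: str                         # imperative, explicitly scoped wording
    risk_category: str                # e.g. "irreversible actions"
    blocked_action: str               # hypothetical action name the rule matches
    hard: bool                        # True = hard boundary, False = soft boundary
    escalation: Optional[str] = None  # escalation path for soft boundaries


# Hypothetical example constraints, one hard and one soft.
CONSTRAINTS: List[Constraint] = [
    Constraint(
        rule="Never delete records from the production database.",
        risk_category="irreversible actions",
        blocked_action="db.delete",
        hard=True,
    ),
    Constraint(
        rule="Do not send email to external recipients without operator approval.",
        risk_category="data exposure",
        blocked_action="email.send_external",
        hard=False,
        escalation="request operator approval",
    ),
]


def check_action(
    action: str, constraints: List[Constraint] = CONSTRAINTS
) -> Tuple[Verdict, Optional[Constraint]]:
    """Return the verdict for a proposed action and the constraint behind it."""
    matches = [c for c in constraints if c.blocked_action == action]
    # Hard boundaries take precedence over soft ones.
    for c in matches:
        if c.hard:
            return Verdict.DENY, c
    if matches:
        return Verdict.ESCALATE, matches[0]
    return Verdict.ALLOW, None
```

Under these assumptions, `check_action("db.delete")` yields a deny verdict, `check_action("email.send_external")` yields an escalate verdict carrying its escalation path, and any action without a matching constraint is allowed. Keeping the rule text next to the enforcement data makes it harder for a written prohibition to exist without a mechanism that enforces it.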

Degrees of freedom

  • Low freedom: constraint language must use the imperative mood and state its scope explicitly
  • Allowed variation: the ordering of constraint categories and the specific escalation mechanisms may vary by deployment context

Failure modes to avoid

  • Vague prohibitions ("be careful with sensitive data")
  • Constraints that conflict with core functionality
  • Missing enforcement mechanisms
  • Overly broad restrictions that block valid use cases
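The four failure modes above can be approximated with a simple lint pass over each draft constraint. The following is a rough sketch only and not the check defined in RUBRIC.md: the phrase lists, the `*` wildcard convention, and the `lint_constraint` signature are assumptions made for illustration.

```python
# Hypothetical heuristics; tune the word lists for a real deployment.
VAGUE_PHRASES = ("be careful", "use caution", "when appropriate", "if possible", "try to")
IMPERATIVE_OPENERS = ("never", "do not", "always", "must", "only")


def lint_constraint(rule, enforcement, blocked_actions, core_actions):
    """Return the failure-mode labels found in one draft constraint.

    rule:            the constraint wording
    enforcement:     description of how the constraint is enforced ("" if none)
    blocked_actions: set of action names the constraint forbids
    core_actions:    set of action names the agent needs for its purpose
    """
    problems = []
    text = rule.strip().lower()
    # Vague wording, or wording that does not open with an imperative.
    if any(p in text for p in VAGUE_PHRASES) or not text.startswith(IMPERATIVE_OPENERS):
        problems.append("vague prohibition")
    # The constraint forbids something the agent must do to function.
    if blocked_actions & core_actions:
        problems.append("conflicts with core functionality")
    if not enforcement.strip():
        problems.append("missing enforcement mechanism")
    # A bare wildcard blocks every action, including valid use cases.
    if "*" in blocked_actions:
        problems.append("overly broad restriction")
    return problems
```

For example, the vague rule quoted in the first bullet, given no enforcement and a wildcard scope, is flagged three times, while a scoped imperative rule with an enforcement description passes cleanly:

```python
lint_constraint("Be careful with sensitive data", "", {"*"}, {"db.read"})
lint_constraint(
    "Never export customer records outside the tenant.",
    "block the export tool call and log the attempt",
    {"records.export"},
    {"records.read"},
)
```

String heuristics like these only catch obvious cases; a human review against the rubric is still needed.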

References