Auditing Agent Behavior
Quick start
Collect or infer:
- Agent specifications (instructions, guardrails, expected behaviors)
- Sample of agent interactions to audit
- Success criteria and failure definitions
- Audit scope (specific capabilities, time range, user segment)
Then produce output using TEMPLATES.md. Validate with RUBRIC.md.
Workflow
- Define audit scope and sampling strategy
- Establish evaluation criteria from agent specifications
- Collect representative interaction samples
- Categorize behaviors: correct, incorrect, edge case, unsafe
- Identify patterns in failures and near-misses
- Document findings with severity and frequency
- Recommend specific remediation actions
- Run the rubric check. Revise until it passes.
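The categorize and document steps above can be sketched as a small tally. This is a minimal illustration, not a prescribed schema: the `Finding` record, the `summarize` helper, and the "low"/"medium"/"high" severity scale are all assumptions made for the example. Only the four category names come from the workflow itself.

```python
from collections import Counter
from dataclasses import dataclass

# Categories taken from the workflow step above.
CATEGORIES = ("correct", "incorrect", "edge case", "unsafe")


@dataclass(frozen=True)
class Finding:
    """One audited interaction (hypothetical record layout)."""
    interaction_id: str
    category: str        # must be one of CATEGORIES
    severity: str        # assumed scale: "low", "medium", "high"
    note: str = ""


def summarize(findings):
    """Return {(category, severity): (count, share of all findings)}."""
    for f in findings:
        if f.category not in CATEGORIES:
            raise ValueError(f"unknown category: {f.category!r}")
    counts = Counter((f.category, f.severity) for f in findings)
    total = len(findings)
    # An empty input yields an empty dict, so no division by zero occurs.
    return {key: (n, n / total) for key, n in counts.items()}


if __name__ == "__main__":
    audited = [
        Finding("a", "correct", "low"),
        Finding("b", "unsafe", "high", note="ignored a guardrail"),
        Finding("c", "unsafe", "high"),
        Finding("d", "incorrect", "medium"),
    ]
    for (category, severity), (count, share) in summarize(audited).items():
        print(f"{category:10s} {severity:7s} count={count} share={share:.0%}")
```

Reporting both the count and the share keeps severity and frequency side by side, which makes it easier to separate a rare but severe failure from a common but minor one.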
Degrees of freedom
- Medium: Audit depth and sampling strategy may vary based on risk level
- Allowed variation: Categorization schemes; specific metrics tracked
Failure modes to avoid
- Auditing without clear success criteria
- Sampling bias (for example, reviewing only flagged interactions)
- Focusing on edge cases while missing systemic issues
- Recommendations too vague to act on
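One way to reduce the sampling bias named above is to draw a random sample from every stratum, flagged and unflagged alike, rather than reviewing only what was flagged. The sketch below assumes interactions are dictionaries with a `flagged` field. That field, the `stratified_sample` helper, and its parameters are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(interactions, key, per_stratum, seed=0):
    """Draw up to `per_stratum` random items from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    strata = defaultdict(list)
    for item in interactions:
        strata[key(item)].append(item)

    sample = []
    # Iterate strata in a stable order so results do not depend on input order
    # of the groups themselves.
    for name in sorted(strata, key=repr):
        group = strata[name]
        k = min(per_stratum, len(group))  # small strata are taken whole
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    # Hypothetical log: every fifth interaction was flagged.
    log = [{"id": i, "flagged": i % 5 == 0} for i in range(50)]
    picked = stratified_sample(log, key=lambda x: x["flagged"], per_stratum=5)
    print(len(picked), sum(x["flagged"] for x in picked))
```

The same helper can stratify by capability or user segment by passing a different `key`, which fits the audit scope dimensions listed in the quick start.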
References
- Templates: TEMPLATES.md
- Rubric: RUBRIC.md
- Examples: EXAMPLES.md
- Audit categories: reference/audit-categories.md