Oracle Safety Guardian
Overview
Perform two-stage safety governance for oracle content: pre-check user input and post-check generated output.
Input Contract
- mode: pre or post
- content: user query or generated answer
- context: optional (profile summary, intent, tool trace)
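As a minimal sketch, the input contract could be modeled as a small validated record. The class name GuardianInput, the use of a dict for context, and the non-empty check on content are illustrative assumptions, not part of the skill's definition.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# The two stages named in the input contract.
VALID_MODES = ("pre", "post")


@dataclass(frozen=True)
class GuardianInput:
    """Hypothetical container for the three input fields."""

    mode: str                                 # "pre" or "post"
    content: str                              # user query or generated answer
    context: Optional[Dict[str, Any]] = None  # e.g. profile summary, intent, tool trace

    def __post_init__(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")
        # Assumption: empty content is rejected; the source does not say so.
        if not self.content.strip():
            raise ValueError("content must be a non-empty string")


request = GuardianInput(mode="pre", content="Should I sell my shares this week?")
```

Rejecting an unknown mode at construction time keeps the pre-check and post-check paths from being confused later in the workflow.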
Workflow
- Classify risk using references/risk-grading.md.
- Return decision:
  - allow
  - rewrite
  - refuse
- If rewrite, provide strict rewrite constraints.
- If refuse, provide safe alternative guidance.
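The workflow above can be sketched as a single function that turns a risk level into a decision and attaches what each decision requires. The level-to-decision mapping below is a placeholder assumption: the real thresholds are defined in references/risk-grading.md, which is not reproduced here. The function name decide and the safe_alternative key are also hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: placeholder mapping for illustration only.
# The authoritative grading lives in references/risk-grading.md.
ASSUMED_DECISION_BY_LEVEL: Dict[str, str] = {
    "S0": "allow",
    "S1": "allow",
    "S2": "rewrite",
    "S3": "rewrite",
    "S4": "refuse",
}


def decide(risk_level: str, rewrite_constraints: List[str], safe_alternative: str) -> dict:
    """Return a decision plus the extras the workflow requires for it."""
    if risk_level not in ASSUMED_DECISION_BY_LEVEL:
        raise ValueError(f"unknown risk level: {risk_level!r}")

    decision = ASSUMED_DECISION_BY_LEVEL[risk_level]
    result: dict = {"risk_level": risk_level, "decision": decision}

    if decision == "rewrite":
        # A rewrite must carry strict rewrite constraints.
        if not rewrite_constraints:
            raise ValueError("rewrite requires at least one constraint")
        result["constraints"] = list(rewrite_constraints)
    elif decision == "refuse":
        # A refusal must carry safe alternative guidance.
        if not safe_alternative:
            raise ValueError("refuse requires safe alternative guidance")
        result["safe_alternative"] = safe_alternative

    return result
```

For example, `decide("S2", ["no buy/sell instructions"], "")` yields a rewrite decision with that one constraint attached, while an allow decision carries no extras.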
Output Contract
Return structured policy:
- risk_level: S0/S1/S2/S3/S4
- decision: allow/rewrite/refuse
- reasons: short list
- constraints: list of mandatory constraints
- disclaimer_level: none/light/strong
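One way to keep the structured policy well-formed is to validate it on construction and serialize it to JSON. This is a sketch under assumptions: the class name GuardianPolicy and the choice of JSON as the wire format are illustrative, and the source does not specify either.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Allowed values taken from the output contract.
RISK_LEVELS = ("S0", "S1", "S2", "S3", "S4")
DECISIONS = ("allow", "rewrite", "refuse")
DISCLAIMER_LEVELS = ("none", "light", "strong")


@dataclass
class GuardianPolicy:
    """Hypothetical container for the five output fields."""

    risk_level: str
    decision: str
    reasons: List[str]
    constraints: List[str]
    disclaimer_level: str

    def __post_init__(self) -> None:
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"invalid risk_level: {self.risk_level!r}")
        if self.decision not in DECISIONS:
            raise ValueError(f"invalid decision: {self.decision!r}")
        if self.disclaimer_level not in DISCLAIMER_LEVELS:
            raise ValueError(f"invalid disclaimer_level: {self.disclaimer_level!r}")

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-ASCII reasons readable in the output.
        return json.dumps(asdict(self), ensure_ascii=False)


policy = GuardianPolicy(
    risk_level="S2",
    decision="rewrite",
    reasons=["asks for a direct buy/sell call"],
    constraints=["no direct investment buy/sell instructions"],
    disclaimer_level="light",
)
print(policy.to_json())
```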
Mandatory Rules
- Never output direct investment buy/sell instructions.
- Never output a medical diagnosis or treatment plan.
- Refuse illegal, violent, or self-harm instructions.
- Block fear-marketing and paid-disaster-relief narratives.
References
- Read references/risk-grading.md before the final decision.