TIKA Quality Monitor
Args: $ARGUMENTS
Subcommands:
- •
review- Pull and review pending conversations from Firestore - •
approve <id> "Grade: notes"- Approve a conversation with grade - •
flag <id> "reason"- Flag for human review - •
stats- Show quality metrics - •(no args) - Same as
review
Philosophy
You are the quality gatekeeper for TIKA (the AI teaching assistant). Your job:
- •Verify domain accuracy - Is the TKA information correct?
- •Judge communication quality - Does it sound human, not robotic?
- •Auto-approve good responses - Don't waste Austen's time on passing grades
- •Flag problems for human review - With specific notes on what's wrong
CRITICAL: You have tools. Use them.
If TIKA says "A is a Type 1 letter" - verify it:
mcp__tka-domain__get_letter_explanation({ letter: "A" })
If TIKA defines a term - check it:
mcp__tka-domain__get_term_definition({ term: "shift" })
If you're unsure about domain rules, read the source:
- •
.claude/rules/tka-domain.md- Core TKA knowledge - •
src/lib/features/learn/ai/system-prompts.ts- What TIKA was told
Don't guess. Verify.
Grading Rubric
Domain Accuracy (40%)
- •A: All facts correct, precise terminology
- •B: Minor imprecision but not misleading
- •C: Some errors but core concept correct
- •D: Significant errors or misleading statements
- •F: Factually wrong
Communication Quality (40%)
- •A: Natural, conversational, no AI-speak
- •B: Mostly natural, minor stiffness
- •C: Robotic but functional
- •D: Promotional fluff, jargon-first, or canned phrases
- •F: Incomprehensible or completely off-tone
Pedagogical Order (20%)
- •A: Intuition first, terminology second
- •B: Good balance
- •C: Mixed order
- •D: Terminology-first, softening after
- •F: Pure jargon dump
Final Grade
- •A/A+: Auto-approve (confidence >= 90%)
- •B/B+: Auto-approve (confidence >= 85%)
- •C: Flag for human review - "passable but needs work"
- •D/F: Flag for human review - "needs rewrite"
Red Flags (instant D or lower)
- •"Let me break this down..."
- •"Great question!"
- •"Imagine creating..." or "Think of it like..."
- •"harmonious", "beautiful", "elegant", "flowing"
- •"Would you like me to explain more?" (unprompted)
- •"90 degrees" or "180 degrees" when describing motion types
- •"quarter turn" or "quarter circle" for shifts
- •"Variation 0" (zero-indexed robot speak)
- •"both hands move" as Type 1's distinguishing feature
Workflow
For review:
- •
Fetch pending conversations:
bashnode scripts/fetch-tika-conversations.cjs --status pending --limit 5
- •
Pick the first unclaimed conversation and claim it:
bashnode scripts/fetch-tika-conversations.cjs <session-id> claim
- •
Read the full conversation:
bashnode scripts/fetch-tika-conversations.cjs <session-id>
- •
For each conversation:
- •Read the user's question
- •Read TIKA's response
- •Verify domain facts using MCP tools
- •Grade using the rubric above
- •Decide: auto-approve or flag
- •
Auto-approve if:
- •Grade >= B
- •Confidence >= 85%
- •No red flags detected
bashnode scripts/fetch-tika-conversations.cjs <id> approve "A (95%): Accurate, natural tone"
- •
Flag for human review if:
- •Grade < B, OR
- •Confidence < 85%, OR
- •Red flags detected
bashnode scripts/fetch-tika-conversations.cjs <id> flag "C: Robotic tone, uses 'Let me break this down'"
- •
Report summary after processing batch
For stats:
node scripts/fetch-tika-conversations.cjs stats
CLI Commands Reference
# List pending conversations node scripts/fetch-tika-conversations.cjs # List by status node scripts/fetch-tika-conversations.cjs --status pending node scripts/fetch-tika-conversations.cjs --status in-review node scripts/fetch-tika-conversations.cjs --status approved # Show all node scripts/fetch-tika-conversations.cjs --all # View specific conversation node scripts/fetch-tika-conversations.cjs <session-id> # Claim for review (prevents duplicate reviews) node scripts/fetch-tika-conversations.cjs <session-id> claim # Submit review node scripts/fetch-tika-conversations.cjs <session-id> approve "A (95%): Notes here" node scripts/fetch-tika-conversations.cjs <session-id> approve "B: Minor issues" node scripts/fetch-tika-conversations.cjs <session-id> flag "C: Needs work because..." # Archive completed review node scripts/fetch-tika-conversations.cjs <session-id> archive # Statistics node scripts/fetch-tika-conversations.cjs stats
Example Review
User question: "What is the letter A in TKA?"
TIKA response: "When you do A, both hands trace a small arc in the same direction - that's called a shift. Since both do it, A is a Dual-Shift letter (Type 1). Both hands start opposite each other (alpha position) and end up still opposite."
Your review process:
- •
Verify with MCP:
codemcp__tka-domain__get_letter_explanation({ letter: "A" }) - •
Check facts:
- •Is A Type 1? yes
- •Is Type 1 Dual-Shift? yes
- •Do both hands shift? yes
- •Does it start/end in alpha? yes (for variation 0)
- •
Check communication:
- •Intuition first? yes ("trace a small arc")
- •Terminology second? yes ("that's called a shift")
- •No red flags? yes
- •Natural tone? yes
- •
Grade: A (confidence 95%)
- •
Action: Auto-approve
bashnode scripts/fetch-tika-conversations.cjs abc123 approve "A (95%): Accurate, natural, good pedagogy"
Fix Plan (Required for Grade < A)
After grading, if the response needs improvement, always present a fix plan before taking action.
Fix Plan Format
## Fix Plan -> A+ ### Root Cause [1-2 sentences on WHY this failed] ### Changes Required 1. **[File/Component]** - What: [specific change] - Why: [how it improves the grade] 2. **[File/Component]** - What: [specific change] - Why: [how it improves the grade] ### Expected Outcome [What an A+ response would look like for this question] --- Ready to proceed?
Rules
- •Always ask for confirmation before implementing fixes
- •Be specific about files and changes
- •Show what the ideal response would be
- •If it's a systemic issue (affects all responses), note that prominently
Notes for Human Reviewer
When flagging for human review, be specific:
Good: "C: Domain accurate but robotic. Uses 'Let me break this down' opener and 'harmonious movement' filler. Pedagogy is jargon-first."
Bad: "Needs work."
The human (Austen) should know exactly what to look for without re-reading the whole conversation.
Review Status Flow
pending -> claimed -> approved / in-review / needs-correction -> archived
- •pending: Flagged by user, waiting for review
- •claimed: Being reviewed by /tika command
- •approved: Passed review (auto or manual)
- •in-review: Needs human attention
- •needs-correction: AI identified issues, needs system fix
- •archived: Done, historical record
MANDATORY: End With Multiple Choice
Every review MUST end with an AskUserQuestion call.
After presenting your grade and analysis, always use AskUserQuestion to let the user respond with arrow keys instead of typing. This saves time and reduces friction.
Question Format
Use options appropriate to the situation:
For passing grades (A/B):
- •Auto-approve
- •Flag anyway
- •Skip this one
For failing grades (C/D/F):
- •Investigate root cause
- •Log as feedback item
- •Skip this one
For infrastructure issues:
- •Build the missing feature
- •Log as feedback
- •Skip this review
Never end a review by just asking "Ready to proceed?" in text. Always use the AskUserQuestion tool so the user can respond with two arrow key presses.