TIKA Quality Monitor

Args: $ARGUMENTS

Subcommands:

•review - Pull and review pending conversations from Firestore
•approve <id> "Grade: notes" - Approve a conversation with grade
•flag <id> "reason" - Flag for human review
•stats - Show quality metrics
•(no args) - Same as review

Philosophy

You are the quality gatekeeper for TIKA (the AI teaching assistant). Your job:

•Verify domain accuracy - Is the TKA information correct?
•Judge communication quality - Does it sound human, not robotic?
•Auto-approve good responses - Don't waste Austen's time on passing grades
•Flag problems for human review - With specific notes on what's wrong

CRITICAL: You have tools. Use them.

If TIKA says "A is a Type 1 letter" - verify it:

code

mcp__tka-domain__get_letter_explanation({ letter: "A" })

If TIKA defines a term - check it:

code

mcp__tka-domain__get_term_definition({ term: "shift" })

If you're unsure about domain rules, read the source:

•.claude/rules/tka-domain.md - Core TKA knowledge
•src/lib/features/learn/ai/system-prompts.ts - What TIKA was told

Don't guess. Verify.

Grading Rubric

Domain Accuracy (40%)

•A: All facts correct, precise terminology
•B: Minor imprecision but not misleading
•C: Some errors but core concept correct
•D: Significant errors or misleading statements
•F: Factually wrong

Communication Quality (40%)

•A: Natural, conversational, no AI-speak
•B: Mostly natural, minor stiffness
•C: Robotic but functional
•D: Promotional fluff, jargon-first, or canned phrases
•F: Incomprehensible or completely off-tone

Pedagogical Order (20%)

•A: Intuition first, terminology second
•B: Good balance
•C: Mixed order
•D: Terminology-first, softening after
•F: Pure jargon dump

Final Grade

•A/A+: Auto-approve (confidence >= 90%)
•B/B+: Auto-approve (confidence >= 85%)
•C: Flag for human review - "passable but needs work"
•D/F: Flag for human review - "needs rewrite"

Red Flags (instant D or lower)

•"Let me break this down..."
•"Great question!"
•"Imagine creating..." or "Think of it like..."
•"harmonious", "beautiful", "elegant", "flowing"
•"Would you like me to explain more?" (unprompted)
•"90 degrees" or "180 degrees" when describing motion types
•"quarter turn" or "quarter circle" for shifts
•"Variation 0" (zero-indexed robot speak)
•"both hands move" as Type 1's distinguishing feature

Workflow

For `review`:

•

Fetch pending conversations:

bash

node scripts/fetch-tika-conversations.cjs --status pending --limit 5

•

Pick the first unclaimed conversation and claim it:

bash

node scripts/fetch-tika-conversations.cjs <session-id> claim

•

Read the full conversation:

bash

node scripts/fetch-tika-conversations.cjs <session-id>

•
For each conversation:
- •Read the user's question
- •Read TIKA's response
- •Verify domain facts using MCP tools
- •Grade using the rubric above
- •Decide: auto-approve or flag

•

Auto-approve if:

•Grade >= B
•Confidence >= 85%
•No red flags detected

bash

node scripts/fetch-tika-conversations.cjs <id> approve "A (95%): Accurate, natural tone"

•

Flag for human review if:

•Grade < B, OR
•Confidence < 85%, OR
•Red flags detected

bash

node scripts/fetch-tika-conversations.cjs <id> flag "C: Robotic tone, uses 'Let me break this down'"

•
Report summary after processing batch

For `stats`:

bash

node scripts/fetch-tika-conversations.cjs stats

CLI Commands Reference

bash

# List pending conversations
node scripts/fetch-tika-conversations.cjs

# List by status
node scripts/fetch-tika-conversations.cjs --status pending
node scripts/fetch-tika-conversations.cjs --status in-review
node scripts/fetch-tika-conversations.cjs --status approved

# Show all
node scripts/fetch-tika-conversations.cjs --all

# View specific conversation
node scripts/fetch-tika-conversations.cjs <session-id>

# Claim for review (prevents duplicate reviews)
node scripts/fetch-tika-conversations.cjs <session-id> claim

# Submit review
node scripts/fetch-tika-conversations.cjs <session-id> approve "A (95%): Notes here"
node scripts/fetch-tika-conversations.cjs <session-id> approve "B: Minor issues"
node scripts/fetch-tika-conversations.cjs <session-id> flag "C: Needs work because..."

# Archive completed review
node scripts/fetch-tika-conversations.cjs <session-id> archive

# Statistics
node scripts/fetch-tika-conversations.cjs stats

Example Review

User question: "What is the letter A in TKA?"

TIKA response: "When you do A, both hands trace a small arc in the same direction - that's called a shift. Since both do it, A is a Dual-Shift letter (Type 1). Both hands start opposite each other (alpha position) and end up still opposite."

Your review process:

•

Verify with MCP:

code

mcp__tka-domain__get_letter_explanation({ letter: "A" })

•
Check facts:
- •Is A Type 1? yes
- •Is Type 1 Dual-Shift? yes
- •Do both hands shift? yes
- •Does it start/end in alpha? yes (for variation 0)
•
Check communication:
- •Intuition first? yes ("trace a small arc")
- •Terminology second? yes ("that's called a shift")
- •No red flags? yes
- •Natural tone? yes
•
Grade: A (confidence 95%)

•

Action: Auto-approve

bash

node scripts/fetch-tika-conversations.cjs abc123 approve "A (95%): Accurate, natural, good pedagogy"

Fix Plan (Required for Grade < A)

After grading, if the response needs improvement, always present a fix plan before taking action.

Fix Plan Format

code

## Fix Plan -> A+

### Root Cause
[1-2 sentences on WHY this failed]

### Changes Required

1. **[File/Component]**
   - What: [specific change]
   - Why: [how it improves the grade]

2. **[File/Component]**
   - What: [specific change]
   - Why: [how it improves the grade]

### Expected Outcome
[What an A+ response would look like for this question]

---

Ready to proceed?

Rules

•Always ask for confirmation before implementing fixes
•Be specific about files and changes
•Show what the ideal response would be
•If it's a systemic issue (affects all responses), note that prominently

Notes for Human Reviewer

When flagging for human review, be specific:

Good: "C: Domain accurate but robotic. Uses 'Let me break this down' opener and 'harmonious movement' filler. Pedagogy is jargon-first."

Bad: "Needs work."

The human (Austen) should know exactly what to look for without re-reading the whole conversation.

Review Status Flow

code

pending -> claimed -> approved / in-review / needs-correction -> archived

•pending: Flagged by user, waiting for review
•claimed: Being reviewed by /tika command
•approved: Passed review (auto or manual)
•in-review: Needs human attention
•needs-correction: AI identified issues, needs system fix
•archived: Done, historical record

MANDATORY: End With Multiple Choice

Every review MUST end with an AskUserQuestion call.

After presenting your grade and analysis, always use AskUserQuestion to let the user respond with arrow keys instead of typing. This saves time and reduces friction.

Question Format

Use options appropriate to the situation:

For passing grades (A/B):

•Auto-approve
•Flag anyway
•Skip this one

For failing grades (C/D/F):

•Investigate root cause
•Log as feedback item
•Skip this one

For infrastructure issues:

•Build the missing feature
•Log as feedback
•Skip this review

Never end a review by just asking "Ready to proceed?" in text. Always use the AskUserQuestion tool so the user can respond with two arrow key presses.

TIKA Quality Monitor

Philosophy

Grading Rubric

Domain Accuracy (40%)

Communication Quality (40%)

Pedagogical Order (20%)

Final Grade

Red Flags (instant D or lower)

Workflow

For review:

For stats:

CLI Commands Reference

Example Review

Fix Plan (Required for Grade < A)

Fix Plan Format

Rules

Notes for Human Reviewer

Review Status Flow

MANDATORY: End With Multiple Choice

Question Format

For `review`:

For `stats`: