Cost Tracking Framework
When This Activates
This skill activates when:
- User asks about API costs or spending
- User raises concerns about expensive operations
- User needs to optimize token usage
Token Cost Reference
Claude Pricing (Approximate)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Opus | ~$15 | ~$75 |
| Sonnet | ~$3 | ~$15 |
| Haiku | ~$0.25 | ~$1.25 |
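The per-million rates above turn into a per-request figure with simple arithmetic. The sketch below is illustrative only: `PRICES` copies the approximate numbers from the table, and `estimate_cost` is a hypothetical helper, not part of any tool named in this document.

```python
# Approximate USD prices per 1M tokens, copied from the table above.
PRICES = {
    "opus": {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.25, "output": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of one request (hypothetical helper)."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # 8K tokens in, 2K tokens out on Sonnet:
    # 8000 * 3 / 1e6 + 2000 * 15 / 1e6 = 0.024 + 0.030 = 0.054
    print(f"${estimate_cost('sonnet', 8_000, 2_000):.3f}")
```

Because output tokens cost roughly five times as much as input tokens at every tier, a short prompt that produces a long answer can cost more than a long prompt with a short answer.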
Typical Operation Costs
| Operation | Tokens | Approximate Cost |
|---|---|---|
| Simple question | 500-2K | $0.01-0.05 |
| File read + analysis | 2-10K | $0.05-0.25 |
| Code generation | 5-20K | $0.15-0.50 |
| Multi-file refactor | 20-100K | $0.50-2.50 |
| Long conversation | 50-200K | $1.00-5.00 |
Cost Optimization Strategies
1. Route to Local LLM (FREE)
Use local_ask for simple tasks:
code
# FREE - no API cost
local_ask question="where is the login function?"
local_ask question="explain this error" mode=explain
local_review file="src/auth.ts" focus=bugs
Good for local:
- Simple lookups ("where is X?")
- Code explanations
- Commit message generation
- Quick code reviews
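The list above amounts to a routing rule. A minimal sketch of that rule follows; the task labels and the `choose_route` function are assumptions made up for illustration and do not describe how the gateway actually classifies requests.

```python
# Hypothetical task labels for the categories listed above; a real router
# would need its own way to classify incoming requests.
LOCAL_TASKS = {"lookup", "explain", "commit_message", "quick_review"}


def choose_route(task_type: str) -> str:
    """Return "local" for task types in LOCAL_TASKS, otherwise "api"."""
    return "local" if task_type in LOCAL_TASKS else "api"


if __name__ == "__main__":
    for task in ("lookup", "commit_message", "multi_file_refactor"):
        print(f"{task}: {choose_route(task)}")
```

Defaulting unknown task types to the API keeps quality predictable at the price of some avoidable spend; the opposite default would save more but risks weaker answers on hard tasks.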
2. Use Memory Tools First
Pre-indexed memory is instant and free:
code
# Instant, no API cost
memory_query "authentication flow"
memory_functions name="handleLogin"
smart_read path="src/auth.ts" detail=summary
3. Reduce Context Size
- Use smart_read with detail=summary before detail=full
- Truncate large files to relevant sections
- Clear conversation when changing topics
4. Batch Related Questions
Instead of 5 separate messages, combine:
code
"Can you: 1) explain the auth flow, 2) find the login component, 3) check for security issues, and 4) suggest improvements?"
Gateway Metrics
Check current efficiency:
code
gateway_metrics format=summary
Returns:
- Cache hit rate
- Token savings
- Routing breakdown (local vs API)
Cost Estimation
Before expensive operations:
code
This refactor will touch ~20 files.
Estimated cost: $0.50-1.00
Proceed? [Y/n]
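A prompt like the one above can be produced from a file count and a rough token budget per file. The sketch below is one way to do that; `cost_range`, `confirmation_prompt`, and all of their parameters are hypothetical, and the per-file token range and blended price are inputs you would have to supply from your own measurements.

```python
def cost_range(files: int, low_tokens: int, high_tokens: int,
               usd_per_million: float) -> tuple:
    """Return (low, high) USD estimates for processing `files` files.

    low_tokens / high_tokens are assumed per-file token budgets, and
    usd_per_million is an assumed blended price per 1M tokens.
    """
    low = files * low_tokens * usd_per_million / 1_000_000
    high = files * high_tokens * usd_per_million / 1_000_000
    return low, high


def confirmation_prompt(files: int, low_tokens: int, high_tokens: int,
                        usd_per_million: float) -> str:
    """Format the estimate as a three-line confirmation message."""
    low, high = cost_range(files, low_tokens, high_tokens, usd_per_million)
    return (
        f"This refactor will touch ~{files} files.\n"
        f"Estimated cost: ${low:.2f}-{high:.2f}\n"
        "Proceed? [Y/n]"
    )


if __name__ == "__main__":
    print(confirmation_prompt(20, 5_000, 10_000, 5.0))
```

Reporting a range rather than a single number reflects how uncertain per-file token counts are before the files have been read.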
Budget Awareness
Daily Patterns
- Morning: fresh context, lower cost per request
- Long sessions: context grows with every turn, higher cost per request
- After compaction: context is reset, lower cost per request
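The pattern above follows from how context accumulates. As a simplified model, assume each turn re-sends the full history as input and every turn adds the same number of tokens, with no caching or compaction. The function name and parameters below are hypothetical.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent over a conversation (simplified model).

    Turn k sends k * tokens_per_turn tokens (history plus the new message),
    so the total is tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


if __name__ == "__main__":
    for turns in (10, 50):
        print(turns, cumulative_input_tokens(turns, 1_000))
```

Under these assumptions, 10 turns of 1,000 tokens total 55,000 input tokens while 50 turns total 1,275,000, so five times as many turns costs roughly 23 times as much input. Real sessions will differ, but the faster-than-linear growth is why starting fresh or compacting lowers cost.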
High-Cost Triggers
- "Analyze entire codebase"
- "Review all files in directory"
- "Generate comprehensive documentation"
- Very long conversations (>50 turns)
Saving Tips
- Start fresh for new topics: don't carry irrelevant context
- Use subagents: they work with a focused context
- Check memory first: summaries save full file reads
- Compress transcripts: archived sessions are stored compressed
- Use the local LLM for simple tasks: Ollama runs locally with no API cost
Monitoring Commands
bash
# Check gateway efficiency
python3 ~/.claude-dash/learning/efficiency_tracker.py --report
# View session sizes
du -sh ~/.claude-dash/sessions/*