Performance Diagnostician Skill
Purpose: Automatically diagnose performance issues in the Manny MCP codebase and suggest specific fixes or optimizations.
When to use:
- •User reports "slow" performance
- •Builds taking too long
- •Low cache hit rates
- •After implementing new features (check for regressions)
- •Proactively during long-running sessions
- •When asked about optimization status
Execution Steps
1. Run Full Performance Scan
bash
./monitor.py --metric all
Capture ALL output - you'll interpret this.
2. Analyze Each Metric
Build Performance
- •Expected: Incremental 25-35s, Clean 30-40s
- •Alert threshold: >60s incremental, >70s clean
- •If SLOW:
- •Check if offline mode (
-o) is enabled in server.py:377 - •Check thread count (
-T 2) matches CPU cores - •Suggest:
rm -rf ~/.m2/repositoryif Maven cache corrupted - •Suggest: Force clean build once to reset state
- •Consider: Phase 4B build artifact caching (CONFIGURATION.md)
- •Check if offline mode (
Cache Performance
- •Expected: Hit rate >50%
- •Alert threshold: <30%
- •If LOW:
- •Cache may be too small (increase
max_sizein cache_layer.py) - •TTL may be too short (increase
default_ttl) - •Cache may have been cleared recently (normal)
- •If persistent: code is changing frequently, consider longer TTL
- •Check: Run cache stats to see what's being cached
- •Cache may be too small (increase
Search Engine
- •Expected: Build <1s, total keys ~1,287
- •Alert threshold: >5s build time
- •If SLOW:
- •Index may need rebuild
- •File count may have increased significantly
- •Check: Number of Java files (should be ~43)
- •Suggest: Force rebuild with
import search_engine; search_engine._search_index = None
Command Latency
- •Expected: <10ms average
- •Alert threshold: >100ms
- •If SLOW:
- •Event-driven file monitoring may have failed (fallback to polling)
- •Check: Is watchdog installed? (
pip list | grep watchdog) - •Check: Response file permissions
- •Suggest: Restart MCP server
- •Consider: Phase 4A optimizations (orjson, WebSocket push)
State File Writes
- •Expected: 6-10/min
- •Alert threshold: >20/min
- •If HIGH:
- •State change detection may not be working
- •Check: GameEngine.java:4600-4785 (hasSignificantChange)
- •Player may be very active (normal for combat/skilling)
- •Consider: Increase
unchangedTicksthreshold (currently 10)
3. Check for Regressions
Compare current metrics to baselines in OPTIMIZATIONS.md:
- •Builds: Should be ~10x faster than pre-optimization (was 5min)
- •Commands: Should be ~500x faster (was 500ms)
- •Searches: Should be ~1000x faster (was 50ms)
If any metric is regressed:
- •Something broke the optimization
- •Check git history for recent changes
- •Suggest rollback procedures from OPTIMIZATION_QUICK_REFERENCE.md
4. Suggest Optimizations
Based on findings:
If builds are slow but acceptable (35-50s):
- •Mention Phase 4B build artifact caching could help
- •Show OPTIMIZATION_PHASE4_AUDIT.md lines 157-202
If cache hit rate is low but not critical (30-40%):
- •Suggest tuning TTL or max_size
- •Show CONFIGURATION.md lines 67-148
If dashboard polling is visible in logs:
- •Mention Phase 4A could reduce dashboard overhead by 90%
- •Show OPTIMIZATION_PHASE4_AUDIT.md lines 21-75
If everything is healthy:
- •Report: "All optimizations working as expected ✅"
- •Show summary of current performance vs baseline
- •Mention: Phase 4 optimizations available if needed
5. Provide Actionable Fix
Don't just report problems - provide the exact fix:
Example: Slow builds
bash
# Fix 1: Check offline mode is enabled grep "\-o" /home/wil/manny-mcp/server.py # Fix 2: Clear Maven cache rm -rf ~/.m2/repository # Fix 3: Force clean build ./venv/bin/python -c "from server import build_plugin; build_plugin(clean=True)" # Fix 4: Verify thread count matches CPU # Edit server.py:377, change -T to match your CPU cores (max 8)
Example: Low cache hit rate
python
# Fix: Increase TTL in cache_layer.py:158 _tool_cache = LRUCache(max_size=100, default_ttl=600) # Was 300, now 600 (10 min) # Then restart MCP server
Example: Missing search index
python
# Force rebuild search index
./venv/bin/python -c "
import search_engine
search_engine._search_index = None
from search_engine import get_search_index
index = get_search_index('/home/wil/Desktop/manny')
print(f'Index rebuilt: {index.get_stats()}')
"
Output Format
Summary Section
code
Performance Diagnostics Report ============================== Status: [✅ Healthy | ⚠️ Issues Found | ❌ Critical Issues] Metrics Summary: - Builds: [time]s (✅ OK | ⚠️ SLOW) - Cache Hit Rate: [rate]% (✅ OK | ⚠️ LOW) - Search Engine: [time]s (✅ OK | ⚠️ SLOW) - Commands: [time]ms (✅ OK | ⚠️ SLOW) - State Writes: [count]/min (✅ OK | ⚠️ HIGH)
Issues Section (if any)
code
Issues Detected: 1. [Issue name] - [severity] Cause: [explanation] Fix: [specific command or code change] 2. [Issue name] - [severity] ...
Recommendations Section
code
Optimization Opportunities: - [Current state is optimal] OR - [Suggest specific Phase 4 optimizations if beneficial] - [Reference specific documentation sections]
Cross-References
Use these docs for detailed guidance:
- •OPTIMIZATION_QUICK_REFERENCE.md - Troubleshooting recipes (start here!)
- •OPTIMIZATIONS.md - Phase 1-3 baselines and architecture
- •CONFIGURATION.md - Settings tuning for specific issues
- •OPTIMIZATION_PHASE4_AUDIT.md - Future optimization suggestions
- •OPTIMIZATION_ROADMAP.md - Meta-view of all phases
Proactive Monitoring
If invoked during a long session without user request:
- •Run quick health check:
./monitor.py --metric all - •Only report if issues found
- •Be concise - user is overseeing, not managing details
- •Format as: "Performance check: [✅ All healthy | ⚠️ Found X issues]"
Example Invocations
User says: "Builds seem slow"
code
1. Run: ./monitor.py --metric builds 2. Check output against baseline (30s expected) 3. If >60s: Diagnose (offline mode? Maven cache? Thread count?) 4. Provide specific fix commands 5. Explain expected result after fix
User says: "Is everything optimized?"
code
1. Run: ./monitor.py --metric all 2. Compare all metrics to baselines 3. Report: "Phases 1-3 complete and performing as expected" 4. Mention: "Phase 4 optimizations available (see OPTIMIZATION_PHASE4_AUDIT.md)" 5. Show quick wins if user wants more speed
User asks: "Why is the cache not working?"
code
1. Run: ./monitor.py --metric cache 2. Interpret hit rate and size 3. Check if cache is being populated (size > 0?) 4. Suggest: Increase TTL or max_size if hit rate low 5. Provide exact file path and line number to edit
Proactive (long session, no user request)
code
Every 30-60 minutes of activity: 1. Quick check: ./monitor.py --metric all 2. If all ✅: Stay silent (don't spam user) 3. If any ⚠️: Report concisely: "Performance check: Cache hit rate dropped to 25% (was 50%)" 4. Offer: "Should I diagnose and suggest a fix?"
Important Notes
- •Always run monitor.py - Don't guess based on documentation
- •Provide exact commands - User is overseeing, not debugging
- •Reference line numbers - e.g., "server.py:377" not "the build function"
- •Be actionable - Every problem needs a specific fix
- •Know when to escalate - If critical regression, alert clearly
Success Criteria
✅ Skill is successful when:
- •Performance issues are identified automatically
- •Root cause is explained clearly
- •Specific fix is provided (copy-paste ready)
- •User can verify fix worked with monitor.py
- •No manual debugging required by user
❌ Skill fails when:
- •Reports issues without running monitor.py
- •Suggests "check the docs" without specific guidance
- •Provides vague fixes ("optimize the code")
- •Doesn't verify current state before suggesting changes