AgentSkillsCN

context-engineering

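Check context usage limits, monitor remaining time, optimize token consumption, and debug context failures. Use when asking about context percentage, rate limits, usage warnings, context optimization, agent architectures, or memory systems.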

SKILL.md
---
name: context-engineering
description: >-
  Check context usage limits, monitor time remaining, optimize token consumption, debug context failures.
  Use when asking about context percentage, rate limits, usage warnings, context optimization, agent architectures, memory systems.
version: 1.0.0
---

Context Engineering

Context engineering curates the smallest high-signal token set for LLM tasks. The goal: maximize reasoning quality while minimizing token usage.

When to Activate

  • Designing/debugging agent systems
  • Context limits constrain performance
  • Optimizing cost/latency
  • Building multi-agent coordination
  • Implementing memory systems
  • Evaluating agent performance
  • Developing LLM-powered pipelines

Core Principles

  1. Context quality > quantity - High-signal tokens beat exhaustive content
  2. Attention is finite - U-shaped curve favors beginning/end positions
  3. Progressive disclosure - Load information just-in-time
  4. Isolation prevents degradation - Partition work across sub-agents
  5. Measure before optimizing - Know your baseline
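Principle 2 in practice: a minimal sketch, assuming each context chunk already carries a numeric priority score (how you score is up to you). `order_for_attention` is a hypothetical helper, not part of this skill. It alternates ranked chunks between the front and the back, so the most critical content sits at the edges and the least critical lands mid-context.

```python
from typing import List, Tuple


def order_for_attention(chunks: List[Tuple[str, float]]) -> List[str]:
    """Put high-priority chunks at the start/end, low-priority in the middle.

    chunks: (text, priority) pairs; higher priority = more critical.
    """
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    front: List[str] = []
    back: List[str] = []
    for i, (text, _) in enumerate(ranked):
        # 1st most important -> front, 2nd -> back, 3rd -> front, ...
        if i % 2 == 0:
            front.append(text)
        else:
            back.append(text)
    # Reverse the back half so its most important item ends up last.
    return front + back[::-1]
```

Example: priorities b=0.9, d=0.7, c=0.5, a=0.1 yield `["b", "c", "a", "d"]`; the top two chunks occupy the first and last slots.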

IMPORTANT:

  • Sacrifice grammar for the sake of concision.
  • Ensure token efficiency while maintaining high quality.
  • Pass these rules to subagents.

Quick Reference

| Topic | When to Use | Reference |
|---|---|---|
| Fundamentals | Understanding context anatomy, attention mechanics | context-fundamentals.md |
| Degradation | Debugging failures, lost-in-middle, poisoning | context-degradation.md |
| Optimization | Compaction, masking, caching, partitioning | context-optimization.md |
| Compression | Long sessions, summarization strategies | context-compression.md |
| Memory | Cross-session persistence, knowledge graphs | memory-systems.md |
| Multi-Agent | Coordination patterns, context isolation | multi-agent-patterns.md |
| Evaluation | Testing agents, LLM-as-Judge, metrics | evaluation.md |
| Tool Design | Tool consolidation, description engineering | tool-design.md |
| Pipelines | Project development, batch processing | project-development.md |
| Runtime Awareness | Usage limits, context window monitoring | runtime-awareness.md |
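Progressive disclosure (principle 3) applied to the Quick Reference: a sketch that reads a reference file only when its topic comes up, then caches it. Assumptions: all files live in one directory (`base_dir`, hypothetical), topics match the table's Topic column case-insensitively, and `ReferenceLoader` is an illustrative name, not an API this skill ships. The reader is injectable so the loader can be exercised without touching disk.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Topic -> reference file, copied from the Quick Reference table.
REFERENCES: Dict[str, str] = {
    "fundamentals": "context-fundamentals.md",
    "degradation": "context-degradation.md",
    "optimization": "context-optimization.md",
    "compression": "context-compression.md",
    "memory": "memory-systems.md",
    "multi-agent": "multi-agent-patterns.md",
    "evaluation": "evaluation.md",
    "tool design": "tool-design.md",
    "pipelines": "project-development.md",
    "runtime awareness": "runtime-awareness.md",
}


def _read_file(path: Path) -> str:
    return path.read_text(encoding="utf-8")


class ReferenceLoader:
    """Load reference docs just-in-time instead of up front."""

    def __init__(self, base_dir: Path, read: Callable[[Path], str] = _read_file) -> None:
        self.base_dir = base_dir
        self.read = read  # defaults to reading from disk
        self._cache: Dict[str, str] = {}

    def load(self, topic: str) -> Optional[str]:
        key = topic.strip().lower()
        filename = REFERENCES.get(key)
        if filename is None:
            return None  # unknown topic: add nothing to context
        if key not in self._cache:
            self._cache[key] = self.read(self.base_dir / filename)
        return self._cache[key]

    def loaded_topics(self) -> List[str]:
        return sorted(self._cache)
```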

Key Metrics

  • Token utilization: Warning at 70%, trigger optimization at 80%
  • Token usage: explains ~80% of agent performance variance
  • Multi-agent cost: ~15x the tokens of a chat interaction (single agents ~4x)
  • Compaction target: 50-70% reduction, <5% quality loss
  • Cache hit target: 70%+ for stable workloads
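These targets as executable checks: a minimal sketch. Assumptions: token counts come from your provider's usage metadata, "quality" is any scalar eval score (e.g. a probe-based pass rate), and all function names are hypothetical. The 50% floor is the low end of the 50-70% compaction target; more reduction is accepted as long as quality holds.

```python
def utilization_action(used_tokens: int, window_tokens: int) -> str:
    """Map context utilization to an action: warn at 70%, optimize at 80%."""
    pct = 100 * used_tokens / window_tokens
    if pct >= 80:
        return "optimize"
    if pct >= 70:
        return "warn"
    return "ok"


def compaction_meets_target(
    tokens_before: int, tokens_after: int, quality_before: float, quality_after: float
) -> bool:
    """True if compaction cut >= 50% of tokens with < 5% relative quality loss."""
    reduction = 1 - tokens_after / tokens_before
    quality_loss = (quality_before - quality_after) / quality_before
    return reduction >= 0.50 and quality_loss < 0.05


def cache_hit_meets_target(hits: int, lookups: int) -> bool:
    """True if the cache hit rate reaches the 70% target."""
    return lookups > 0 and hits / lookups >= 0.70
```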

Four-Bucket Strategy

  1. Write: Save context externally (scratchpads, files)
  2. Select: Pull only relevant context (retrieval, filtering)
  3. Compress: Reduce tokens while preserving info (summarization)
  4. Isolate: Split across sub-agents (partitioning)
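The four buckets as toy Python, one function each. These are deliberately naive stand-ins, not production techniques: keyword overlap substitutes for real retrieval, word truncation for LLM summarization, and round-robin for task-aware partitioning. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scratchpad:
    """Write: persist notes outside the prompt, re-read only when needed."""
    notes: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.notes[key] = value

    def read(self, key: str) -> str:
        return self.notes.get(key, "")


def select(docs: List[str], query: str, k: int = 2) -> List[str]:
    """Select: keep up to k docs with the most word overlap with the query."""
    q = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(q & set(doc.lower().split()))

    ranked = sorted(docs, key=overlap, reverse=True)
    return [d for d in ranked[:k] if overlap(d) > 0]


def compress(text: str, max_words: int) -> str:
    """Compress: naive truncation standing in for LLM summarization."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def isolate(tasks: List[str], n_agents: int) -> List[List[str]]:
    """Isolate: round-robin tasks into separate per-sub-agent contexts."""
    contexts: List[List[str]] = [[] for _ in range(n_agents)]
    for i, task in enumerate(tasks):
        contexts[i % n_agents].append(task)
    return contexts
```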

Anti-Patterns

  • Exhaustive context over curated context
  • Critical info in middle positions
  • No compaction triggers before limits
  • Single agent for parallelizable tasks
  • Tools without clear descriptions

Guidelines

  1. Place critical info at beginning/end of context
  2. Implement compaction at 70-80% utilization
  3. Use sub-agents for context isolation, not role-play
  4. Design tools with 4-question framework (what, when, inputs, returns)
  5. Optimize for tokens-per-task, not tokens-per-request
  6. Validate with probe-based evaluation
  7. Monitor KV-cache hit rates in production
  8. Start minimal, add complexity only when proven necessary
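Guideline 5 with illustrative numbers (invented for the example, not measurements): a lean prompt that forces extra round-trips looks cheaper per request yet costs more per task. `Request` and both helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    input_tokens: int
    output_tokens: int


def tokens_per_request(requests: List[Request]) -> float:
    """Average tokens per call: the misleading metric."""
    return sum(r.input_tokens + r.output_tokens for r in requests) / len(requests)


def tokens_per_task(requests: List[Request]) -> int:
    """Total tokens across every call the task needed, retries included."""
    return sum(r.input_tokens + r.output_tokens for r in requests)


# Lean prompt (hypothetical): 6 calls because missing context forces follow-ups.
lean = [Request(1_000, 200) for _ in range(6)]
# Richer prompt (hypothetical): 2 calls, each carrying more curated context.
rich = [Request(3_000, 300) for _ in range(2)]
```

Here the lean approach averages 1,200 tokens per request versus 3,300, but totals 7,200 tokens for the task versus 6,600.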

Runtime Awareness

The system automatically injects usage awareness via a PostToolUse hook:

```xml
<usage-awareness>
Claude Usage Limits: 5h=45%, 7d=32%
Context Window Usage: 67%
</usage-awareness>
```

Thresholds:

  • 70%: WARNING - consider optimization/compaction
  • 90%: CRITICAL - immediate action needed
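Consuming the injected block programmatically: a sketch assuming the exact line format of the usage-awareness example (`5h=NN%, 7d=NN%` and `Context Window Usage: NN%`). `parse_usage_awareness` and `severity` are hypothetical helpers; applying the same thresholds to the rate-limit percentages is an assumption, since only the thresholds themselves are documented.

```python
import re
from typing import Dict


def parse_usage_awareness(block: str) -> Dict[str, int]:
    """Pull percentages out of a <usage-awareness> block (format assumed)."""
    result: Dict[str, int] = {}
    limits = re.search(r"5h=(\d+)%,\s*7d=(\d+)%", block)
    if limits:
        result["limit_5h"] = int(limits.group(1))
        result["limit_7d"] = int(limits.group(2))
    context = re.search(r"Context Window Usage:\s*(\d+)%", block)
    if context:
        result["context"] = int(context.group(1))
    return result


def severity(pct: int) -> str:
    """Apply the documented thresholds: 70% WARNING, 90% CRITICAL."""
    if pct >= 90:
        return "CRITICAL"
    if pct >= 70:
        return "WARNING"
    return "OK"
```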

Data Sources:

  • Usage limits: Anthropic OAuth API (https://api.anthropic.com/api/oauth/usage)
  • Context window: Statusline temp file (/tmp/ck-context-{session_id}.json)
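Reading the statusline temp file directly: a sketch built only on the documented path template. The file's JSON schema is not documented here, so the `percent` field name is a hypothetical placeholder; check what your statusline actually writes. Returning `None` on a missing or malformed file keeps a monitoring hook from crashing mid-session.

```python
import json
from pathlib import Path
from typing import Optional


def context_file(session_id: str, tmp_dir: Path = Path("/tmp")) -> Path:
    """Build the path from the template /tmp/ck-context-{session_id}.json."""
    return tmp_dir / f"ck-context-{session_id}.json"


def read_context_percent(
    session_id: str, tmp_dir: Path = Path("/tmp"), key: str = "percent"
) -> Optional[float]:
    """Return context usage %, or None if the file is missing or unreadable.

    ASSUMPTION: "percent" is a hypothetical field name; the real schema
    written by the statusline is not documented here.
    """
    try:
        data = json.loads(context_file(session_id, tmp_dir).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or bad JSON
        return None
    if not isinstance(data, dict):
        return None
    value = data.get(key)
    return float(value) if isinstance(value, (int, float)) else None
```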

Scripts