Token Usage Estimation and Optimization

Deliver practical token estimates and a prioritized optimization plan without reducing output quality unnecessarily.

Quick Start

•Clarify the objective, target model(s), and constraints (budget, latency, max input/output tokens).
•Inventory prompt components (system, developer, tool list, history, retrieved docs, user input).
•Estimate tokens for each component and identify the two or three largest contributors.
•Apply the smallest, highest-impact optimizations first.
•Re-estimate and validate output quality.

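Use a tokenizer or estimation library for the target model when one is available. If not, use a fast heuristic (for English text, roughly 4 characters per token is a common rule of thumb) and add a safety buffer.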
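A minimal sketch of the heuristic fallback, assuming a characters-per-token ratio and a percentage buffer. The function name `estimateTokens` and its defaults (4 characters per token, 10% buffer) are illustrative assumptions, not taken from the bundled script.

```javascript
// Heuristic token estimate: character count divided by an assumed
// chars-per-token ratio, then padded with a safety buffer.
// Name and defaults are hypothetical; tune them for your model and content.
function estimateTokens(text, { charsPerToken = 4, buffer = 0.1 } = {}) {
  if (charsPerToken <= 0) {
    throw new RangeError("charsPerToken must be positive");
  }
  // text.length counts UTF-16 code units, which is good enough for a heuristic.
  const raw = Math.ceil(text.length / charsPerToken);
  return Math.ceil(raw * (1 + buffer));
}

console.log(estimateTokens("Hello, world!")); // 5
```

Non-English text and code often tokenize less efficiently than English prose, so a lower `charsPerToken` or a larger buffer may be appropriate there.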

Template:

text

input_tokens ≈ (system + tools + history + retrieval + user)
output_tokens ≈ target_response_length
total_tokens ≈ input_tokens + output_tokens
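
The template can be turned into a small budget report that also ranks the largest contributors. This is a sketch only: `summarizeBudget` and the sample numbers are hypothetical, and the per-component counts are assumed to come from a tokenizer or a heuristic estimate.

```javascript
// Sum per-component input estimates, add the output target, and rank the
// largest input contributors. All names and figures are illustrative.
function summarizeBudget(components, outputTokens, topN = 3) {
  const entries = Object.entries(components);
  const inputTokens = entries.reduce((sum, [, tokens]) => sum + tokens, 0);
  const topDrivers = entries
    .slice()
    .sort((a, b) => b[1] - a[1]) // largest component first
    .slice(0, topN)
    .map(([name, tokens]) => ({
      name,
      tokens,
      share: inputTokens > 0 ? tokens / inputTokens : 0,
    }));
  return {
    inputTokens,
    outputTokens,
    totalTokens: inputTokens + outputTokens,
    topDrivers,
  };
}

const report = summarizeBudget(
  { system: 1200, tools: 3000, history: 4500, retrieval: 6000, user: 300 },
  800
);
console.log(report.totalTokens); // 15800
console.log(report.topDrivers.map((d) => d.name)); // [ 'retrieval', 'history', 'tools' ]
```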

•Trim tool list: only include relevant tools for the request.
•Conditional tool instructions: include tool-specific guidance only when that tool is included in the request.
•Prompt caching: place stable instructions at the start of the prompt so the unchanged prefix can be cached across calls.
•Summarize history: keep last N turns + compact summary + key decisions.
•Reduce retrieval size: tighten query, limit top-k, dedupe, and remove boilerplate.
•Shorten system prompt: remove redundant policies or examples.
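
Two of the levers above, history summarization and retrieval reduction, might be sketched as follows. The helper names are hypothetical, `summarize` is a caller-supplied function (for example a call to a smaller model), and `chunks` are assumed to be objects with a `text` field already sorted by relevance.

```javascript
// Keep the last `keepLast` turns verbatim and replace older turns with one
// compact summary message. `summarize` is supplied by the caller.
function compactHistory(turns, keepLast, summarize) {
  if (turns.length <= keepLast) return turns.slice();
  const older = turns.slice(0, turns.length - keepLast);
  const recent = turns.slice(turns.length - keepLast);
  const summary = {
    role: "system",
    content: `Summary of earlier turns: ${summarize(older)}`,
  };
  return [summary, ...recent];
}

// Drop near-identical chunks (ignoring case and whitespace differences),
// then keep only the first `topK` of what remains.
function limitRetrieval(chunks, topK) {
  const seen = new Set();
  const unique = [];
  for (const chunk of chunks) {
    const key = chunk.text.trim().replace(/\s+/g, " ").toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(chunk);
  }
  return unique.slice(0, topK);
}
```

Re-estimate tokens after applying either helper, and spot-check that the summary still carries the key decisions.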

•Use smaller models for classification, routing, summarization, and labeling.
•Escalate to larger models only for complex reasoning or high-stakes outputs.
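
A routing rule along these lines could look like the sketch below. The task labels mirror the list above; the `highStakes` flag, the function name, and the model identifiers `small-model` and `large-model` are placeholders, not real model names.

```javascript
// Tasks that the guidance above assigns to smaller models.
const SMALL_MODEL_TASKS = new Set([
  "classification",
  "routing",
  "summarization",
  "labeling",
]);

// Pick a model tier for a task. Model identifiers are placeholders.
function pickModel(task, options = {}) {
  const {
    highStakes = false,
    models = { small: "small-model", large: "large-model" },
  } = options;
  if (highStakes) return models.large; // always escalate high-stakes outputs
  return SMALL_MODEL_TASKS.has(task) ? models.small : models.large;
}

console.log(pickModel("classification")); // small-model
console.log(pickModel("summarization", { highStakes: true })); // large-model
```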

•If the prompt exceeds the budget by <15% → trim the tool list and boilerplate first.
•If it exceeds the budget by 15-40% → summarize older history and remove low-value examples.
•If it exceeds the budget by >40% → switch to retrieval + short summary + last N turns.
•If it is still too large → move large static guidance to a cached prefix or to separate reference files.
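
The thresholds above can be expressed as a small decision helper. This is a sketch under the assumption that the overage is measured relative to the budget; `planReduction` is a hypothetical name. The final rule (still too large) applies after re-estimating, so it is left to the caller.

```javascript
// Map the relative overage of a prompt to the first action to try.
function planReduction(promptTokens, budgetTokens) {
  if (budgetTokens <= 0) {
    throw new RangeError("budgetTokens must be positive");
  }
  const overage = (promptTokens - budgetTokens) / budgetTokens;
  if (overage <= 0) {
    return { overage: 0, action: "none" };
  }
  let action;
  if (overage < 0.15) {
    action = "trim tool list and boilerplate";
  } else if (overage <= 0.4) {
    action = "summarize older history and remove low-value examples";
  } else {
    action = "switch to retrieval + short summary + last N turns";
  }
  return { overage, action };
}

console.log(planReduction(12500, 10000).action);
// summarize older history and remove low-value examples
```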

•Keep tool names and descriptions clear and specific.
•Filter tools before each call to avoid paying for unused tool metadata.
•Avoid stuffing large static instructions into every turn; move them to a cached prefix.
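
Per-call tool filtering, combined with conditional tool guidance, might be sketched like this. The function name, the `{ name }` tool shape, and the `guidanceByTool` map are assumptions for illustration; how `neededNames` is chosen (a router, keyword rules, a small model) is left open.

```javascript
// Keep only the tools needed for this request and append guidance only for
// tools that survive the filter. The stable base prompt stays first so the
// prefix remains cache-friendly.
function prepareToolContext(basePrompt, tools, neededNames, guidanceByTool = {}) {
  const needed = new Set(neededNames);
  const activeTools = tools.filter((tool) => needed.has(tool.name));
  const guidance = activeTools
    .map((tool) => guidanceByTool[tool.name])
    .filter(Boolean); // skip tools without specific guidance
  const systemPrompt = [basePrompt, ...guidance].join("\n\n");
  return { systemPrompt, tools: activeTools };
}

const context = prepareToolContext(
  "You are a helpful assistant.",
  [{ name: "search" }, { name: "calendar" }, { name: "email" }],
  ["search"],
  { search: "Cite sources.", email: "Confirm before sending." }
);
console.log(context.tools.map((t) => t.name)); // [ 'search' ]
```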

Requirements:

bash

node + npx available on PATH

Estimate tokens from a file or raw text:

bash

node scripts/token_estimate.mjs --file AGENTS.md

bash

node scripts/token_estimate.mjs --text "Hello"

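Tune how conservative the estimate is with a custom chars-per-token value (assuming the estimate is the character count divided by this value, a lower value yields a higher, more conservative estimate):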

bash

node scripts/token_estimate.mjs --file AGENTS.md --chars-per-token 4

•Read references/token-optimization-sources.md only when asked for citations or rationale.