Vector Index Tuning
Guide to optimizing vector indexes for production performance.
Use this skill when
- Tuning HNSW parameters
- Implementing quantization
- Optimizing memory usage
- Reducing search latency
- Balancing recall vs speed
- Scaling to billions of vectors
Do not use this skill when
- You only need exact search on small datasets (use a flat index)
- You lack workload metrics or ground truth to validate recall
- You need end-to-end retrieval system design beyond index tuning
Instructions
- Gather workload targets (latency, recall, QPS), data size, and memory budget.
- Choose an index type and establish a baseline with default parameters.
- Benchmark parameter sweeps using real queries and track recall, latency, and memory.
- Validate changes on a staging dataset before rolling out to production.
Refer to resources/implementation-playbook.md for detailed patterns, checklists, and templates.
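The benchmarking step above can be sketched as a small sweep harness. This is a minimal illustration, not the playbook's method: `exact_top_k`, `recall_at_k`, `sweep`, and `toy_search` are hypothetical names, and the toy search (which scans only a prefix of the data) is a stand-in assumption for a real index call such as an HNSW search at a given `ef_search`. Memory tracking is omitted.

```python
import math
import random
import time

def exact_top_k(data, query, k):
    """Brute-force nearest neighbours: the ground truth for recall."""
    order = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return order[:k]

def recall_at_k(found, truth):
    """Fraction of the true top-k ids that the search returned."""
    return len(set(found) & set(truth)) / len(truth)

def sweep(search_fn, param_values, queries, ground_truth, k):
    """Run every query at each parameter value; record mean recall and latency."""
    rows = []
    for value in param_values:
        recalls, latencies = [], []
        for query, truth in zip(queries, ground_truth):
            start = time.perf_counter()
            found = search_fn(query, k, value)
            latencies.append(time.perf_counter() - start)
            recalls.append(recall_at_k(found, truth))
        rows.append({
            "param": value,
            "recall": sum(recalls) / len(recalls),
            "mean_latency_ms": 1000 * sum(latencies) / len(latencies),
        })
    return rows

# Synthetic data for illustration only; use real queries in practice.
rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
ground_truth = [exact_top_k(data, q, 10) for q in queries]

def toy_search(query, k, budget):
    """Hypothetical stand-in for an index: scans only the first `budget` vectors."""
    return exact_top_k(data[:budget], query, k)

results = sweep(toy_search, [50, 200, 500], queries, ground_truth, k=10)
for row in results:
    print(row)
```

To adapt the sketch, replace `toy_search` with a function that sets the parameter under test on your index and returns the ids it finds, and compute `ground_truth` once from an exact (flat) search over the same data.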
Safety
- Avoid reindexing in production without a rollback plan.
- Validate changes under realistic load before applying globally.
- Track recall regressions and revert if quality drops.
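One way to make the last safety check mechanical is a simple gate that compares a candidate configuration against the baseline. The sketch below is an assumption-laden illustration: `Metrics`, `rollout_decision`, and the thresholds (`max_recall_drop`, `latency_budget_ms`) are hypothetical and should be set from your own workload targets.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    recall: float          # mean recall@k against ground truth
    p95_latency_ms: float  # 95th percentile query latency

def rollout_decision(baseline, candidate,
                     max_recall_drop=0.01, latency_budget_ms=50.0):
    """Return ("keep" | "revert", reasons) for a candidate index configuration.

    Thresholds are illustrative defaults, not recommendations.
    """
    reasons = []
    if baseline.recall - candidate.recall > max_recall_drop:
        reasons.append("recall regression")
    if candidate.p95_latency_ms > latency_budget_ms:
        reasons.append("latency over budget")
    return ("revert", reasons) if reasons else ("keep", reasons)

baseline = Metrics(recall=0.95, p95_latency_ms=40.0)
candidate = Metrics(recall=0.90, p95_latency_ms=25.0)
decision, reasons = rollout_decision(baseline, candidate)
print(decision, reasons)
```

Running such a gate on staging metrics before rollout, and again on production metrics afterwards, gives the rollback plan a concrete trigger.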
Resources
- resources/implementation-playbook.md for detailed patterns, checklists, and templates.