Bug Hunting Workflow
Intent
This skill is for finding real bugs in the implementation, not guessing. It forces a discovery phase (build a mental model, reproduce, narrow scope) before making changes.
You must prioritize:
- •correctness and durability
- •clear reproduction
- •minimal fix + regression test
Non-negotiable rules
- •No fix without a reproduction (a failing test, a minimal repro program, or a deterministic HTTP/CLI script).
- •No durability regressions: do not weaken WAL/manifest/atomic rename/fsync ordering.
- •No “drive-by refactors” while bug hunting; keep patches tight and reviewable.
- •After changes, you must run the full quality suite from
code-quality.
Discovery phase (required)
0) Establish the boundaries
Before opening files, answer (in notes) these questions:
- •Which surface is affected? CLI, HTTP, embedded API, wasm, or ffi?
- •Is it a correctness issue (wrong results), durability issue (lost/duplicated data), or stability issue (panic/hang)?
- •Does it require a specific feature flag (
vectors,zstd,gpu, wasm/IndexedDB, ffi)? - •Is the failure deterministic or intermittent?
1) Baseline the repo (don’t trust your local state)
Run in workspace root:
- •
cargo fmt --all - •
cargo clippy --all --all-features --all-targets -- -D warnings - •
cargo test --all --all-features
If anything fails here, you may be chasing a known failure; fix/resolve that first.
2) Map the architecture you’re about to debug
Create a quick map in notes (1–2 minutes):
- •
searchlite-core: schema, indexing, WAL/manifest, segments, query execution, filters/aggs, vectors - •
searchlite-cli: user-facing commands and JSON output formatting - •
searchlite-http: endpoints (/init, /add, /bulk, /delete, /commit, /refresh, /search, /inspect, /stats, /compact), limits/timeouts/concurrency - •
searchlite-wasm: bindings + IndexedDB storage semantics (no fsync) - •
searchlite-ffi: C ABI / safety edges
This map guides where you look first.
3) Reproduce the bug with the smallest possible corpus
Produce a “minimal reproducible case” artifact set:
- •a schema JSON (small)
- •a small NDJSON/doc list (5–50 docs)
- •one request (HTTP JSON or CLI command) that fails
For HTTP:
- •include the server flags/env (bind addr, index path, max-body-bytes, max-concurrency, request-timeout, refresh-on-commit)
For CLI:
- •include exact commands and any flags
Goal: a script a teammate can run in <60 seconds.
4) Turn on observability (don’t add prints yet)
Use existing debugging features first:
- •For search correctness: add
explain: trueto the request. - •For performance/odd slowdowns while reproducing: add
profile: true. - •For server behavior: run with
RUST_LOG=debug(or more targeted module filters) and capture the relevant lines.
If your issue is “writes not visible,” verify lifecycle:
- •you must
commit - •you may need
/refreshin HTTP depending on config
5) Narrow the fault line by bisecting the pipeline
Pick one axis at a time:
Correctness axis
- •Does the doc get ingested successfully?
- •Does it appear in
/statsdocuments? - •Does
/inspectshow a new segment after commit? - •Does the query match in bm25 mode vs wand/bmw mode?
Durability axis
- •Is WAL appended correctly?
- •Is manifest updated atomically?
- •Is WAL truncated only after manifest persistence?
- •After restart, does replay do the expected thing?
HTTP axis
- •Is the request being rejected due to max body / timeouts / concurrency?
- •Is NDJSON streaming partial and rolling back?
- •Are errors correctly surfaced as
{"error":{"type","reason"}}?
6) Perform a targeted static sweep of the suspected modules
Once you know the rough area, do a deliberate scan for bug smells:
Search patterns to grep/ripgrep for in the relevant crate:
- •panics:
unwrap(),expect(,panic!,unreachable!,todo! - •suspicious error handling:
map_err(|_|,ok()?in non-optional contexts, silentlet _ = - •integer edge cases: casts (
as usize), underflow (- 1), unchecked indexing - •concurrency hazards:
Mutex/RwLocklock ordering,.awaitwhile holding locks, channels without bounds - •IO hazards: missing
fsync, non-atomic file replace, partial writes not handled, directory fsync omissions - •schema mismatch hazards: stringly-typed field lookups, analyzer lookup defaults, missing nested path validation
- •vector hazards: dimension mismatch, normalization assumptions, unchecked candidate sizing,
NaNpropagation
You are not “fixing” in this step—just identifying suspects.
7) Dynamic bug probes by subsystem (choose what fits)
A) WAL / manifest / commit / rollback
If you suspect commit/durability issues:
- •Reproduce a crash-window scenario:
- •ingest docs
- •start commit
- •force termination mid-way (or simulate via test hooks if present)
- •restart and observe replay
- •Validate the invariants:
- •manifest atomic update (rename) and directory sync
- •WAL truncation happens after manifest sync
- •rollback cleans new segment artifacts
If you can’t safely crash in tests, create a unit test around writer/commit phases and validate state.
B) Segment lifecycle / compaction / cleanup
If you see duplicates, ghosts, or bloat:
- •Inspect segments via
inspectand compare tostats - •Run compaction and verify:
- •live docs preserved
- •deleted/tombstoned dropped
- •old segment files removed (or expected to remain until cleanup)
C) Query execution modes
If you suspect scoring/pruning bugs:
- •Run the same query with
execution=bm25,execution=wand,execution=bmw - •Results should match for deterministic corpora (unless the contract explicitly says otherwise)
- •If they diverge, you likely have a pruning-bound bug or block-max metadata mismatch
D) Filters/aggs/nested correctness
If filters/aggs are wrong:
- •Verify required fields are
fast: true - •For nested, verify the filter uses a
Nestedblock and that clauses bind to the same object instance - •Add a repro doc with two nested objects that would catch “cross-object” leakage
E) Vectors (feature vectors)
If vector behavior is wrong:
- •Check dimension mismatch handling
- •Check cosine normalization expectations
- •Confirm ANN parameters (
candidate_size,ef_search) don’t overflow or accept nonsensical values - •Verify hybrid search doesn’t double-count or drop vector scores
Fix phase (required)
1) Write the failing regression test first
- •Prefer a unit test in the most local crate if possible.
- •If the bug is surface-level behavior (HTTP contract, CLI output), add an integration test.
- •Make it deterministic and small.
- •Assert on structure, not strings (parse JSON output).
2) Implement the minimal fix
- •Tight patch, minimal diff.
- •Keep durability semantics intact.
- •Avoid introducing allocations/locks in hot paths unless justified.
3) Prove it
Run:
- •
cargo test --all --all-features - •Any relevant examples (e.g. pruning example if you touched pruning logic)
- •If the bug touched performance-sensitive code, run the relevant bench before/after.
4) Add a “root cause” note
In the PR description or internal notes, include:
- •symptom
- •root cause
- •why the fix is correct
- •what test prevents regressions
Output format (what Codex should produce)
When using this skill, produce:
- •Repro artifact (schema + docs + request/command)
- •Fault localization (files + functions involved)
- •Root cause analysis
- •Fix plan (minimal changes)
- •Regression test (where it lives and what it asserts)
- •Validation commands run (from code-quality)