AgentSkillsCN

bug-hunting

通过检查真实代码路径(CLI/HTTP/core),验证持久性不变量,并生成回归测试,构建一套用于在Searchlite中发现、复现并修复Bug的结构化工作流;适用于调查错误结果、程序崩溃、数据丢失风险,或“有时会出错”的问题报告。

SKILL.md
--- frontmatter
name: bug-hunting
description: Structured workflow for discovering, reproducing, and fixing bugs in Searchlite by inspecting real code paths (CLI/HTTP/core), validating durability invariants, and producing regression tests; use when investigating incorrect results, crashes, data loss risks, or “it sometimes fails” reports.

Bug Hunting Workflow

Intent

This skill is for finding real bugs in the implementation, not guessing. It forces a discovery phase (build a mental model, reproduce, narrow scope) before making changes.

You must prioritize:

  1. correctness and durability
  2. clear reproduction
  3. minimal fix + regression test

Non-negotiable rules

  • No fix without a reproduction (a failing test, a minimal repro program, or a deterministic HTTP/CLI script).
  • No durability regressions: do not weaken WAL/manifest/atomic rename/fsync ordering.
  • No “drive-by refactors” while bug hunting; keep patches tight and reviewable.
  • After changes, you must run the full quality suite from code-quality.

Discovery phase (required)

0) Establish the boundaries

Before opening files, answer (in notes) these questions:

  • Which surface is affected? CLI, HTTP, embedded API, wasm, or ffi?
  • Is it a correctness issue (wrong results), durability issue (lost/duplicated data), or stability issue (panic/hang)?
  • Does it require a specific feature flag (vectors, zstd, gpu, wasm/IndexedDB, ffi)?
  • Is the failure deterministic or intermittent?

1) Baseline the repo (don’t trust your local state)

Run in workspace root:

  • cargo fmt --all
  • cargo clippy --all --all-features --all-targets -- -D warnings
  • cargo test --all --all-features

If anything fails here, you may be chasing a known failure; fix/resolve that first.

2) Map the architecture you’re about to debug

Create a quick map in notes (1–2 minutes):

  • searchlite-core: schema, indexing, WAL/manifest, segments, query execution, filters/aggs, vectors
  • searchlite-cli: user-facing commands and JSON output formatting
  • searchlite-http: endpoints (/init, /add, /bulk, /delete, /commit, /refresh, /search, /inspect, /stats, /compact), limits/timeouts/concurrency
  • searchlite-wasm: bindings + IndexedDB storage semantics (no fsync)
  • searchlite-ffi: C ABI / safety edges

This map guides where you look first.

3) Reproduce the bug with the smallest possible corpus

Produce a “minimal reproducible case” artifact set:

  • a schema JSON (small)
  • a small NDJSON/doc list (5–50 docs)
  • one request (HTTP JSON or CLI command) that fails

For HTTP:

  • include the server flags/env (bind addr, index path, max-body-bytes, max-concurrency, request-timeout, refresh-on-commit)

For CLI:

  • include exact commands and any flags

Goal: a script a teammate can run in <60 seconds.

4) Turn on observability (don’t add prints yet)

Use existing debugging features first:

  • For search correctness: add explain: true to the request.
  • For performance/odd slowdowns while reproducing: add profile: true.
  • For server behavior: run with RUST_LOG=debug (or more targeted module filters) and capture the relevant lines.

If your issue is “writes not visible,” verify lifecycle:

  • you must commit
  • you may need /refresh in HTTP depending on config

5) Narrow the fault line by bisecting the pipeline

Pick one axis at a time:

Correctness axis

  • Does the doc get ingested successfully?
  • Does it appear in /stats documents?
  • Does /inspect show a new segment after commit?
  • Does the query match in bm25 mode vs wand/bmw mode?

Durability axis

  • Is WAL appended correctly?
  • Is manifest updated atomically?
  • Is WAL truncated only after manifest persistence?
  • After restart, does replay do the expected thing?

HTTP axis

  • Is the request being rejected due to max body / timeouts / concurrency?
  • Is NDJSON streaming partial and rolling back?
  • Are errors correctly surfaced as {"error":{"type","reason"}}?

6) Perform a targeted static sweep of the suspected modules

Once you know the rough area, do a deliberate scan for bug smells:

Search patterns to grep/ripgrep for in the relevant crate:

  • panics: unwrap(), expect(, panic!, unreachable!, todo!
  • suspicious error handling: map_err(|_|, ok()? in non-optional contexts, silent let _ =
  • integer edge cases: casts (as usize), underflow (- 1), unchecked indexing
  • concurrency hazards: Mutex/RwLock lock ordering, .await while holding locks, channels without bounds
  • IO hazards: missing fsync, non-atomic file replace, partial writes not handled, directory fsync omissions
  • schema mismatch hazards: stringly-typed field lookups, analyzer lookup defaults, missing nested path validation
  • vector hazards: dimension mismatch, normalization assumptions, unchecked candidate sizing, NaN propagation

You are not “fixing” in this step—just identifying suspects.

7) Dynamic bug probes by subsystem (choose what fits)

A) WAL / manifest / commit / rollback

If you suspect commit/durability issues:

  • Reproduce a crash-window scenario:
    1. ingest docs
    2. start commit
    3. force termination mid-way (or simulate via test hooks if present)
    4. restart and observe replay
  • Validate the invariants:
    • manifest atomic update (rename) and directory sync
    • WAL truncation happens after manifest sync
    • rollback cleans new segment artifacts

If you can’t safely crash in tests, create a unit test around writer/commit phases and validate state.

B) Segment lifecycle / compaction / cleanup

If you see duplicates, ghosts, or bloat:

  • Inspect segments via inspect and compare to stats
  • Run compaction and verify:
    • live docs preserved
    • deleted/tombstoned dropped
    • old segment files removed (or expected to remain until cleanup)

C) Query execution modes

If you suspect scoring/pruning bugs:

  • Run the same query with execution=bm25, execution=wand, execution=bmw
  • Results should match for deterministic corpora (unless the contract explicitly says otherwise)
  • If they diverge, you likely have a pruning-bound bug or block-max metadata mismatch

D) Filters/aggs/nested correctness

If filters/aggs are wrong:

  • Verify required fields are fast: true
  • For nested, verify the filter uses a Nested block and that clauses bind to the same object instance
  • Add a repro doc with two nested objects that would catch “cross-object” leakage

E) Vectors (feature vectors)

If vector behavior is wrong:

  • Check dimension mismatch handling
  • Check cosine normalization expectations
  • Confirm ANN parameters (candidate_size, ef_search) don’t overflow or accept nonsensical values
  • Verify hybrid search doesn’t double-count or drop vector scores

Fix phase (required)

1) Write the failing regression test first

  • Prefer a unit test in the most local crate if possible.
  • If the bug is surface-level behavior (HTTP contract, CLI output), add an integration test.
  • Make it deterministic and small.
  • Assert on structure, not strings (parse JSON output).

2) Implement the minimal fix

  • Tight patch, minimal diff.
  • Keep durability semantics intact.
  • Avoid introducing allocations/locks in hot paths unless justified.

3) Prove it

Run:

  • cargo test --all --all-features
  • Any relevant examples (e.g. pruning example if you touched pruning logic)
  • If the bug touched performance-sensitive code, run the relevant bench before/after.

4) Add a “root cause” note

In the PR description or internal notes, include:

  • symptom
  • root cause
  • why the fix is correct
  • what test prevents regressions

Output format (what Codex should produce)

When using this skill, produce:

  1. Repro artifact (schema + docs + request/command)
  2. Fault localization (files + functions involved)
  3. Root cause analysis
  4. Fix plan (minimal changes)
  5. Regression test (where it lives and what it asserts)
  6. Validation commands run (from code-quality)