Unit Testing Patterns

Principles

Unit tests should be:

•Hermetic: no network, no global state, no shared directories.
•Deterministic: fixed ids, fixed inputs, stable assertions.
•Small: quick to run; integration tests cover full flows.
•Behavior-focused: assert externally visible behavior of the module/API you’re testing.

•Use tempfile::tempdir() for filesystem-backed tests.
•Prefer a minimal schema:
  - _id as doc_id_field
  - one text field (indexed + stored if needed)
  - one keyword fast field (filters/aggs)
  - one numeric fast field (ranges/aggs)
  - add nested/vector only when needed
•Prefer NDJSON-style docs for realism, but for unit tests:
  - construct docs via serde_json where possible
  - keep docs minimal and explicit

•Write and commit, reopen the index, and assert the data is still present.
•Simulate crash-window behavior if the API allows:
  - manifest persisted but WAL not truncated → replay creates an extra generation
  - compaction removes duplicates and reduces the segment count

These tests are crucial if you touch commit ordering, fsync behavior, or WAL encoding.
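
The write, commit, reopen pattern can be sketched as follows. This is a minimal illustration, not the project's API: TinyStore, fresh_dir, and write_commit_reopen are hypothetical names, the store is a plain append-only file, and fresh_dir is a std-only stand-in used only to keep the sketch dependency-free (in a real test, tempfile::tempdir() would replace it). It covers only the first bullet; the crash-window cases depend on hooks the real API may or may not expose.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical append-only store: one line per committed document.
struct TinyStore {
    path: PathBuf,
    pending: Vec<String>,
}

impl TinyStore {
    /// Opens (or creates) the store rooted at `dir`.
    fn open(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(Self { path: dir.join("docs.log"), pending: Vec::new() })
    }

    /// Buffers a document in memory; nothing is durable until `commit`.
    fn add(&mut self, doc: &str) {
        self.pending.push(doc.to_string());
    }

    /// Appends the buffered documents to disk and fsyncs the file.
    fn commit(&mut self) -> io::Result<()> {
        let mut file = OpenOptions::new().create(true).append(true).open(&self.path)?;
        for doc in self.pending.drain(..) {
            writeln!(file, "{}", doc)?;
        }
        file.sync_all()
    }

    /// Reads back every committed document from disk.
    fn committed(&self) -> io::Result<Vec<String>> {
        match fs::read_to_string(&self.path) {
            Ok(text) => Ok(text.lines().map(str::to_string).collect()),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Vec::new()),
            Err(e) => Err(e),
        }
    }
}

/// Std-only stand-in for `tempfile::tempdir()`: a fresh per-test directory path.
fn fresh_dir(test_name: &str) -> PathBuf {
    let dir = std::env::temp_dir().join(format!("{}-{}", test_name, std::process::id()));
    let _ = fs::remove_dir_all(&dir); // clear leftovers from an earlier run
    dir
}

/// Writes, commits, drops the handle, reopens, and returns what survived.
fn write_commit_reopen(dir: &Path) -> io::Result<Vec<String>> {
    {
        let mut store = TinyStore::open(dir)?;
        store.add(r#"{"_id":"doc-1","title":"alpha"}"#);
        store.commit()?;
        store.add(r#"{"_id":"doc-2","title":"beta"}"#); // buffered, never committed
    } // handle dropped here, standing in for a process exit
    TinyStore::open(dir)?.committed()
}

#[test]
fn committed_docs_survive_reopen() {
    let dir = fresh_dir("committed_docs_survive_reopen");
    let docs = write_commit_reopen(&dir).unwrap();
    assert_eq!(docs, vec![r#"{"_id":"doc-1","title":"alpha"}"#]);
    fs::remove_dir_all(&dir).unwrap();
}
```

The uncommitted second document is part of the test on purpose: asserting that it is absent after reopen checks the commit boundary, not just that a file was written.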

Include at least one test each for:

•Use a tiny corpus where expected top-k is obvious.
•If testing WAND/BMW pruning:
  - assert results match the full-evaluation baseline (bm25) for deterministic corpora.
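
A baseline-equivalence test might look like the sketch below. Everything in it is an assumption made for illustration: the scorer is a plain term-count stand-in rather than BM25, and top_k_pruned is a simplified upper-bound skip, not real WAND or BMW. What carries over to a real implementation is the shape of the test: a fixed corpus, a deterministic tie-break (lower id wins), and an equality assertion between the pruned and full-evaluation results.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Tiny fixed corpus of (doc id, text), listed in ascending id order.
const CORPUS: &[(u32, &str)] = &[
    (1, "rust search engine"),
    (2, "rust rust rust"),
    (3, "search index search"),
    (4, "cooking pasta"),
];

/// Stand-in scorer (NOT BM25): number of words that are query terms.
fn score(text: &str, query: &[&str]) -> u32 {
    text.split_whitespace().filter(|w| query.contains(w)).count() as u32
}

/// Baseline: score every document, then sort by (score desc, id asc).
fn top_k_full(query: &[&str], k: usize) -> Vec<(u32, u32)> {
    let mut hits: Vec<(u32, u32)> = CORPUS
        .iter()
        .map(|&(id, text)| (id, score(text, query)))
        .filter(|&(_, s)| s > 0)
        .collect();
    hits.sort_by_key(|&(id, s)| (Reverse(s), id));
    hits.truncate(k);
    hits
}

/// Simplified pruning: skip scoring when a cheap upper bound (the word
/// count) cannot beat the weakest hit currently held in the top-k heap.
fn top_k_pruned(query: &[&str], k: usize) -> Vec<(u32, u32)> {
    if k == 0 {
        return Vec::new();
    }
    // Min-heap on (score, Reverse(id)): the top is the weakest kept hit.
    let mut heap: BinaryHeap<Reverse<(u32, Reverse<u32>)>> = BinaryHeap::new();
    for &(id, text) in CORPUS {
        let upper_bound = text.split_whitespace().count() as u32;
        if heap.len() == k {
            let Reverse((weakest, _)) = *heap.peek().unwrap();
            // Ids ascend, so a tie on score would lose to the earlier doc.
            if upper_bound <= weakest {
                continue;
            }
        }
        let s = score(text, query);
        if s == 0 {
            continue;
        }
        heap.push(Reverse((s, Reverse(id))));
        if heap.len() > k {
            heap.pop(); // evict the weakest hit
        }
    }
    let mut hits: Vec<(u32, u32)> = heap
        .into_iter()
        .map(|Reverse((s, Reverse(id)))| (id, s))
        .collect();
    hits.sort_by_key(|&(id, s)| (Reverse(s), id));
    hits
}

#[test]
fn pruned_matches_full_evaluation() {
    let query = ["rust", "search"];
    for k in 0..=CORPUS.len() {
        assert_eq!(top_k_pruned(&query, k), top_k_full(&query, k), "k = {}", k);
    }
}
```

Looping over every k from 0 to the corpus size is cheap on a corpus this small and catches off-by-one errors at the heap boundary.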

Filters:

Aggregations:

•Construct a doc with multiple nested objects.
•Assert that nested constraints bind to the same object instance (not “cross-object” matches).
•Add deeper nesting tests if supported (nested inside nested).
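
The same-object binding rule can be illustrated with a small sketch. Doc, Variant, and both match functions are hypothetical and stand in for whatever nested query the real API exposes. matches_flattened is included only to show the cross-object failure mode the test is meant to catch.

```rust
/// Hypothetical nested object: one product variant.
struct Variant {
    color: &'static str,
    size: &'static str,
}

/// Hypothetical document with a nested `variants` field.
struct Doc {
    variants: Vec<Variant>,
}

/// Nested semantics: both constraints must hold on the same object.
fn matches_nested(doc: &Doc, color: &str, size: &str) -> bool {
    doc.variants.iter().any(|v| v.color == color && v.size == size)
}

/// Flattened semantics: each constraint may be satisfied by a different
/// object. This is the "cross-object" behavior to guard against.
fn matches_flattened(doc: &Doc, color: &str, size: &str) -> bool {
    doc.variants.iter().any(|v| v.color == color)
        && doc.variants.iter().any(|v| v.size == size)
}

/// One document with two nested objects whose fields do not overlap.
fn sample_doc() -> Doc {
    Doc {
        variants: vec![
            Variant { color: "red", size: "S" },
            Variant { color: "blue", size: "L" },
        ],
    }
}

#[test]
fn nested_constraints_bind_to_one_object() {
    let doc = sample_doc();
    assert!(matches_nested(&doc, "red", "S"));
    // "red" and "L" exist, but on different objects: nested must reject.
    assert!(!matches_nested(&doc, "red", "L"));
    // The flattened variant accepts the same query, which is the bug.
    assert!(matches_flattened(&doc, "red", "L"));
}
```

The negative assertion is the important one: a test that only checks positive matches passes under both semantics.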

If you expose explain or profile at the core layer:

•Prefer structured assertions:
  - parse JSON responses into structs or serde_json::Value
  - compare sorted lists for unordered fields
•When checking floats (scores):
  - assert relative ordering, not exact values, unless values are contractually stable
•Avoid “string contains” unless you’re testing human-readable CLI output intentionally.
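
These assertion habits can be sketched as below. The Hit struct, the helper names, and the score values are all made up for illustration, and the sketch assumes the response has already been parsed into a struct (for example via serde).

```rust
/// Hypothetical search hit, assumed to be already parsed from a response.
#[derive(Debug)]
struct Hit {
    id: String,
    score: f32,
    tags: Vec<String>,
}

/// Returns a sorted copy so an unordered field can be compared exactly.
fn sorted(mut items: Vec<String>) -> Vec<String> {
    items.sort();
    items
}

/// True when scores never increase from one hit to the next.
fn is_ranked_descending(hits: &[Hit]) -> bool {
    hits.windows(2).all(|pair| pair[0].score >= pair[1].score)
}

/// Stand-in for a real query call; the scores are arbitrary values.
fn sample_hits() -> Vec<Hit> {
    vec![
        Hit {
            id: "doc-2".to_string(),
            score: 3.2,
            tags: vec!["rust".to_string(), "db".to_string()],
        },
        Hit {
            id: "doc-1".to_string(),
            score: 1.7,
            tags: vec!["search".to_string()],
        },
    ]
}

#[test]
fn structured_assertions() {
    let hits = sample_hits();

    // Ordered field: compare ids exactly, in rank order.
    let ids: Vec<&str> = hits.iter().map(|h| h.id.as_str()).collect();
    assert_eq!(ids, vec!["doc-2", "doc-1"]);

    // Unordered field: sort before comparing.
    assert_eq!(sorted(hits[0].tags.clone()), vec!["db", "rust"]);

    // Floats: assert the ordering, not the exact score values.
    assert!(is_ranked_descending(&hits));
}
```

If a scorer change shifts 3.2 to 3.1, this test still passes; it fails only when the ranking itself changes.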