Rust Elite Standards (Edition 2024)
Architecture
- •Edition: Must use Rust Edition 2024.
- •Dependencies: Use Standard Polars `0.52.x` (ensure patch version compatibility).
Safety & Error Handling
- •No Panic Policy: Strictly forbid `unwrap()` and `expect()` in business logic.
- •Typed Errors: Use `thiserror` for library errors and `anyhow` for application-level errors.
- •Async: Use `tokio` for async runtime unless specified otherwise.
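The no-panic and typed-error rules above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `LedgerError` and `parse_amount` are hypothetical names, and the `Display`/`Error` impls are written by hand (they are what `thiserror` would normally derive) so that the snippet compiles with the standard library alone.

```rust
use std::fmt;
use std::num::ParseFloatError;

/// Hypothetical library-level error type (illustrative only).
#[derive(Debug)]
pub enum LedgerError {
    /// The input was empty after trimming.
    EmptyInput,
    /// The input could not be parsed as a number.
    InvalidAmount(ParseFloatError),
}

impl fmt::Display for LedgerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LedgerError::EmptyInput => write!(f, "amount is empty"),
            LedgerError::InvalidAmount(e) => write!(f, "invalid amount: {e}"),
        }
    }
}

impl std::error::Error for LedgerError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            LedgerError::EmptyInput => None,
            LedgerError::InvalidAmount(e) => Some(e),
        }
    }
}

impl From<ParseFloatError> for LedgerError {
    fn from(e: ParseFloatError) -> Self {
        LedgerError::InvalidAmount(e)
    }
}

/// Parses an amount, propagating failures with `?` instead of `unwrap()`.
pub fn parse_amount(raw: &str) -> Result<f64, LedgerError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(LedgerError::EmptyInput);
    }
    let value: f64 = trimmed.parse()?;
    Ok(value)
}
```

A caller would match on the returned `Result` or forward it with `?`; at the application boundary the same error could be wrapped by `anyhow`.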
Testing & Quality (Atomic Simulator VAS)
- •TDD Requirement: "No Test, No Commit". Every feature must have a "Red Phase" failing test before logic implementation.
- •Performance Benchmarks: Core operations (e.g., Ledger queries) must meet sub-100ms targets for 1000 rows.
- •Verification: Every Mission must conclude with a `walkthrough.md` documenting test results.
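One way to turn the sub-100ms target into a failing ("Red Phase") test is to measure the operation and compare it against an explicit budget. The sketch below assumes hypothetical helpers `timed`, `check_budget`, and `sum_rows` (a stand-in for a real Ledger query); a single wall-clock measurement is only a rough check, not a full benchmark harness.

```rust
use std::time::{Duration, Instant};

/// Runs `op` once and returns its result together with the elapsed wall-clock time.
pub fn timed<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = op();
    (out, start.elapsed())
}

/// Returns `Err` with a description when `elapsed` exceeds `budget`.
pub fn check_budget(label: &str, elapsed: Duration, budget: Duration) -> Result<(), String> {
    if elapsed <= budget {
        Ok(())
    } else {
        Err(format!("{label} took {elapsed:?}, budget is {budget:?}"))
    }
}

/// Hypothetical stand-in for a ledger query: sums the given rows.
pub fn sum_rows(rows: &[f64]) -> f64 {
    rows.iter().sum()
}
```

In a test, the pattern would be to build 1000 rows, call `timed(|| sum_rows(&rows))`, and assert that `check_budget` returns `Ok` for a 100 ms budget.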
Interoperability (Tauri/TS)
- •Data Contract: All event payloads between Rust and TypeScript MUST use `camelCase`. Use `#[serde(rename_all = "camelCase")]` on structs.
- •Glass Panel Philosophy: UI is strictly a "Glass Panel". Zero business logic, zero state inferencing, and zero data modification allowed in the Frontend. UI ONLY renders what the Core (Rust) dictates.
- •IPC Safety: Use `#[tauri::arg(rename = "camelCase")]` or `#[allow(non_snake_case)]` to reconcile Rust's `snake_case` with Frontend's `camelCase` without breaking convention.
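To make the naming contract concrete, the sketch below shows the key mapping that the data contract expects (Rust `snake_case` field names appearing as `camelCase` keys on the wire). Both functions are hypothetical helpers that could back a contract test; they only approximate what serde's `rename_all = "camelCase"` does for plain ASCII field names and are not a substitute for the attribute.

```rust
/// Converts a `snake_case` identifier to `camelCase` (ASCII-only sketch).
pub fn snake_to_camel(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = false;
    for ch in name.chars() {
        if ch == '_' {
            // Drop the underscore; capitalize the next character unless
            // nothing has been emitted yet (leading underscore).
            upper_next = !out.is_empty();
        } else if upper_next {
            out.push(ch.to_ascii_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Returns true when a payload key starts with a lowercase ASCII letter
/// and contains no underscores.
pub fn is_camel_case(key: &str) -> bool {
    let starts_lower = key.chars().next().map_or(false, |c| c.is_ascii_lowercase());
    starts_lower && !key.contains('_')
}
```

For example, a field named `total_cost` would be expected to reach TypeScript as `totalCost`.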
Observability
- •Traceability: Every result-bearing function must be traced or logged using the `tracing` crate. Use `#[tracing::instrument]` on core logic.
Extraction (Unified Orchestrator)
- •Architecture: Use the "Unified Extraction Orchestrator" pattern. Orchestrators must be stateless: they only gauge capability and dispatch to lanes.
- •Stability & Isolation: Heavy FFI tasks (PDF/Office) MUST be isolated in sub-processes via `WorkerManager`.
- •Fate-sharing: Workers must monitor `stdin`. If the parent drops, the worker must exit immediately to avoid "ghost processes".
- •Resource Governor: Limit maximum concurrent workers using a `Semaphore`. Cap based on CPU core count or memory availability.
- •Contract Enforcement: All lanes must speak the `ExtractionProduct` JSON protocol.
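The fate-sharing rule can be illustrated with a line-oriented worker loop that treats end-of-file on its input as "the parent is gone". This is a sketch under assumptions: `run_worker` and the `handle` callback are hypothetical, the loop is written in Rust for illustration even though a worker may be implemented in another language, and the payload is treated as an opaque line rather than a parsed `ExtractionProduct`.

```rust
use std::io::{self, BufRead, Write};

/// Reads one request per line and writes one response per line.
/// Returns the number of handled requests as soon as `input` reaches EOF,
/// which is what a worker observes when the parent closes the pipe or dies.
pub fn run_worker<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    mut handle: impl FnMut(&str) -> String,
) -> io::Result<usize> {
    let mut line = String::new();
    let mut handled = 0usize;
    loop {
        line.clear();
        // `read_line` returns Ok(0) only at EOF.
        if input.read_line(&mut line)? == 0 {
            return Ok(handled);
        }
        let request = line.trim_end();
        if request.is_empty() {
            // Skip blank lines instead of dispatching them.
            continue;
        }
        let response = handle(request);
        writeln!(output, "{response}")?;
        output.flush()?;
        handled += 1;
    }
}
```

In a real worker binary, the caller would pass locked `stdin`/`stdout` handles and exit the process as soon as `run_worker` returns.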
Interoperability (IPC Protocol)
- •JSON Stream: Communication between Main (Rust) and Worker (Python) must be conducted via JSON over `stdin`/`stdout`.
- •CamelCase Alignment: All IPC payloads MUST be `camelCase`.
- •Timeout Policy: Every worker task must have a hard timeout to prevent blocking the dispatcher.
- •Zero-Lag IPC: Round-trip time (RTT) for IPC MUST NOT exceed 20ms. Use persistent worker pools to avoid cold-start overhead. Any refactoring that degrades IPC performance beyond this threshold MUST be rejected.
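The timeout policy can be sketched with a channel and `recv_timeout`, so the dispatcher never waits longer than the deadline. This is a standard-library illustration only: `run_with_timeout` and `TaskError` are hypothetical names, and an async implementation would more likely use the runtime's own timeout facility. Note that the helper thread here is abandoned rather than killed; a real worker process would be terminated by its manager.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum TaskError {
    /// The task did not answer within the deadline.
    TimedOut,
    /// The task thread ended without sending a result (for example, it panicked).
    WorkerLost,
}

/// Runs `task` on a helper thread and waits at most `limit` for its result.
pub fn run_with_timeout<T, F>(task: F, limit: Duration) -> Result<T, TaskError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(task());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(TaskError::TimedOut),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(TaskError::WorkerLost),
    }
}
```

The dispatcher would treat `TimedOut` as a hard failure for that task and move on, rather than blocking the lane.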
Polars Integration (Mission 018+)
- •Version Lock: Use Polars `0.52.x` for stability and API consistency.
- •DataFrame Construction:
  - •Use `DataFrame::new(Vec<Column>)` (not `Vec<Series>`)
  - •Convert `Series` → `Column` via `.into_column()` or `Column::from(series)`
- •Type Safety:
  - •Column names must use `.into()` for `PlSmallStr` compatibility
  - •Example: `Series::new((&col_name).into(), &values)`
  - •Avoid `.unwrap()` on DataFrame operations; use `Result` propagation
- •Testing:
  - •Every Polars operation must have a unit test with known input/output
  - •Benchmark DataFrame creation < 50ms per table (per `LATENCY_BUDGET.md`)
- •Documentation:
  - •Always check Polars docs for current version before implementation
  - •Polars API changes frequently between minor versions
Lessons Learned (Mission 018)
❌ What Went Wrong
- •Ignored Skill Standards: Used Polars `0.45` instead of the `0.52.x` specified in the skill
- •No TDD: Wrote implementation before tests, leading to trial-and-error debugging
- •API Assumptions: Guessed Polars API instead of reading docs, wasted time on type mismatches
- •Missing Observability: No `#[tracing::instrument]` on core functions
✅ What Was Fixed
- •Version Alignment: Upgraded to Polars `0.52`; clean build with no breaking changes
- •Type Corrections:
  - •`Vec<Series>` → `Vec<Column>` for DataFrame construction
  - •Added `.into()` for `PlSmallStr` column names
- •Dependency Management: Added missing `chrono` for timestamp handling
📋 Process Improvements
- •Read Skill First: Always check skill requirements before choosing dependencies
- •TDD Discipline: Write failing test → implement → verify (Red-Green-Refactor)
- •Version Lock Early: Pin exact versions in `Cargo.toml` to avoid API drift
- •Document Assumptions: If deviating from skill, document why in commit message
Refactor & Safety Audit (Elite Mandatory)
- •Unsafe Policy
  - •`unsafe` blocks are forbidden by default.
  - •If unavoidable:
    - •Must be isolated in a single module.
    - •Must include SAFETY comments explaining invariants.
    - •Must be reviewed and traced.
- •Refactor Discipline
  - •Eliminate all `unwrap()` / `expect()` (including tests & examples).
  - •Replace with `Result<T, E>` and `?`.
  - •No silent fallback.
- •Public API Contract
  - •Any public function that can fail MUST return `Result`.
  - •Error types must be explicit at library boundaries.
- •Refactor Workflow
  - •Analyze module responsibilities.
  - •Identify unsafe / panic / implicit failure.
  - •Propose refactor plan before applying major logic changes.
  - •Refactor module-by-module under `src/`.
  - •Update call sites, tests, examples, and documentation.
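The Unsafe Policy above (one module, a SAFETY comment stating the invariant) might look like the following. The example is deliberately contrived: `raw_access` and `ends` are hypothetical, and safe indexing would normally be preferred here. It only shows the shape an audited `unsafe` block is expected to take.

```rust
/// The only module allowed to contain `unsafe` in this sketch.
pub mod raw_access {
    /// Returns the first and last element of a slice, or `None` if it is empty.
    pub fn ends(values: &[u32]) -> Option<(u32, u32)> {
        if values.is_empty() {
            return None;
        }
        let last = values.len() - 1;
        // SAFETY: `values` is non-empty (checked above), so index 0 and
        // index `last == len - 1` are both in bounds for `get_unchecked`.
        let pair = unsafe { (*values.get_unchecked(0), *values.get_unchecked(last)) };
        Some(pair)
    }
}
```

Callers outside the module see only a safe function that reports the empty case through `Option` instead of panicking.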
Data Purity Protocol (The Janitor's Decree)
- •Architectural Separation: Data cleaning (Janitor) is STRICTLY separated from data validation (TableTruth).
- •Stateless & Pure: The Janitor layer must be a pure transformer. It does not hold state or infer business logic.
- •Reporting Hierarchy: `JanitorReport` is non-authoritative. It is an audit trail for the Dashboard, but MUST NEVER be used by the `Truth` layer to determine validity.
- •I/O Boundary Enforcement (Encoding): All external text (e.g., from PythonWorkers) must pass through an `EncodingGatekeeper` for UTF-8 and Mojibake validation before reaching the Janitor.
- •Ghost Rules:
  - •Ejection of "Ghost Columns" is preferred for structural hallucinations.
  - •Row ejection is only allowed for 100% empty rows with no semantic significance (e.g., header rows, spacer rows).
- •SIMD Usage: Prefer standard library optimizations (Regex, Arrow) for SIMD. Avoid manual intrinsics unless explicitly authorized for a specific Mission.
- •Syntactic Cleaning Only: Janitor cleans characters and formats (1.250,50 -> 1250.5). It NEVER performs semantic conversion (m3 -> liters). Parse failures are left "as-is" to be rejected by Truth.
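The "syntactic cleaning only" rule can be sketched as a pure function that rewrites a number with a decimal comma and optional dot-separated thousands groups, and returns anything else unchanged. `clean_number` and its helpers are hypothetical names; the choice to leave inputs without a decimal comma (such as `1.5`) untouched is an assumption made here to avoid guessing the locale. The sketch goes through `f64`, so very long digit strings could lose precision.

```rust
fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

/// Accepts plain digits, or dot-separated groups: 1-3 digits, then exactly 3 each.
fn valid_integer_part(int_part: &str) -> bool {
    if !int_part.contains('.') {
        return all_digits(int_part);
    }
    let mut groups = int_part.split('.');
    let first_ok = groups.next().map_or(false, |g| all_digits(g) && g.len() <= 3);
    first_ok && groups.all(|g| all_digits(g) && g.len() == 3)
}

/// Rewrites "1.250,50" to "1250.5". Input that does not match the expected
/// shape is returned as-is so a later validation layer can reject it.
pub fn clean_number(raw: &str) -> String {
    let trimmed = raw.trim();
    // Require a decimal comma; without one the format is ambiguous.
    let Some((int_part, frac_part)) = trimmed.split_once(',') else {
        return raw.to_string();
    };
    if !all_digits(frac_part) || !valid_integer_part(int_part) {
        return raw.to_string();
    }
    let candidate = format!("{}.{}", int_part.replace('.', ""), frac_part);
    match candidate.parse::<f64>() {
        Ok(value) => value.to_string(),
        Err(_) => raw.to_string(),
    }
}
```

No unit conversion happens anywhere in this function: a value such as `12 m3` passes through untouched.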
Iron Truth Contract & Clean Hands Doctrine (Mission 024-027)
- •LAW-07 (Fail Safe & Human-Gated): Systems MUST detect and reject anomalies (Mojibake, structural rot) but MUST NEVER attempt autonomous repair. All repairs are human-gated.
- •Clean Hands Doctrine: The Truth Engine (`TableTruth`) must remain pure. It only validates data. Any repair logic resides in the `Adapter/Engine` layer and results must be re-submitted for validation. "The Judge does not write the Law, and the Janitor does not argue with the Truth."
- •Iron Truth Contract V1.0:
  - •`TableTruth` is the singular source of truth for structural validity.
  - •`ProjectTruth` is the derived authority for cross-source reconciliation.
  - •Any conflict between Reality and Truth results in `Rejected` status by default.
- •Global Singletons: Use `std::sync::OnceLock` for global, thread-safe singletons (e.g., `EncodingNormalizer`). Avoid `lazy_static` unless complex macro execution is required.
- •UI Architecture:
  - •4-Panel Arbiter Layout: Mandatory for data-dense forensic tools.
  - •Virtualization: Mandatory for tables > 100 rows to maintain sub-1s interactivity. Row height fixed at 32px for precision.
  - •Forbidden Patterns: Non-deterministic loaders (spinners), softening of rejection language ("Check again" -> "TỪ CHỐI", i.e. "REJECTED"), and autonomous "Fix" buttons are strictly forbidden.
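The Global Singletons rule can be illustrated with `std::sync::OnceLock`. The `EncodingNormalizer` below is a hypothetical stand-in (its field and method are invented for the sketch); it only detects a suspect character and does not repair anything, in keeping with LAW-07.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the `EncodingNormalizer` named above.
#[derive(Debug)]
pub struct EncodingNormalizer {
    /// Character whose presence suggests damaged text (assumed: U+FFFD).
    suspect: char,
}

impl EncodingNormalizer {
    /// Detection only: reports whether `text` contains the suspect character.
    pub fn looks_damaged(&self, text: &str) -> bool {
        text.contains(self.suspect)
    }
}

static NORMALIZER: OnceLock<EncodingNormalizer> = OnceLock::new();

/// Returns the process-wide instance, initializing it on first use.
/// `get_or_init` runs the closure at most once, even across threads.
pub fn normalizer() -> &'static EncodingNormalizer {
    NORMALIZER.get_or_init(|| EncodingNormalizer { suspect: '\u{FFFD}' })
}
```

Every call to `normalizer()` returns a reference to the same instance, so no macro or external crate is needed for this case.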
Project Intelligence & Lineage Protocol (Mission 030)
- •Truth Lineage: Every aggregated metric (e.g., `total_cost`) MUST carry a `LineageMap`.
- •Shadow Columns: Use a `_lineage_` prefix for forensic metadata columns in Polars. These columns are for sidecar metadata only and MUST NOT participate in primary business logic (sort/filter/calc).
- •Deterministic GlobalId: `GlobalId` for any entity (cell, row, table) must be deterministic and hash-based: `hash(v1_id, v2_id...)`. Never use random UUIDs for forensic data.
- •Backward Compatibility: When adding new fields to core data contracts (e.g., `TableCell`), use `#[serde(default)]` to ensure compatibility with existing serialized mocks and project files.
- •Polars String Handling (0.52): To check if a column name exists in `df.get_column_names()`, convert to a `Vec<String>` first or handle `PlSmallStr` types carefully to avoid `&str` vs `&&str` type mismatches in `.contains()`.
- •Forensic State Persistence: Tauri backend `ForensicState` must act as the primary cache for derived truths (like `ProjectTruth`) to ensure consistency across independent IPC commands (Drill-down, Export).
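A deterministic, hash-based `GlobalId` could be derived as in the sketch below. The hash function is an assumption: the source does not name one, so 64-bit FNV-1a is used here because it is simple to write out and does not depend on the standard library's unspecified default hasher. It is not cryptographic; a project that needs collision resistance would swap in a stronger hash. The `GlobalId` layout and `global_id` signature are likewise hypothetical.

```rust
/// 64-bit FNV-1a constants.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into `state` using FNV-1a (xor, then multiply).
fn fnv1a_update(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Hypothetical `GlobalId`: a hash over the ordered component ids.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GlobalId(pub u64);

/// Derives an id from component ids. Each part is length-prefixed so that
/// the boundary between parts affects the hash input.
pub fn global_id(parts: &[&str]) -> GlobalId {
    let mut state = FNV_OFFSET;
    for part in parts {
        let len = part.len() as u64;
        state = fnv1a_update(state, &len.to_le_bytes());
        state = fnv1a_update(state, part.as_bytes());
    }
    GlobalId(state)
}
```

The same ordered list of component ids always yields the same `GlobalId`, across runs and machines, which is the property random UUIDs cannot provide.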