Measurement Infrastructure Skill
Purpose
Build the prediction logging and outcome tracking system that all other model improvements depend on. This is the foundation - without proper measurement, you can't know if anything else is working.
READ THIS SKILL FIRST before any model work.
When to Use
- •Setting up a new trading system from scratch
- •Adding prediction logging to an existing quote engine
- •Building the data pipeline for calibration analysis
- •Any time you're about to build a predictive model
Core Principle
Every quote cycle, your system makes implicit predictions:
- •"If I place a bid at this price, the probability of fill in 1s is X%"
- •"If I get filled, the probability of adverse selection is Y%"
- •"The price will move Z bps in the next 10 seconds"
These predictions must be recorded with enough granularity to diagnose failures.
Schema Definitions
1. Prediction Record (Top Level)
rust
struct PredictionRecord {
// Timing
timestamp_ns: u64,
quote_cycle_id: u64,
// Market state at prediction time (for conditioning analysis)
market_state: MarketStateSnapshot,
// Model outputs (what we predicted)
predictions: ModelPredictions,
// What actually happened (filled in async)
outcomes: Option<ObservedOutcomes>,
}
2. Market State Snapshot
Capture everything the model could condition on:
rust
struct MarketStateSnapshot {
// === L2 Book State ===
bid_levels: Vec<(f64, f64)>, // (price, size) for top N levels
ask_levels: Vec<(f64, f64)>,
spread_bps: f64,
// === Derived Quantities ===
microprice: f64,
microprice_uncertainty: f64,
book_imbalance: f64, // (bid_size - ask_size) / total
// === Kappa Inputs ===
kappa_book: f64,
kappa_robust: f64,
kappa_own: f64,
kappa_final: f64,
// === Volatility ===
sigma_bipower: f64,
sigma_realized_1m: f64,
sigma_realized_5m: f64,
// === Gamma Inputs ===
gamma_base: f64,
gamma_effective: f64,
// === Hyperliquid-Specific ===
funding_rate: f64,
funding_rate_predicted: f64,
time_to_funding_settlement_s: f64,
open_interest: f64,
open_interest_delta_1m: f64,
open_interest_delta_5m: f64,
// === Cross-Exchange ===
binance_mid: Option<f64>,
binance_spread_bps: Option<f64>,
binance_hl_basis_bps: Option<f64>,
// === Position State ===
inventory: f64,
inventory_age_s: f64,
unrealized_pnl: f64,
// === Regime ===
regime_quiet_prob: f64,
regime_trending_prob: f64,
regime_volatile_prob: f64,
regime_cascade_prob: f64,
}
3. Model Predictions
What the model actually predicted:
rust
struct ModelPredictions {
// Per-level predictions
levels: Vec<LevelPrediction>,
// Aggregate predictions
expected_fill_rate_1s: f64,
expected_fill_rate_10s: f64,
expected_adverse_selection_bps: f64,
// Direction predictions (if any)
predicted_price_direction_1s: f64, // [-1, 1]
predicted_price_direction_10s: f64,
direction_confidence: f64,
}
struct LevelPrediction {
side: Side,
price: f64,
size: f64,
depth_from_mid_bps: f64,
// Fill probability predictions at different horizons
p_fill_100ms: f64,
p_fill_1s: f64,
p_fill_10s: f64,
// Conditional predictions
p_adverse_given_fill: f64,
expected_adverse_bps_given_fill: f64,
expected_pnl_given_fill: f64,
// Queue position estimate
estimated_queue_position: f64,
estimated_queue_total: f64,
}
4. Observed Outcomes
Filled in asynchronously after the fact:
rust
struct ObservedOutcomes {
// Fill outcomes
fills: Vec<FillOutcome>,
// Price evolution (for direction/volatility validation)
price_100ms_later: f64,
price_1s_later: f64,
price_10s_later: f64,
price_60s_later: f64,
// Realized metrics
realized_volatility_1m: f64,
realized_adverse_selection_bps: f64,
}
struct FillOutcome {
level_index: usize, // Which LevelPrediction this corresponds to
fill_timestamp_ns: u64,
fill_latency_ns: u64, // Time from quote to fill
fill_price: f64,
fill_size: f64,
// Post-fill price evolution (for adverse selection measurement)
mark_price_at_fill: f64,
mark_price_100ms_later: f64,
mark_price_1s_later: f64,
mark_price_10s_later: f64,
}
Implementation Checklist
Step 1: Instrument Quote Generation
rust
impl QuoteEngine {
fn generate_quotes(&mut self, market_data: &MarketData) -> QuoteSet {
// Capture state BEFORE computing quotes
let market_state = self.snapshot_market_state(market_data);
// Generate quotes (existing logic)
let quotes = self.compute_optimal_quotes(market_data);
// Capture predictions
let predictions = self.extract_predictions("es, market_data);
// Log prediction record (outcomes filled later)
let record = PredictionRecord {
timestamp_ns: now_ns(),
quote_cycle_id: self.next_cycle_id(),
market_state,
predictions,
outcomes: None, // Filled async
};
self.prediction_logger.log(record);
quotes
}
}
Step 2: Build Async Outcome Matcher
rust
struct OutcomeMatcher {
// Pending predictions awaiting outcomes
pending: HashMap<u64, PredictionRecord>, // cycle_id -> record
// Configuration
max_horizon_s: f64, // How long to wait for outcomes (60s typical)
}
impl OutcomeMatcher {
fn on_fill(&mut self, fill: &Fill) {
// Find the prediction record this fill corresponds to
if let Some(record) = self.find_matching_record(fill) {
record.outcomes.get_or_insert_with(Default::default)
.fills.push(self.create_fill_outcome(fill));
}
}
fn on_price_update(&mut self, price: f64, timestamp_ns: u64) {
// Update price evolution for all pending records
for record in self.pending.values_mut() {
let elapsed_s = (timestamp_ns - record.timestamp_ns) as f64 / 1e9;
if let Some(outcomes) = &mut record.outcomes {
match elapsed_s {
t if t >= 0.1 && outcomes.price_100ms_later == 0.0 => {
outcomes.price_100ms_later = price;
}
t if t >= 1.0 && outcomes.price_1s_later == 0.0 => {
outcomes.price_1s_later = price;
}
// ... etc
_ => {}
}
}
}
}
fn flush_completed(&mut self) -> Vec<PredictionRecord> {
// Remove and return records that have all outcomes filled
let now = now_ns();
let mut completed = Vec::new();
self.pending.retain(|_, record| {
let age_s = (now - record.timestamp_ns) as f64 / 1e9;
if age_s > self.max_horizon_s {
if record.outcomes.is_some() {
completed.push(record.clone());
}
false // Remove from pending
} else {
true // Keep in pending
}
});
completed
}
}
Step 3: Storage Layer
Use Parquet for columnar storage (efficient for analytical queries):
rust
struct PredictionStorage {
writer: ParquetWriter,
buffer: Vec<PredictionRecord>,
flush_threshold: usize, // Flush every N records
}
impl PredictionStorage {
fn log(&mut self, record: PredictionRecord) {
self.buffer.push(record);
if self.buffer.len() >= self.flush_threshold {
self.flush();
}
}
fn flush(&mut self) {
// Convert to columnar format and write
let batch = self.to_record_batch(&self.buffer);
self.writer.write_batch(batch);
self.buffer.clear();
}
}
Step 4: Query Interface
rust
struct PredictionQuery {
storage_path: PathBuf,
}
impl PredictionQuery {
/// Load predictions for a date range
fn load_range(&self, start: DateTime, end: DateTime) -> Vec<PredictionRecord> {
// Parquet predicate pushdown for efficiency
read_parquet_with_filter(
&self.storage_path,
|row| row.timestamp >= start && row.timestamp <= end
)
}
/// Load predictions for a specific regime
fn load_by_regime(&self, regime: Regime, date: Date) -> Vec<PredictionRecord> {
self.load_range(date.start(), date.end())
.into_iter()
.filter(|r| r.market_state.dominant_regime() == regime)
.collect()
}
/// Load predictions with fills (for adverse selection analysis)
fn load_filled(&self, date: Date) -> Vec<PredictionRecord> {
self.load_range(date.start(), date.end())
.into_iter()
.filter(|r| r.outcomes.as_ref().map(|o| !o.fills.is_empty()).unwrap_or(false))
.collect()
}
}
What to Log vs What to Skip
Must Log (Critical)
- •All fill probability predictions
- •All fill outcomes
- •Post-fill price evolution (for adverse selection)
- •Market state at prediction time
- •Regime probabilities
Should Log (Important)
- •Kappa inputs and outputs
- •Gamma inputs and outputs
- •Queue position estimates
- •Cross-exchange state
Can Skip (Space Optimization)
- •Full L2 book beyond top 5 levels
- •Sub-100ms price updates
- •Predictions for orders that were never placed
Validation Checks
Before considering the infrastructure complete:
rust
#[test]
fn prediction_outcome_match_rate() {
let records = load_last_24h();
// Every prediction should eventually get outcomes
let with_outcomes = records.iter()
.filter(|r| r.outcomes.is_some())
.count();
let match_rate = with_outcomes as f64 / records.len() as f64;
assert!(match_rate > 0.99, "Outcome match rate too low: {}", match_rate);
}
#[test]
fn fill_outcome_completeness() {
let records = load_last_24h();
for record in &records {
if let Some(outcomes) = &record.outcomes {
for fill in &outcomes.fills {
// Every fill should have post-fill prices
assert!(fill.mark_price_1s_later > 0.0);
assert!(fill.mark_price_10s_later > 0.0);
}
}
}
}
#[test]
fn market_state_completeness() {
let records = load_last_24h();
for record in &records {
let state = &record.market_state;
// Critical fields must be populated
assert!(state.microprice > 0.0);
assert!(state.kappa_final > 0.0);
assert!(state.sigma_bipower > 0.0);
// Regime probabilities must sum to 1
let regime_sum = state.regime_quiet_prob
+ state.regime_trending_prob
+ state.regime_volatile_prob
+ state.regime_cascade_prob;
assert!((regime_sum - 1.0).abs() < 0.01);
}
}
Dependencies
- •Requires: Your existing quote engine, market data feed
- •Enables: calibration-analysis, signal-audit, all model skills
Common Mistakes
- •Logging only aggregates: You need per-prediction granularity to diagnose issues
- •Missing market state: Without conditioning variables, you can't do conditional calibration
- •Synchronous outcome filling: This blocks the hot path; must be async
- •Not logging "boring" periods: Quiet market data is just as important for calibration
- •Forgetting to log predictions for orders that didn't fill: These are negative examples
Next Steps
Once this infrastructure is in place:
- •Read
calibration-analysis/SKILL.mdto analyze the logged data - •Read
signal-audit/SKILL.mdto measure signal information content - •Then proceed to specific model skills