Signal Audit Skill

Purpose

Systematically measure the predictive information content of all available signals before building models. This prevents wasting effort on low-value features and identifies high-value signals you might be ignoring.

When to Use

•Before building any new predictive model
•When adding a new data source (cross-exchange feed, new API field)
•Quarterly review of signal value decay
•Debugging why a model stopped working
•Deciding which features to include in a model

Prerequisites

•measurement-infrastructure implemented (for outcome data)
•Historical market data with candidate signals
•Defined prediction targets

Core Concept: Mutual Information

Mutual information measures how many bits of information signal X provides about target Y:

code

I(X; Y) = H(Y) - H(Y|X)

Where H is entropy.

Key properties:

•I(X; Y) ≥ 0 (always non-negative)
•I(X; Y) = 0 if and only if X and Y are independent
•Works for non-linear relationships (unlike correlation)
•Units are bits (or nats if using natural log)

Signal Catalog

Book-Derived Signals

rust

struct BookSignals {
    // Basic
    spread_bps: f64,
    mid_price: f64,
    
    // Imbalance
    microprice_imbalance: f64,     // (bid_size - ask_size) / (bid_size + ask_size) at L1
    book_imbalance_l5: f64,        // Same but integrated over top 5 levels
    book_pressure: f64,            // Weighted depth asymmetry
    
    // Depth
    depth_at_1bps: f64,
    depth_at_5bps: f64,
    depth_at_10bps: f64,
    
    // Shape
    book_slope_bid: f64,           // How quickly depth increases away from mid
    book_slope_ask: f64,
}

Trade-Derived Signals

rust

struct TradeSignals {
    // Volume imbalance
    trade_imbalance_1s: f64,       // Net signed volume last 1s
    trade_imbalance_10s: f64,
    trade_imbalance_60s: f64,
    
    // Intensity
    trade_arrival_rate: f64,       // Trades per second
    volume_rate: f64,              // Volume per second
    
    // Size distribution
    avg_trade_size: f64,
    trade_size_std: f64,
    large_trade_count_1m: u32,     // Trades > 2σ from mean
    
    // Aggression
    aggressor_imbalance: f64,      // (aggressive_buys - aggressive_sells) / total
}

Hyperliquid-Specific Signals

rust

struct HyperliquidSignals {
    // Funding
    funding_rate: f64,
    funding_rate_change_1h: f64,
    funding_rate_change_8h: f64,
    predicted_funding_rate: f64,
    time_to_funding_settlement_s: f64,
    
    // Open Interest
    open_interest: f64,
    open_interest_change_1m: f64,
    open_interest_change_5m: f64,
    open_interest_change_1h: f64,
    oi_momentum: f64,              // Acceleration of OI change
    
    // Vault activity
    hlp_vault_position: f64,       // If available
}

Cross-Exchange Signals

rust

struct CrossExchangeSignals {
    // Binance
    binance_mid: f64,
    binance_spread_bps: f64,
    binance_hl_basis_bps: f64,     // Binance mid - HL mid
    
    // Lead indicators
    binance_return_100ms: f64,     // Binance price change last 100ms
    binance_return_500ms: f64,
    binance_return_1s: f64,
    
    // Volume ratio
    binance_volume_ratio: f64,     // Binance volume / HL volume
}

Composite Signals

rust

struct CompositeSignals {
    // Interactions
    funding_x_imbalance: f64,      // funding_rate * trade_imbalance
    oi_x_funding: f64,             // OI change * funding rate
    basis_x_imbalance: f64,        // Cross-exchange basis * book imbalance
    
    // Momentum
    price_momentum_1m: f64,
    price_momentum_5m: f64,
    volume_momentum: f64,
}

Prediction Targets

rust

enum PredictionTarget {
    // Direction
    PriceDirection1s,     // sign(price[t+1s] - price[t])
    PriceDirection10s,
    PriceDirection60s,
    
    // Magnitude
    AbsReturn1s,
    AbsReturn10s,
    Volatility1m,
    
    // Fill-related
    FillWithin1s,
    FillWithin10s,
    TimeToNextFill,
    
    // Adverse selection
    AdverseOnNextFill,    // Did price move against us?
    InformedFlow,         // Was the trade informed?
    
    // Regime
    RegimeTransition,     // Will regime change in next minute?
}

Mutual Information Estimation

k-NN Estimator (Kraskov et al.)

For continuous variables, use the k-nearest-neighbor estimator:

rust

use kdtree::KdTree;

fn estimate_mutual_information(
    x: &[f64],
    y: &[f64],
    k: usize,  // Typically 3-10
) -> f64 {
    let n = x.len();
    assert_eq!(n, y.len());
    
    // Normalize to [0, 1] to handle different scales
    let x_norm = normalize(x);
    let y_norm = normalize(y);
    
    // Build k-d trees
    let mut joint_tree = KdTree::new(2);
    let mut x_tree = KdTree::new(1);
    let mut y_tree = KdTree::new(1);
    
    for i in 0..n {
        joint_tree.add(&[x_norm[i], y_norm[i]], i).unwrap();
        x_tree.add(&[x_norm[i]], i).unwrap();
        y_tree.add(&[y_norm[i]], i).unwrap();
    }
    
    let mut mi_sum = 0.0;
    
    for i in 0..n {
        // Find k-th nearest neighbor distance in joint space (Chebyshev/max norm)
        let neighbors = joint_tree.nearest(&[x_norm[i], y_norm[i]], k + 1, &chebyshev_distance).unwrap();
        let eps = neighbors.last().unwrap().0;  // Distance to k-th neighbor
        
        // Count points within eps in marginals
        let n_x = count_within_chebyshev(&x_tree, x_norm[i], eps);
        let n_y = count_within_chebyshev(&y_tree, y_norm[i], eps);
        
        mi_sum += digamma(k as f64) + digamma(n as f64) 
                  - digamma(n_x as f64) - digamma(n_y as f64);
    }
    
    (mi_sum / n as f64).max(0.0)  // MI is non-negative
}

fn digamma(x: f64) -> f64 {
    if x < 6.0 {
        digamma(x + 1.0) - 1.0 / x
    } else {
        x.ln() - 1.0 / (2.0 * x) - 1.0 / (12.0 * x.powi(2))
    }
}

fn normalize(x: &[f64]) -> Vec<f64> {
    let min = x.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = (max - min).max(1e-10);
    x.iter().map(|&v| (v - min) / range).collect()
}

For Binary Targets

Use the simpler binned estimator:

rust

fn estimate_mi_binary_target(
    x: &[f64],
    y: &[bool],
    num_bins: usize,
) -> f64 {
    let n = x.len() as f64;
    
    // Bin the continuous variable
    let x_min = x.iter().cloned().fold(f64::INFINITY, f64::min);
    let x_max = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let bin_width = (x_max - x_min) / num_bins as f64;
    
    // Count joint and marginal frequencies
    let mut joint_counts = vec![[0usize; 2]; num_bins];  // [bin][outcome]
    let mut x_counts = vec![0usize; num_bins];
    let mut y_counts = [0usize; 2];
    
    for (&xi, &yi) in x.iter().zip(y.iter()) {
        let bin = ((xi - x_min) / bin_width).floor() as usize;
        let bin = bin.min(num_bins - 1);
        let yi = if yi { 1 } else { 0 };
        
        joint_counts[bin][yi] += 1;
        x_counts[bin] += 1;
        y_counts[yi] += 1;
    }
    
    // Compute MI
    let mut mi = 0.0;
    for bin in 0..num_bins {
        for outcome in 0..2 {
            let p_xy = joint_counts[bin][outcome] as f64 / n;
            let p_x = x_counts[bin] as f64 / n;
            let p_y = y_counts[outcome] as f64 / n;
            
            if p_xy > 0.0 && p_x > 0.0 && p_y > 0.0 {
                mi += p_xy * (p_xy / (p_x * p_y)).ln();
            }
        }
    }
    
    mi.max(0.0)
}

Signal Analysis Framework

rust

struct SignalAnalysisResult {
    signal_name: String,
    target_name: String,
    
    // Information content
    mutual_information_bits: f64,
    mutual_information_normalized: f64,  // MI / H(Y), fraction of target entropy explained
    
    // Linear relationship (for comparison)
    correlation: f64,
    correlation_abs: f64,
    
    // Predictive power (if target is binary)
    auc_roc: Option<f64>,
    
    // Lag analysis
    optimal_lag_ms: i64,
    mi_at_optimal_lag: f64,
    
    // Regime dependence
    mi_by_regime: HashMap<String, f64>,
    regime_variance_ratio: f64,  // max(MI) / min(MI) across regimes
    
    // Stationarity
    mi_trend_30d: f64,  // Is MI increasing or decreasing over time?
}

fn analyze_signal(
    signal_name: &str,
    signal_values: &[f64],
    target_name: &str,
    target_values: &[f64],  // or &[bool] for binary
    timestamps: &[u64],
    regimes: &[String],
) -> SignalAnalysisResult {
    // Basic MI
    let mi = estimate_mutual_information(signal_values, target_values, 5);
    
    // Correlation
    let corr = pearson_correlation(signal_values, target_values);
    
    // Lag analysis
    let (optimal_lag, mi_at_lag) = find_optimal_lag(signal_values, target_values, timestamps);
    
    // MI by regime
    let mut mi_by_regime = HashMap::new();
    let unique_regimes: HashSet<_> = regimes.iter().collect();
    
    for regime in unique_regimes {
        let mask: Vec<bool> = regimes.iter().map(|r| r == regime).collect();
        let filtered_signal: Vec<f64> = signal_values.iter()
            .zip(&mask)
            .filter(|(_, &m)| m)
            .map(|(s, _)| *s)
            .collect();
        let filtered_target: Vec<f64> = target_values.iter()
            .zip(&mask)
            .filter(|(_, &m)| m)
            .map(|(t, _)| *t)
            .collect();
        
        if filtered_signal.len() >= 100 {
            let regime_mi = estimate_mutual_information(&filtered_signal, &filtered_target, 5);
            mi_by_regime.insert(regime.clone(), regime_mi);
        }
    }
    
    // Regime variance
    let mi_values: Vec<f64> = mi_by_regime.values().cloned().collect();
    let regime_variance_ratio = if mi_values.len() >= 2 {
        let max_mi = mi_values.iter().cloned().fold(0.0, f64::max);
        let min_mi = mi_values.iter().cloned().fold(f64::INFINITY, f64::min);
        max_mi / min_mi.max(0.001)
    } else {
        1.0
    };
    
    // Target entropy (for normalization)
    let target_entropy = compute_entropy(target_values);
    
    SignalAnalysisResult {
        signal_name: signal_name.to_string(),
        target_name: target_name.to_string(),
        mutual_information_bits: mi,
        mutual_information_normalized: mi / target_entropy.max(0.001),
        correlation: corr,
        correlation_abs: corr.abs(),
        auc_roc: None,  // Compute separately if needed
        optimal_lag_ms: optimal_lag,
        mi_at_optimal_lag: mi_at_lag,
        mi_by_regime,
        regime_variance_ratio,
        mi_trend_30d: 0.0,  // Compute from historical data
    }
}

fn find_optimal_lag(
    signal: &[f64],
    target: &[f64],
    timestamps: &[u64],
) -> (i64, f64) {
    let candidate_lags: Vec<i64> = vec![-500, -200, -100, -50, 0, 50, 100, 200, 500];
    
    let mut best_lag = 0i64;
    let mut best_mi = 0.0;
    
    for &lag_ms in &candidate_lags {
        let aligned = align_with_lag(signal, target, timestamps, lag_ms);
        if aligned.0.len() < 100 { continue; }
        
        let mi = estimate_mutual_information(&aligned.0, &aligned.1, 5);
        if mi > best_mi {
            best_mi = mi;
            best_lag = lag_ms;
        }
    }
    
    (best_lag, best_mi)
}

Signal Audit Report

rust

fn generate_signal_audit_report(
    signals: &HashMap<String, Vec<f64>>,
    target_name: &str,
    target: &[f64],
    timestamps: &[u64],
    regimes: &[String],
) -> String {
    let mut results: Vec<SignalAnalysisResult> = Vec::new();
    
    for (name, values) in signals {
        let result = analyze_signal(name, values, target_name, target, timestamps, regimes);
        results.push(result);
    }
    
    // Sort by MI descending
    results.sort_by(|a, b| b.mutual_information_bits.partial_cmp(&a.mutual_information_bits).unwrap());
    
    let mut report = format!("=== Signal Audit Report ===\nTarget: {}\n\n", target_name);
    
    report.push_str("Signal                      MI (bits)  Corr    Opt Lag   Regime Var\n");
    report.push_str("─────────────────────────────────────────────────────────────────────\n");
    
    for result in &results {
        report.push_str(&format!(
            "{:<26} {:.4}     {:.2}    {:>5}ms    {:.1}x\n",
            result.signal_name,
            result.mutual_information_bits,
            result.correlation,
            result.optimal_lag_ms,
            result.regime_variance_ratio,
        ));
    }
    
    // Actionable insights
    report.push_str("\nACTIONABLE INSIGHTS:\n");
    
    // Highest unused signal
    if let Some(top) = results.first() {
        report.push_str(&format!(
            "1. {} has highest MI ({:.4} bits) - prioritize if not already used\n",
            top.signal_name, top.mutual_information_bits
        ));
    }
    
    // Regime-conditional signals
    for result in &results {
        if result.regime_variance_ratio > 2.0 {
            report.push_str(&format!(
                "2. {} has {:.1}x higher MI in some regimes - consider regime conditioning\n",
                result.signal_name, result.regime_variance_ratio
            ));
            break;
        }
    }
    
    // Lagged signals
    for result in &results {
        if result.optimal_lag_ms != 0 && result.mi_at_optimal_lag > result.mutual_information_bits * 1.2 {
            report.push_str(&format!(
                "3. {} has 20%+ more MI at {}ms lag - incorporate lag in feature\n",
                result.signal_name, result.optimal_lag_ms
            ));
            break;
        }
    }
    
    // Correlated but low MI (non-linear relationship)
    for result in &results {
        if result.correlation_abs > 0.3 && result.mutual_information_bits < 0.01 {
            report.push_str(&format!(
                "4. {} has high correlation but low MI - relationship may be noisy or spurious\n",
                result.signal_name
            ));
            break;
        }
    }
    
    report
}

Example Report Output

code

=== Signal Audit Report ===
Target: PriceDirection1s

Signal                      MI (bits)  Corr    Opt Lag   Regime Var
─────────────────────────────────────────────────────────────────────
binance_return_100ms        0.0890     0.31    -150ms    2.3x
trade_imbalance_1s          0.0670     0.24       0ms    1.4x
microprice_imbalance        0.0450     0.19       0ms    1.2x
funding_x_imbalance         0.0410     0.15       0ms    3.1x
open_interest_change_1m     0.0230     0.08       0ms    1.1x
book_pressure               0.0180     0.11       0ms    1.3x
funding_rate                0.0120     0.05       0ms    1.8x

ACTIONABLE INSIGHTS:
1. binance_return_100ms has highest MI (0.089 bits) - prioritize if not already used
2. funding_x_imbalance has 3.1x higher MI in some regimes - consider regime conditioning
3. binance_return_100ms has 20%+ more MI at -150ms lag - incorporate lag in feature

Signal Quality Thresholds

rust

struct SignalQualityThresholds {
    // Minimum MI to include in model
    min_mi_bits: f64,           // 0.01 typical
    
    // Minimum samples for reliable estimate
    min_samples: usize,         // 1000 typical
    
    // Maximum regime variance before requiring conditioning
    max_regime_variance: f64,   // 3.0 typical
    
    // Minimum correlation for sanity check
    min_correlation: f64,       // 0.05 typical
}

fn filter_signals(
    results: &[SignalAnalysisResult],
    thresholds: &SignalQualityThresholds,
) -> Vec<&SignalAnalysisResult> {
    results.iter()
        .filter(|r| {
            r.mutual_information_bits >= thresholds.min_mi_bits
            && r.correlation_abs >= thresholds.min_correlation
        })
        .collect()
}

fn flag_regime_conditional(
    results: &[SignalAnalysisResult],
    thresholds: &SignalQualityThresholds,
) -> Vec<&SignalAnalysisResult> {
    results.iter()
        .filter(|r| r.regime_variance_ratio > thresholds.max_regime_variance)
        .collect()
}

Tracking Signal Decay

Signals lose value over time as:

•Other participants discover them
•Market structure changes
•Regime shifts

Track MI over rolling windows:

rust

fn compute_signal_decay(
    signal_name: &str,
    historical_mis: &[(NaiveDate, f64)],  // (date, MI) pairs
) -> SignalDecayReport {
    // Linear regression on MI over time
    let n = historical_mis.len() as f64;
    let x: Vec<f64> = (0..historical_mis.len()).map(|i| i as f64).collect();
    let y: Vec<f64> = historical_mis.iter().map(|(_, mi)| *mi).collect();
    
    let x_mean = x.iter().sum::<f64>() / n;
    let y_mean = y.iter().sum::<f64>() / n;
    
    let slope = x.iter().zip(&y)
        .map(|(xi, yi)| (xi - x_mean) * (yi - y_mean))
        .sum::<f64>()
        / x.iter().map(|xi| (xi - x_mean).powi(2)).sum::<f64>();
    
    // Half-life: how long until MI drops by 50%?
    let current_mi = y.last().unwrap();
    let half_life_days = if slope < 0.0 {
        (current_mi * 0.5) / (-slope)
    } else {
        f64::INFINITY  // MI is increasing or stable
    };
    
    SignalDecayReport {
        signal_name: signal_name.to_string(),
        current_mi: *current_mi,
        mi_30d_ago: historical_mis.get(historical_mis.len().saturating_sub(30))
            .map(|(_, mi)| *mi)
            .unwrap_or(*current_mi),
        trend_per_day: slope,
        half_life_days,
        action: if half_life_days < 30.0 {
            "URGENT: Signal decaying rapidly. Investigate or replace.".to_string()
        } else if half_life_days < 90.0 {
            "WARNING: Signal decaying. Monitor closely.".to_string()
        } else {
            "OK: Signal stable.".to_string()
        },
    }
}

Dependencies

•Requires: measurement-infrastructure (for outcome data), historical market data
•Enables: All model skills (by identifying which features to use)

Common Mistakes

•Using correlation instead of MI: Correlation misses non-linear relationships
•Not checking lag: Some signals lead the target and are more valuable at a lag
•Ignoring regime conditioning: A signal useless overall might be gold in specific regimes
•Not tracking decay: Signals that worked last year might be worthless now
•Too few samples: MI estimation needs 1000+ samples for reliability

Next Steps

After signal audit:

•Select top signals for your target (MI > 0.01 bits)
•Flag regime-conditional signals for special handling
•Incorporate optimal lags into feature engineering
•Read the relevant model skill to build the predictor
•Set up decay tracking for production monitoring