ML Batch Processing Pattern

Classification

•Domain: Computer Science, AI/ML
•Category: ML System Design Patterns
•Novelty: 6/10 (established pattern with modern evolution)
•Practitioner Evidence: 10/10 (Google, industry standard)

Mental Model

Batch processing decouples prediction from real-time requests by pre-computing predictions on scheduled intervals. Like meal prep for the week instead of cooking each meal on-demand—you process predictions in bulk during off-peak hours, store results, and serve them instantly when requested.

When to Use

•Predictions needed for all users/items at regular intervals (daily recommendations, weekly reports)
•Training data arrives in batches rather than continuously
•Cost optimization prioritized over real-time freshness (batch = cheaper compute)
•Predictions can tolerate staleness (hours/days old acceptable)
•High-throughput scenarios where latency isn't critical

Core Framework

1. Schedule Determination

Identify prediction cadence based on business requirements

•Daily batch: Nightly recommendation refresh for morning users
•Hourly batch: Stock predictions updating each trading hour
•Weekly batch: Monthly subscription churn predictions
•Event-triggered: Batch after data warehouse ETL completion

2. Data Ingestion Setup

Configure batch data pipeline from sources to ML system

•Extract from data warehouse/data lake (BigQuery, Snowflake, S3)
•Apply feature transformations matching training pipeline
•Validate schema consistency with model expectations
•Handle missing values using same imputation as training

3. Distributed Processing Architecture

Parallelize prediction computation across infrastructure

•Use MapReduce/Spark for horizontal scaling across datasets
•Partition data by entity (user_id, product_id) for independent processing
•Configure batch size based on memory constraints (1K-100K records/batch)
•Implement checkpointing for fault tolerance on long-running jobs

4. Model Serving Configuration

Deploy model in batch-optimized inference mode

•Load model once per batch job (avoid reload overhead)
•Use batch prediction APIs (TensorFlow batch_predict, PyTorch batch inference)
•Enable GPU batching for deep learning models (32-512 samples/batch)
•Leverage model compilation (TensorRT, ONNX) for throughput optimization

5. Prediction Storage Design

Store pre-computed predictions for fast lookup

•Key-value store for individual lookups (Redis, DynamoDB: user_id → prediction)
•Columnar storage for analytics (Parquet, BigQuery: all predictions for analysis)
•Include metadata (model_version, prediction_timestamp, confidence_score)
•Set TTL based on batch frequency (1.5x batch interval for overlap)

6. Keyed Predictions Pattern

Enable distributed batch prediction with result matching

•Attach unique keys to input records (primary keys, composite keys)
•Preserve keys through prediction pipeline (input → features → predictions)
•Join predictions back to original entities using keys
•Handle missing predictions (timeouts, errors) with fallback logic

7. Monitoring & Alerting

Track batch job health and prediction quality

•Job completion metrics (duration, throughput, failure rate)
•Data quality checks (null rate, distribution shifts, schema violations)
•Model performance monitoring (prediction distribution, confidence intervals)
•Alerting on batch failures or stale predictions (SLA breaches)

Practical Application

E-commerce Recommendation System

Problem: Generate personalized product recommendations for 10M users Batch Solution:

•Nightly job extracts user behavior (purchases, views, clicks) from data warehouse
•Spark cluster processes 10M users in parallel (10K users/partition × 1K partitions)
•Recommendation model generates top-100 products per user (batch size: 256 users)
•Predictions stored in Redis with 36-hour TTL (user_id → [product_ids + scores])
•Web app reads pre-computed recommendations in <5ms (vs. 200ms real-time inference)

Credit Card Fraud Detection (Batch Component)

Problem: Update fraud risk scores for all accounts daily Batch Solution:

•Daily batch (3am) processes all 50M accounts using last 30 days transactions
•Feature engineering pipeline computes aggregates (transaction velocity, geography patterns)
•XGBoost model scores all accounts (1M accounts/minute on 100-node cluster)
•Risk scores stored in Aurora DB (account_id, risk_score, score_date)
•Real-time transactions query batch scores + apply real-time rules for final decision

Edge Cases & Nuances

Cold Start Problem: New users/items without predictions

•Fallback to popularity-based or demographic-based defaults
•Trigger on-demand prediction for high-value new entities
•Include new entities in next batch cycle with minimal features

Prediction Staleness: Batch predictions lag reality

•Hybrid approach: batch for stable predictions + real-time updates for high-velocity features
•Monitor staleness impact on business metrics (click-through rate decay over time)
•Decrease batch interval if staleness hurts performance (daily → hourly)

Batch Job Failures: Incomplete or failed batch runs

•Implement idempotent batch jobs (can safely re-run without duplicates)
•Use transactional writes to prediction store (all-or-nothing semantics)
•Maintain previous batch predictions as fallback until new batch succeeds

Cost vs. Freshness Tradeoff: More frequent batches = higher cost

•Profile actual prediction change rate (how often do top-10 recommendations shift?)
•A/B test batch frequencies to measure impact on engagement metrics
•Use event-triggered batches for critical updates (product catalog changes)

Anti-Patterns

Batch for Latency-Critical Applications: Using batch for fraud detection that must block transactions in real-time Over-Engineering Batch Infrastructure: Building distributed system for 10K records processable on single machine Ignoring Data Freshness Requirements: Daily batches for inventory predictions when stock changes hourly No Fallback Strategy: System breaks when batch job fails with no stale predictions

Trade-offs

Batch vs. Online Inference:

•Batch: Lower cost (bulk processing), higher latency (stale predictions), simpler ops (scheduled jobs)
•Online: Higher cost (per-request compute), lower latency (fresh predictions), complex ops (SLA-driven)

Batch Frequency:

•More frequent (hourly): Fresher predictions, higher compute cost, more operational complexity
•Less frequent (daily): Stale predictions, lower cost, simpler ops, higher storage requirements

Distributed vs. Single-Node:

•Distributed: Scales to billions of records, complex infrastructure, slower for small datasets
•Single-node: Simple, fast for <10M records, memory/compute constraints, no fault tolerance

Related Frameworks

•Streaming ML Pattern: Continuous prediction updates from streaming data (complements batch)
•Online Learning Pattern: Incremental model updates as new data arrives (batch retraining alternative)
•Lambda Architecture: Batch layer + speed layer for hybrid batch/streaming systems
•Feature Store Pattern: Centralized feature computation for batch and online consistency

Practitioner Sources

•Google ML Design Patterns (Lakshmanan et al.): Batch Serving pattern (#17), Keyed Predictions pattern
•ML System Design: Batch vs. online prediction serving tradeoffs, architecture patterns
•Apache Spark MLlib: Distributed batch prediction at scale, best practices
•AWS SageMaker Batch Transform: Managed batch inference service, cost optimization