Feature Store Design
Architecture Decision Table
| Criteria | Feast (OSS) | Tecton | Vertex Feature Store | Custom (Redis + Warehouse) |
|---|---|---|---|---|
| Setup cost | Low | High (SaaS) | Medium (GCP-only) | Medium-High |
| Online serving latency | <10ms (Redis) | <5ms | <10ms | Depends on impl |
| Offline store | File/BigQuery/Redshift | Spark/Snowflake | BigQuery | Your warehouse |
| Streaming features | Limited (push-based) | Native Spark/Flink | Dataflow | Build your own |
| Point-in-time joins | Built-in | Built-in | Built-in | Must implement |
| Team size sweet spot | 2-15 | 15-100+ | Any (GCP shops) | 5-20 (eng-heavy) |
Recommendation: Feast for most teams. It covers 80% of use cases with minimal operational burden. Go custom only when sub-millisecond latency or complex streaming transformations are hard requirements.
Feast Feature Definition
feature_store.yaml
```yaml
project: my_ml_project
registry: gs://my-bucket/feast/registry.pb
provider: gcp
online_store:
  type: redis
  connection_string: 10.0.0.5:6379  # Feast's Redis store takes host:port[,option=value], not a redis:// URL
offline_store:
  type: bigquery
entity_key_serialization_version: 2
```
Feature Definitions
```python
from datetime import timedelta
from feast import Entity, FeatureView, Field
from feast.types import Float32, Int64, String
from feast.infra.offline_stores.bigquery_source import BigQuerySource

# --- Entities ---
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="Unique user identifier",
)
product = Entity(
    name="product_id",
    join_keys=["product_id"],
)

# --- Data Sources ---
user_stats_source = BigQuerySource(
    name="user_stats",
    table="ml_features.user_daily_stats",
    timestamp_field="event_date",           # business (event) time, drives point-in-time joins
    created_timestamp_column="created_at",  # ingestion time, used only for deduplication
)

# --- Feature Views ---
# A plain FeatureView is the standard choice for precomputed batch features.
user_features = FeatureView(
    name="user_features",
    entities=[user],
    ttl=timedelta(days=7),  # rows older than 7 days relative to the event are treated as missing
    schema=[
        Field(name="order_count_30d", dtype=Int64),
        Field(name="avg_order_value_30d", dtype=Float32),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="lifetime_value", dtype=Float32),
        Field(name="preferred_category", dtype=String),
    ],
    source=user_stats_source,
    online=True,
    tags={"team": "recommendations", "version": "v2"},
)
```
Point-in-Time Join
The most critical feature store operation. Prevents future data from leaking into training examples.
How It Works
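Given training events with timestamps, join to each event the most recent feature value whose timestamp is at or before the event timestamp. Later rows are future data and must be excluded; Feast also drops rows older than the feature view's TTL.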
```text
Event:   user_id=42, event_time=2024-03-15 10:00:00
Feature: user_id=42, order_count_30d=5, event_date=2024-03-14   <-- correct
Feature: user_id=42, order_count_30d=7, event_date=2024-03-16   <-- FUTURE, must exclude
```
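The lookup rule can be sketched in plain Python. This is a minimal illustration, assuming feature rows arrive as (entity_id, feature_timestamp, value) tuples and a 7-day TTL like user_features; the function name `point_in_time_lookup` is hypothetical, and Feast performs the equivalent join inside the offline store query rather than like this.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def point_in_time_lookup(feature_rows, entity_id, event_time, ttl):
    """Return the latest value for entity_id with timestamp <= event_time.

    feature_rows: list of (entity_id, feature_timestamp, value) tuples.
    Returns None if no row qualifies or the newest qualifying row is older than ttl.
    """
    # Keep only this entity's rows, sorted by feature timestamp
    history = sorted((ts, val) for eid, ts, val in feature_rows if eid == entity_id)
    timestamps = [ts for ts, _ in history]
    # Index of the last row at or before event_time; anything after it is future data
    idx = bisect_right(timestamps, event_time) - 1
    if idx < 0:
        return None
    ts, value = history[idx]
    # A row older than the TTL counts as missing, mirroring a feature view TTL
    if event_time - ts > ttl:
        return None
    return value

rows = [
    (42, datetime(2024, 3, 14), 5),
    (42, datetime(2024, 3, 16), 7),  # after the event below: must be excluded
]
print(point_in_time_lookup(rows, 42, datetime(2024, 3, 15, 10), timedelta(days=7)))  # 5
```

Sorting once and bisecting keeps each lookup logarithmic per entity; a production join would instead push this logic into SQL over the whole entity dataframe.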
Feast Point-in-Time Retrieval
```python
from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path="feature_repo/")

# Training events with timestamps
entity_df = pd.DataFrame({
    "user_id": [42, 99, 42, 17],
    "event_timestamp": pd.to_datetime([
        "2024-03-15 10:00:00",
        "2024-03-15 14:00:00",
        "2024-03-10 08:00:00",  # same user, earlier time = different features
        "2024-03-12 12:00:00",
    ]),
    "label": [1, 0, 1, 0],
})

# Feast handles point-in-time join automatically
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:order_count_30d",
        "user_features:avg_order_value_30d",
        "user_features:days_since_last_order",
    ],
).to_df()
# Result: each row gets features AS OF its event_timestamp
```
Online Serving Setup
Materialization
```shell
# Materialize features from offline -> online store
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```
Or in code:
```python
from datetime import datetime

store.materialize_incremental(end_date=datetime.utcnow())
```
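To make the "incremental" part concrete, here is a toy sketch assuming an in-memory list as the offline store and a dict as the online store. Every name in it (`ToyFeatureStore`, `last_end`) is hypothetical; it mirrors the documented behavior of `feast materialize-incremental` (each run starts where the previous run ended, and the online store holds only the latest value per entity), not Feast's implementation.

```python
from datetime import datetime

class ToyFeatureStore:
    """Hypothetical in-memory stand-in for an offline + online store pair."""

    def __init__(self):
        self.offline = []     # append-only (entity_id, event_ts, value) rows
        self.online = {}      # entity_id -> (event_ts, value), latest value only
        self.last_end = None  # watermark: end_date of the previous run

    def materialize_incremental(self, end_date):
        start = self.last_end  # None means "from the beginning of time"
        for entity_id, ts, value in self.offline:
            # Copy only rows in the window (start, end_date]
            if (start is None or ts > start) and ts <= end_date:
                current = self.online.get(entity_id)
                # Keep the newest value per entity, whatever the row order
                if current is None or ts >= current[0]:
                    self.online[entity_id] = (ts, value)
        self.last_end = end_date  # the next run starts here

toy = ToyFeatureStore()
toy.offline.append((42, datetime(2024, 3, 14), 5))
toy.materialize_incremental(datetime(2024, 3, 15))
toy.offline.append((42, datetime(2024, 3, 16), 7))
toy.materialize_incremental(datetime(2024, 3, 17))
print(toy.online[42][1])  # 7
```

One consequence the sketch makes visible: a row that lands in the offline store late, with an event timestamp before the current watermark, is never copied by later incremental runs. Any watermark-based scheme has this property, so late-arriving data needs a separate backfill path.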
Online Retrieval for Inference
```python
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# Single entity lookup -- low latency for real-time serving
features = store.get_online_features(
    features=[
        "user_features:order_count_30d",
        "user_features:avg_order_value_30d",
        "user_features:lifetime_value",
    ],
    entity_rows=[{"user_id": 42}],
).to_dict()
# Returns: {"user_id": [42], "order_count_30d": [5], ...}
```
Push-Based Streaming Features
```python
from feast import FeatureStore
from feast.data_source import PushMode
from datetime import datetime
import pandas as pd

store = FeatureStore(repo_path="feature_repo/")

# Push fresh features from a streaming job (Kafka consumer, etc.).
# push_source_name must match a PushSource registered in the feature repo.
store.push(
    push_source_name="user_realtime_stats",
    df=pd.DataFrame({
        "user_id": [42],
        "session_duration_sec": [340],
        "pages_viewed": [12],
        "event_timestamp": [datetime.utcnow()],
    }),
    to=PushMode.ONLINE,  # or ONLINE_AND_OFFLINE
)
```
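The choice of push mode decides whether pushed values can ever appear in training data. A toy sketch, assuming a dict as the online store and a list as the offline log; `ToyPushMode` and `push` are hypothetical stand-ins, not Feast's implementation. With online-only pushes, the values serve predictions but `get_historical_features` cannot return them, unless the same events also reach the batch source through another path.

```python
from datetime import datetime
from enum import Enum

class ToyPushMode(Enum):
    ONLINE = 1
    OFFLINE = 2
    ONLINE_AND_OFFLINE = 3

online_store = {}  # user_id -> latest pushed row (what serving reads)
offline_log = []   # append-only rows (what training reads)

def push(row, to):
    """Hypothetical push: route one feature row to the online and/or offline store."""
    if to in (ToyPushMode.ONLINE, ToyPushMode.ONLINE_AND_OFFLINE):
        online_store[row["user_id"]] = row
    if to in (ToyPushMode.OFFLINE, ToyPushMode.ONLINE_AND_OFFLINE):
        offline_log.append(row)

row = {"user_id": 42, "session_duration_sec": 340, "pages_viewed": 12,
       "event_timestamp": datetime(2024, 3, 15, 10)}
push(row, to=ToyPushMode.ONLINE)
print(len(online_store), len(offline_log))  # 1 0: servable, but invisible to training
```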
Gotchas and Anti-Patterns
Training-Serving Skew
Problem: Features computed differently at training time (SQL/Spark batch) vs serving time (Python real-time). Model performance degrades in production.
Fix: Define feature transformations once. Use Feast on-demand feature views or a shared transformation library. Test by comparing online vs offline retrieval for the same entity+timestamp.
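```python
# Detect skew: compare online vs offline values for the same entities
from datetime import datetime, timezone
import pandas as pd

feature_list = ["user_features:order_count_30d", "user_features:avg_order_value_30d"]
entities = [{"user_id": 42}, {"user_id": 99}]
entity_df_now = pd.DataFrame(entities).assign(event_timestamp=datetime.now(timezone.utc))

online = store.get_online_features(features=feature_list, entity_rows=entities).to_df()
offline = store.get_historical_features(entity_df=entity_df_now, features=feature_list).to_df()

# Run right after materialization; otherwise online values lag offline by design.
# Assert values match within tolerance
merged = online.merge(offline, on="user_id", suffixes=("_online", "_offline"))
for col in ["order_count_30d", "avg_order_value_30d"]:
    assert (merged[f"{col}_online"] - merged[f"{col}_offline"]).abs().max() < 1e-6, col
```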
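The "define transformations once" rule from the fix above can be sketched in plain Python, assuming a hypothetical `avg_order_value` transformation: the batch path that builds training rows and the online path that answers requests both call the same function instead of re-implementing it in SQL and Python separately. In Feast an on-demand feature view can play this role; the sketch avoids Feast so it runs standalone, and all function and field names are illustrative.

```python
def avg_order_value(total_spend: float, order_count: int) -> float:
    """Single source of truth for the feature (hypothetical example)."""
    # Handle zero orders here, once, so both paths agree on the edge case
    return total_spend / order_count if order_count > 0 else 0.0

def build_training_rows(raw_rows):
    """Batch path: apply the shared transformation to historical rows."""
    return [
        {"user_id": r["user_id"],
         "avg_order_value_30d": avg_order_value(r["spend_30d"], r["orders_30d"])}
        for r in raw_rows
    ]

def serve_features(request):
    """Online path: apply the same transformation to request-time values."""
    return {"avg_order_value_30d": avg_order_value(request["spend_30d"], request["orders_30d"])}

raw = {"user_id": 42, "spend_30d": 250.0, "orders_30d": 5}
offline_value = build_training_rows([raw])[0]["avg_order_value_30d"]
online_value = serve_features(raw)["avg_order_value_30d"]
assert offline_value == online_value  # 50.0 on both paths
```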
Time-Travel Bugs
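Problem: Using created_at instead of event_date as the timestamp field. Backfilled rows all get created_at = now(), so the join treats months of history as if it happened at backfill time: training events dated before the backfill find no feature rows, and events after it see whichever row was written last rather than the value that was true at the time.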
Fix: Always use the business timestamp (when the event occurred), not the ingestion timestamp. Set timestamp_field to the event time and use created_timestamp_column only to deduplicate rows that share the same entity and event timestamp.
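A small demonstration of the difference, assuming a hypothetical backfill that ran on 2024-06-01 (plus a correction on 2024-06-02) for rows whose business date is in March. The helper `as_of` is an illustrative point-in-time lookup that breaks ties on created_at, which is the deduplication role created_timestamp_column plays; it is not Feast's implementation.

```python
from datetime import datetime

# Backfilled rows: (user_id, event_date, created_at, order_count_30d)
rows = [
    (42, datetime(2024, 3, 14), datetime(2024, 6, 1), 5),
    (42, datetime(2024, 3, 14), datetime(2024, 6, 2), 6),  # correction for the same event_date
    (42, datetime(2024, 3, 16), datetime(2024, 6, 1), 7),
]

def as_of(rows, event_time, ts_index):
    """Latest value whose timestamp (column ts_index) is <= event_time.

    Ties on that timestamp are broken by created_at (column 2).
    """
    eligible = [r for r in rows if r[ts_index] <= event_time]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r[ts_index], r[2]))[3]

event = datetime(2024, 3, 15, 10)
print(as_of(rows, event, ts_index=1))  # 6: event_date as timestamp, dedup keeps the correction
print(as_of(rows, event, ts_index=2))  # None: every row looks like it happened in June
print(as_of(rows, datetime(2024, 6, 3), ts_index=2))  # 6: last write wins, not latest business date (7)
```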
Feature Freshness
Problem: Materialization runs hourly but model expects real-time features. Stale features cause prediction drift.
Fix: Monitor feature age and alert when online store values are older than the expected TTL:
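```python
# Check feature freshness
import time

response = store.get_online_features(
    features=["user_features:order_count_30d"],
    entity_rows=[{"user_id": 42}],
    full_feature_names=True,
).to_dict(include_event_timestamps=True)  # adds "<feature>__ts" keys (epoch seconds) in recent Feast releases

# Compare the feature's event timestamp against the current time
feature_ts = response["user_features__order_count_30d__ts"][0]
age_hours = (time.time() - feature_ts) / 3600
if age_hours > 7 * 24:  # user_features TTL is 7 days
    print(f"ALERT: order_count_30d for user 42 is {age_hours:.1f}h old")
```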
Entity Key Design
Problem: Using composite keys like (user_id, session_id) when the features are really per-user. This creates a sparse online store and slow lookups.
Fix: One entity per natural grain. Use user_id for user features, (user_id, product_id) only for interaction features. Keep entity key cardinality manageable for online store memory.
| Entity Pattern | Online Store Size (keys) | Lookup Speed | Use Case |
|---|---|---|---|
| user_id | ~N users | Fast | User-level aggregates |
| product_id | ~M products | Fast | Product metadata/stats |
| (user_id, product_id) | Up to N × M | Slow if M is large | Interaction features |
| session_id | Unbounded | Degrades over time | Avoid; if unavoidable, set a short TTL |
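A back-of-the-envelope sketch of how entity grain drives online store size. The catalog sizes, the observed pair count, and the 200 bytes per row are illustrative assumptions, not measurements; real per-key overhead depends on the online store and its serialization.

```python
def online_store_bytes(num_keys: int, bytes_per_row: int = 200) -> int:
    """Rough size estimate: one row of feature values per entity key."""
    return num_keys * bytes_per_row

users, products = 1_000_000, 100_000  # hypothetical N and M
observed_pairs = 50_000_000           # hypothetical: pairs that actually interacted

for label, keys in [
    ("user_id", users),
    ("product_id", products),
    ("(user_id, product_id), observed pairs", observed_pairs),
    ("(user_id, product_id), worst case N*M", users * products),
]:
    print(f"{label:40s} {online_store_bytes(keys) / 1e9:>12,.2f} GB")
```

With these assumed numbers, user-level rows need about 0.2 GB, observed pairs about 10 GB, and the worst-case pair space about 20,000 GB, which is why the composite key is the row to watch.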