Substreams Development Expert
Expert assistant for building Substreams projects - high-performance blockchain data indexing and transformation.
Core Concepts
What is Substreams?
Substreams is a powerful blockchain indexing technology that enables:
- •Parallel processing of blockchain data with high performance
- •Composable modules written in Rust (map, store, index types)
- •Protobuf schemas for typed data structures
- •Streaming-first architecture with cursor-based reorg handling
Key Components
- •Manifest (substreams.yaml): Defines modules, networks, dependencies
- •Modules: Map (transform), Store (aggregate), Index (filter)
- •Protobuf: Type-safe schemas for inputs and outputs
- •WASM: Rust code compiled to WebAssembly for execution
Project Structure
my-substreams/
├── substreams.yaml     # Manifest
├── proto/
│   └── events.proto    # Schema definitions
├── src/
│   └── lib.rs          # Rust module code
├── Cargo.toml          # Rust dependencies
└── build/              # Generated files (gitignored)
Prerequisites
Required CLI Tools
- •substreams: Core CLI for building, running, and deploying
- •buf: Required by substreams build for protobuf code generation
Authentication
Running substreams run against hosted endpoints requires authentication:
substreams auth  # Interactive authentication
# Or set the SUBSTREAMS_API_TOKEN environment variable
Common Workflows
Creating a New Project
- •Initialize: Use substreams init or create the manifest manually
- •Define schema: Create .proto files for your data structures
- •Implement modules: Write Rust handlers in src/lib.rs
- •Build: Run substreams build to compile to .spkg
- •Test: Run substreams run with a small block range (recommended: 1000 blocks)
- •Deploy: Publish to registry or deploy as service
Module Types
Map Module - Transforms input to output
- name: map_events
  kind: map
  inputs:
    - source: sf.ethereum.type.v2.Block
  output:
    type: proto:my.types.Events
Store Module - Aggregates data across blocks
- name: store_totals
  kind: store
  updatePolicy: add
  valueType: int64
  inputs:
    - map: map_events
Index Module - Filters blocks for efficient querying
- name: index_transfers
  kind: blockIndex
  inputs:
    - map: map_events
  output:
    type: proto:sf.substreams.index.v1.Keys
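As a rough illustration of what an index module computes, the sketch below builds a deduplicated set of string keys for one block. It is self-contained and does not use the Substreams crates: `Keys` and `Transfer` are local stand-ins for the generated protobuf types, `index_keys` is a hypothetical helper, and the `token:` key prefix is an arbitrary naming choice rather than a required format.

```rust
use std::collections::BTreeSet;

/// Stand-in for the generated `sf.substreams.index.v1.Keys` message
/// (hypothetical local type, assumed to hold a list of string keys).
#[derive(Debug, Default, PartialEq)]
pub struct Keys {
    pub keys: Vec<String>,
}

/// Stand-in for one decoded transfer produced by `map_events`
/// (hypothetical local type).
pub struct Transfer {
    pub token: String,
}

/// Build one key per distinct token seen in the block.
/// Keys are lowercased so that lookups are case-insensitive.
pub fn index_keys(transfers: &[Transfer]) -> Keys {
    let unique: BTreeSet<String> = transfers
        .iter()
        .map(|t| format!("token:{}", t.token.to_lowercase()))
        .collect();
    // BTreeSet iteration is sorted, so the output order is deterministic.
    Keys {
        keys: unique.into_iter().collect(),
    }
}
```

A block with no matching keys yields an empty `Keys`, which is what allows downstream modules that filter on those keys to skip it.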
Debugging Checklist
When modules produce unexpected results:
- •Validate manifest: substreams graph to visualize dependencies
- •Test small range: Run 100-1000 blocks, inspect outputs carefully
- •Check logs: Look for WASM panics, protobuf decode errors
- •Verify schema: Ensure proto types match expected data
- •Review inputs: Confirm input modules produce correct data
- •Initial block: Check initialBlock is set appropriately
Performance Optimization
- •Use indexes to skip irrelevant blocks
- •Minimize store size by storing only necessary data
- •Production mode enables parallel execution: --production-mode
- •Module granularity: Smaller, focused modules perform better
- •Avoid deep nesting: Flatten module dependencies when possible
Manifest Reference
See references/manifest-spec.md for complete specification.
Key Sections
Package metadata:
specVersion: v0.1.0
package:
  name: my_substreams  # letters, digits, and underscores only
  version: v1.0.0
  description: Description of what this substreams does
Protobuf imports:
protobuf:
  files:
    - events.proto
  importPaths:
    - ./proto
Binary reference (WASM code):
binaries:
  default:
    type: wasm/rust-v1
    file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm
Network configuration:
network: mainnet
Supported networks: See references/networks.md
Rust Module Development
Map Handler Example
use substreams::errors::Error;
use substreams::prelude::*;
use substreams_ethereum::pb::eth::v2::Block;
// Events is the message generated from your .proto schema; import it from
// wherever your generated protobuf module lives in the crate.

#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();
    for trx in block.transactions() {
        for (log, _call) in trx.logs_with_calls() {
            // Process logs, extract events
            if is_transfer_event(log) {
                events.transfers.push(extract_transfer(log));
            }
        }
    }
    Ok(events)
}
Store Handler Example
#[substreams::handlers::store]
pub fn store_totals(events: Events, store: StoreAddInt64) {
    for transfer in events.transfers {
        store.add(0, &transfer.token, transfer.amount as i64);
    }
}
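To make the effect of `updatePolicy: add` concrete, here is a toy in-memory model of how per-key totals accumulate as the handler runs over successive blocks. This is only a sketch: `ToyAddStore` is a hypothetical type backed by a `HashMap`, it ignores ordinals, and it says nothing about how the real runtime persists or parallelizes stores. Saturating on overflow is a choice made for this sketch, not documented store behavior.

```rust
use std::collections::HashMap;

/// Toy model of a store with `updatePolicy: add` and `valueType: int64`.
/// Hypothetical type for illustration; the real store is provided by the runtime.
#[derive(Default)]
pub struct ToyAddStore {
    totals: HashMap<String, i64>,
}

impl ToyAddStore {
    /// Add `value` to the running total for `key`; missing keys start at 0.
    pub fn add(&mut self, key: &str, value: i64) {
        let entry = self.totals.entry(key.to_string()).or_insert(0);
        // Saturate instead of panicking on overflow (a choice of this sketch).
        *entry = entry.saturating_add(value);
    }

    /// Read the current total for `key`, if any value was ever added.
    pub fn get(&self, key: &str) -> Option<i64> {
        self.totals.get(key).copied()
    }
}
```

Calling `add` twice with the same key and the values 5 and 7 leaves a total of 12 for that key, which mirrors what the `store_totals` handler does for each token across blocks.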
Best Practices
- •Handle errors gracefully: Use Result<T, Error> returns
- •Log sparingly: Excessive logging impacts performance
- •Validate inputs: Check for null/empty data before processing
- •Use substreams helpers: Leverage substreams-ethereum crate
- •Test locally first: Always test with substreams run before deploying
- •Avoid excessive cloning: Use ownership transfer (see Performance section below)
Performance: Avoiding Excessive Cloning
CRITICAL: One of the greatest performance impacts in Substreams is excessive cloning of data structures.
The Problem
Cloning large data structures is expensive:
- •❌ Cloning a Transaction: Copies all fields, logs, traces
- •❌ Cloning a Block: Copies the entire block including all transactions (EXTREMELY expensive)
- •❌ Cloning in loops: Multiplies the cost by number of iterations
The Solution: Ownership Transfer
Use Rust's ownership system to transfer or borrow data instead of cloning.
Bad Example (Excessive Cloning)
#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();
    for trx in block.transactions() {
        // ❌ BAD: Cloning entire transaction
        let transaction = trx.clone();
        for (log, _call) in transaction.logs_with_calls() {
            // ❌ BAD: Cloning log
            let log_copy = log.clone();
            if is_transfer_event(&log_copy) {
                events.transfers.push(extract_transfer(&log_copy));
            }
        }
    }
    Ok(events)
}
Good Example (Ownership Transfer)
#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();
    // ✅ GOOD: Iterate by reference
    for trx in block.transactions() {
        // ✅ GOOD: Borrow, don't clone
        for (log, _call) in trx.logs_with_calls() {
            if is_transfer_event(log) {
                // ✅ GOOD: Only extract what you need
                events.transfers.push(extract_transfer(log));
            }
        }
    }
    Ok(events)
}

fn is_transfer_event(log: &Log) -> bool {
    // Use reference, no cloning.
    // Require at least 3 topics so extract_transfer can index topics[1] and topics[2] safely.
    log.topics.len() >= 3 &&
        log.topics[0] == TRANSFER_EVENT_SIGNATURE
}

fn extract_transfer(log: &Log) -> Transfer {
    // Extract only the fields you need
    Transfer {
        from: Hex::encode(&log.topics[1]),
        to: Hex::encode(&log.topics[2]),
        amount: Hex::encode(&log.data),
        // Don't copy the entire log
    }
}
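The extraction step can also validate the log's shape before indexing into it. The sketch below assumes an ERC-20-style Transfer log (three topics, with each indexed address in the last 20 bytes of a 32-byte topic, and a 32-byte big-endian amount in the data). It is self-contained: `Log` is a local stand-in for the Ethereum log type, and `to_hex`, `topic_to_address`, `data_to_i64`, and `decode_transfer` are hypothetical helpers, not part of any Substreams crate.

```rust
/// Stand-in for the Ethereum `Log` type, keeping only the fields used here
/// (hypothetical local type).
pub struct Log {
    pub topics: Vec<Vec<u8>>,
    pub data: Vec<u8>,
}

/// Lowercase hex encoding without external crates.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// An indexed address occupies the last 20 bytes of a 32-byte topic.
fn topic_to_address(topic: &[u8]) -> Option<String> {
    if topic.len() != 32 {
        return None;
    }
    Some(to_hex(&topic[12..]))
}

/// Read a 32-byte big-endian unsigned integer; None if it does not fit in i64.
fn data_to_i64(data: &[u8]) -> Option<i64> {
    if data.len() != 32 {
        return None;
    }
    // The upper 24 bytes must be zero for the value to fit in 64 bits.
    if data[..24].iter().any(|&b| b != 0) {
        return None;
    }
    let mut low = [0u8; 8];
    low.copy_from_slice(&data[24..]);
    let value = u64::from_be_bytes(low);
    if value > i64::MAX as u64 {
        None
    } else {
        Some(value as i64)
    }
}

/// Returns (from, to, amount), or None when the log lacks the expected shape.
/// The log is only borrowed; nothing larger than the extracted fields is copied.
pub fn decode_transfer(log: &Log) -> Option<(String, String, i64)> {
    if log.topics.len() != 3 {
        return None;
    }
    let from = topic_to_address(&log.topics[1])?;
    let to = topic_to_address(&log.topics[2])?;
    let amount = data_to_i64(&log.data)?;
    Some((from, to, amount))
}
```

Returning `Option` keeps malformed or oversized values out of the output instead of panicking inside the WASM module. Real token amounts often exceed `i64`, so a production module would more likely carry the amount as a string or big integer.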
When Cloning is Acceptable
Clone only small, necessary data:
// ✅ OK: Cloning small byte vectors (a 20-byte address)
let token_address = log.address.clone();
// ✅ OK: Primitive types are Copy, so no clone() call is needed
let block_number = block.number;
// ❌ BAD: Cloning entire structures
let block_copy = block.clone(); // Never do this!
let trx_copy = transaction.clone(); // Avoid this!
Performance Tips
- •Use logs_with_calls(): Iterate logs without cloning
for (log, _call) in trx.logs_with_calls() { } // Good
for log in trx.receipt.as_ref().unwrap().logs.clone() { } // Bad
- •Use references when appropriate: Pass references to avoid unnecessary cloning
fn process_log(log: &Log) { } // Good for read-only access
fn process_log(log: Log) { } // Good when consuming/transforming data
- •Extract minimal data: Only copy what you actually need
// Good: Extract only needed fields
let amount = parse_amount(&log.data);
// Bad: Copy entire log just to get one field
let log_copy = log.clone();
let amount = parse_amount(&log_copy.data);
- •Use into() for consumption: When you need to consume data
// When you truly need to take ownership
events.transfers.push(Transfer {
    from: topics[1].into(), // Consumes the data
    to: topics[2].into(),
});
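When a function already owns a value, its buffers can be moved into the output rather than copied. The sketch below shows one way to do that with destructuring and `into_iter()`. It is self-contained: `Log` and `RawTransfer` are local stand-in types and `into_raw_transfer` is a hypothetical helper, so treat it as an illustration of the ownership pattern rather than the Substreams API.

```rust
/// Stand-in for an owned log (hypothetical local type).
pub struct Log {
    pub topics: Vec<Vec<u8>>,
    pub data: Vec<u8>,
}

/// Stand-in for an output message that keeps raw bytes (hypothetical local type).
#[derive(Debug, PartialEq)]
pub struct RawTransfer {
    pub from: Vec<u8>,
    pub to: Vec<u8>,
    pub data: Vec<u8>,
}

/// Consume the log and move its buffers into the output without copying them.
/// Returns None if the log has fewer than three topics.
pub fn into_raw_transfer(log: Log) -> Option<RawTransfer> {
    // Destructuring takes ownership of each field.
    let Log { topics, data } = log;
    // into_iter() yields owned Vec<u8> values, so no clone is needed.
    let mut topics = topics.into_iter();
    let _signature = topics.next()?;
    let from = topics.next()?;
    let to = topics.next()?;
    Some(RawTransfer { from, to, data })
}
```

Moving a `Vec<u8>` transfers only its pointer, length, and capacity, so the cost does not grow with the size of the data. Indexing into a borrowed vector cannot move elements out, which is why this sketch consumes the vector with an iterator instead.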
Common Pitfalls
Pitfall #1: Cloning in filters
// ❌ BAD
block.transactions() // already an iterator, so no .iter() call
    .filter(|trx| trx.clone().to == target) // Clone every transaction!
// ✅ GOOD
block.transactions()
    .filter(|trx| trx.to == target) // Just compare
Pitfall #2: Unnecessary defensive copies
// ❌ BAD
let block_copy = block.clone();
for trx in block_copy.transactions() { } // Why clone the whole block?
// ✅ GOOD
for trx in block.transactions() { } // Use the block directly
Pitfall #3: Cloning for mutation
// ❌ BAD
let mut trx_copy = trx.clone();
trx_copy.value = process(trx_copy.value); // Clone just to mutate
// ✅ GOOD
let new_value = process(&trx.value); // Process reference, create new value
Measuring Impact
Use substreams run with timing to measure performance:
# Build the version with cloning, then time it (slow)
time substreams run -s 17000000 -t +1000 map_events
# Rebuild without cloning, then time the same block range (fast)
time substreams run -s 17000000 -t +1000 map_events
# Expect a noticeable speedup (potentially 2-10x) when clones are avoided
Remember
- •🎯 Measure performance impact: Use timing with substreams run to identify bottlenecks
- •🎯 Clone only when necessary: Most of the time, borrowing is sufficient
- •🎯 Block cloning is almost never needed: This is the #1 performance killer
- •🎯 Transaction cloning should be rare: Extract only the data you need
Common Patterns
See references/patterns.md for detailed examples:
- •Event extraction from logs
- •Store aggregation patterns
- •Multi-module composition
- •Parameterized modules
- •Dynamic data sources
- •Database sink patterns (delta updates, composite keys, sink SQL workflow)
Querying Chain Head Block
To get the current head block of a chain (useful for determining the latest block number):
Using Substreams:
# Quick head block lookup for a network
substreams run common@latest -s -1 --network mainnet
# Or with explicit endpoint
substreams run common@latest -e=<network-id-alias-or-host> -s -1 -o jsonl
Read the first line of output to get the head block information. The -s -1 flag starts from the latest block.
Using firecore:
# JSON output (use jq for further processing if available)
firecore tools firehose-client <network-id-alias-or-host> -o json -- -1
# Text output (less detail), first line looks like:
# Block #24327807 (14b58bd3fa091c05a46d084bba1e78090d52556d29f4312da77b7aa3220423f4)
firecore tools firehose-client <network-id-alias-or-host> -o text -- -1
Read the first line of output to get the head block information.
Development Tips
- •Start small: Begin with 1000 block range for testing
- •Use GUI: substreams gui for visual debugging (when available)
- •Version control: Commit .spkg files for reproducibility
- •Document modules: Add doc: fields in manifest for clarity
Troubleshooting
Build fails:
- •Check Rust toolchain: rustup target add wasm32-unknown-unknown
- •Ensure buf CLI is installed (required for proto generation)
- •Verify proto imports are correct
- •Add protobuf.excludePaths with sf/substreams and google when importing spkgs
- •Ensure binary path in manifest matches build output
Empty output:
- •Confirm initialBlock is at or before the first relevant block
- •Check module isn't filtered out by upstream index
- •Verify input data exists in block range
Performance issues:
- •Add indexes to skip irrelevant blocks
- •Use --production-mode for large ranges
Resources
- •Official Documentation
- •Module Types Guide
- •Manifest Specification
- •Common Patterns
- •Supported Networks