Code Like alamb: Rust Programming Style Guide
This document captures the coding patterns, design preferences, and stylistic conventions observed across alamb's Rust projects. These projects focus heavily on Apache Arrow, Parquet, DataFusion, and high-performance data processing.
Project Structure & Organization
Module Organization
- Place mod declarations at the top of main.rs or lib.rs
- Split modules into logical units by responsibility (e.g., error.rs, query.rs, replay.rs)
- Use pub mod only when external visibility is needed
- Keep main.rs focused on CLI parsing and orchestration; delegate logic to modules
mod benchmark;
mod datagen;
mod file_type;
mod parquet_file;
use crate::benchmark::{MetadataParseBenchmark, MetadataParseResult};
use crate::file_type::FileType;
Workspace Projects
- Use Cargo workspaces for related crates that need to be developed together
- Useful when comparing different library versions (e.g., parquet9, parquetnext)
Error Handling
Custom Error Types
Prefer simple custom error types with Display and From implementations:
use std::fmt::Display;
#[derive(Debug, Clone)]
pub struct Error {
msg: String,
}
impl Display for Error {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Error: {}", self.msg)
}
}
pub type Result<T> = std::result::Result<T, Error>;
impl From<std::io::Error> for Error {
fn from(e: std::io::Error) -> Self {
Self { msg: format!("std::io::Error: {}", e) }
}
}
impl From<String> for Error {
fn from(msg: String) -> Self {
Self { msg }
}
}
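With these From implementations in place, the ? operator converts underlying errors without any explicit mapping. The following is a minimal sketch that repeats the Error type from this section so it compiles on its own; the helper read_trimmed is a hypothetical name used only for illustration:

```rust
use std::fmt::Display;
use std::path::Path;

#[derive(Debug, Clone)]
pub struct Error {
    msg: String,
}

impl Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Error: {}", self.msg)
    }
}

pub type Result<T> = std::result::Result<T, Error>;

impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Self { msg: format!("std::io::Error: {}", e) }
    }
}

impl From<String> for Error {
    fn from(msg: String) -> Self {
        Self { msg }
    }
}

/// Hypothetical helper: reads a file and rejects empty content.
pub fn read_trimmed(path: &Path) -> Result<String> {
    // `?` converts std::io::Error into Error via From<std::io::Error>
    let contents = std::fs::read_to_string(path)?;
    let trimmed = contents.trim();
    if trimmed.is_empty() {
        // `.into()` uses From<String>
        return Err(format!("{} is empty", path.display()).into());
    }
    Ok(trimmed.to_string())
}
```

Callers see a single error type regardless of whether the failure came from I/O or from a validation message.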
StringifyError Trait
For quick prototyping or when structured errors are overkill:
pub trait StringifyError<T> {
fn stringify(self) -> Result<T, String>;
fn context(self, msg: &str) -> Result<T, String>;
}
impl<T, E: std::fmt::Display> StringifyError<T> for Result<T, E> {
fn stringify(self) -> Result<T, String> {
self.map_err(|e| e.to_string())
}
fn context(self, msg: &str) -> Result<T, String> {
self.map_err(|e| format!("{}: {}", msg, e))
}
}
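As a usage sketch, the trait lets any Display-able error be flattened into a String at the call site. The block repeats the trait so it compiles on its own; parse_row_count is a hypothetical helper, not part of the original code:

```rust
pub trait StringifyError<T> {
    fn stringify(self) -> Result<T, String>;
    fn context(self, msg: &str) -> Result<T, String>;
}

impl<T, E: std::fmt::Display> StringifyError<T> for Result<T, E> {
    fn stringify(self) -> Result<T, String> {
        self.map_err(|e| e.to_string())
    }
    fn context(self, msg: &str) -> Result<T, String> {
        self.map_err(|e| format!("{}: {}", msg, e))
    }
}

/// Hypothetical helper: parses a row count from text.
pub fn parse_row_count(text: &str) -> Result<usize, String> {
    // ParseIntError becomes a String prefixed with the context message
    text.trim().parse::<usize>().context("invalid row count")
}
```

The context message comes first in the resulting string, so nested calls read from the outermost operation down to the root cause.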
Result Type Aliases
Define module-level Result type aliases for cleaner signatures:
pub type Result<T, E = String> = std::result::Result<T, E>;
CLI & Configuration
Clap with Derive
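Use clap's derive macros for CLI parsing. The attributes below use clap 3 syntax; clap 4 prefers #[command(...)] on the struct and #[arg(...)] on fields, and drops parse(from_os_str), since PathBuf is parsed without it:
use clap::Parser;
use std::path::PathBuf;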
/// Command line program description goes here
#[derive(Parser, Debug)]
#[clap(author, version, about)]
struct Args {
#[clap(long, parse(from_os_str))]
/// Search path for files
path: PathBuf,
#[clap(long, default_value = "")]
/// Optional filter
filter: String,
}
Subcommand Pattern
Use enums for subcommands:
#[derive(Parser, Debug)]
#[clap(author, version, about)]
enum MyTool {
/// Dump raw entries
DumpEntries(DumpEntries),
/// Process calls
DumpCalls(DumpCalls),
}
fn main() {
let args = MyTool::parse();
match args {
MyTool::DumpEntries(dump) => { /* ... */ }
MyTool::DumpCalls(dump) => { /* ... */ }
}
}
Environment Variable Support
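Support environment variables for configuration. This example uses structopt's attribute; clap's derive API accepts the same env key when its env feature is enabled: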
#[structopt(
short,
long,
global = true,
env = "IOX_ADDR",
default_value = "http://127.0.0.1:8082"
)]
host: String,
Async/Tokio Patterns
Main Function
Use #[tokio::main] with the appropriate runtime flavor:
#[tokio::main(flavor = "multi_thread")]
async fn main() {
// ...
}
JoinSet for Concurrent Tasks
Use JoinSet for managing multiple concurrent tasks:
use tokio::task::JoinSet;
let mut join_set = JoinSet::new();
for item in items {
join_set.spawn(async move {
process(item).await
});
}
let results = join_set.join_all().await;
Channel-based Task Coordination
Use mpsc channels for producer-consumer patterns:
let (tx, mut rx) = tokio::sync::mpsc::channel(buffer_size);
// Consumer task: drains the channel until every sender is dropped
join_set.spawn(async move {
while let Some(item) = rx.recv().await {
process(item);
}
});
// Send work
tx.send(work).await?;
FuturesUnordered for Concurrent Operations
use futures::{stream::FuturesUnordered, StreamExt};
let tasks = items
.into_iter()
.map(|item| tokio::task::spawn(process(item)))
.collect::<FuturesUnordered<_>>();
let results = tasks.collect::<Vec<_>>().await;
Builder Pattern
Standard Builder with with_ Methods
Use the consuming self pattern (each method takes self by value and returns it) for builder methods:
#[derive(Debug, Default)]
pub struct ConfigBuilder {
path: Option<PathBuf>,
columns: Option<usize>,
}
impl ConfigBuilder {
pub fn new() -> Self {
Default::default()
}
pub fn with_path(mut self, path: PathBuf) -> Self {
self.path = Some(path);
self
}
pub fn with_columns(mut self, columns: usize) -> Self {
self.columns = Some(columns);
self
}
pub fn build(self) -> Config {
let Self { path, columns } = self;
Config {
path: path.expect("path is required"),
columns: columns.expect("columns is required"),
}
}
}
Mutable Builder with &mut self Return
For builders that accumulate state incrementally:
impl Call {
pub fn with_timestamp(&mut self, timestamp: Option<DateTime<Utc>>) -> &mut Self {
if let Some(ts) = timestamp {
self.start_time = self.start_time.take()
.map(|existing| existing.min(ts))
.or(Some(ts));
}
self
}
}
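To illustrate how the &mut Self return supports incremental accumulation, here is a minimal sketch. It assumes a simplified stand-in for Call whose timestamps are plain u64 seconds instead of DateTime<Utc>, so that it needs no external crates; the earliest function and the start_time accessor are hypothetical:

```rust
/// Simplified stand-in for the Call type above; timestamps are plain
/// u64 seconds (an assumption) so the sketch needs no external crates.
#[derive(Debug, Default)]
pub struct Call {
    start_time: Option<u64>,
}

impl Call {
    /// Keeps the earliest timestamp seen so far; None leaves state unchanged.
    pub fn with_timestamp(&mut self, timestamp: Option<u64>) -> &mut Self {
        if let Some(ts) = timestamp {
            self.start_time = self
                .start_time
                .take()
                .map(|existing| existing.min(ts))
                .or(Some(ts));
        }
        self
    }

    pub fn start_time(&self) -> Option<u64> {
        self.start_time
    }
}

/// Hypothetical driver: folds a stream of optional timestamps into one Call.
pub fn earliest(timestamps: &[Option<u64>]) -> Option<u64> {
    let mut call = Call::default();
    for ts in timestamps {
        call.with_timestamp(*ts);
    }
    call.start_time()
}
```

Because the method borrows rather than consumes, the same Call can be updated in a loop as new entries arrive, and calls can still be chained when convenient.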
Display & Debug Traits
Custom Display for User-Facing Output
impl Display for Config {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, " {:?} {} cols {} row groups", self.file_type, self.columns, self.row_groups)
}
}
Custom Debug with debug_struct
impl Debug for Benchmark {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Benchmark")
.field("file_path", &self.file_path)
.field("file_len", &self.file_len)
.field("footer_bytes", &self.footer_bytes.len()) // Show len, not contents
.finish()
}
}
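The effect of the custom implementation is easiest to see in the formatted output. The sketch below assumes field types (PathBuf, u64, Vec<u8>) that the original does not state, and sample_benchmark is a hypothetical constructor used only to build a value:

```rust
use std::fmt::Debug;
use std::path::PathBuf;

pub struct Benchmark {
    file_path: PathBuf,
    file_len: u64,
    footer_bytes: Vec<u8>,
}

impl Debug for Benchmark {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Benchmark")
            .field("file_path", &self.file_path)
            .field("file_len", &self.file_len)
            // Show the byte count instead of dumping the raw footer
            .field("footer_bytes", &self.footer_bytes.len())
            .finish()
    }
}

/// Hypothetical constructor used only to build a value for the example.
pub fn sample_benchmark() -> Benchmark {
    Benchmark {
        file_path: PathBuf::from("a.parquet"),
        file_len: 10,
        footer_bytes: vec![0u8; 3],
    }
}
```

Formatting the sample value with {:?} yields a single line in which footer_bytes appears as 3 rather than as a byte list; {:#?} still produces the multi-line form, because debug_struct handles both.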
Type Annotations
Explicit Closure Types
Keep type annotations in closures for clarity:
let total_rows: usize = filters.iter().map(|f: &Filter| f.total_rows()).sum();
let results: Vec<Result> = items
.iter()
.filter_map(|item: &Item| -> Option<Result> {
process(item)
})
.collect();
Enum Patterns
Simple Enums with Display
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
Float,
String,
}
impl Display for FileType {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
FileType::Float => write!(f, "Float32"),
FileType::String => write!(f, "String"),
}
}
}
FromStr for CLI Parsing
impl FromStr for CallFormat {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match s {
"pretty" => Ok(Self::Pretty),
"bin" => Ok(Self::Binary),
_ => Err("supported formats: {pretty, bin}".to_string()),
}
}
}
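Once FromStr is implemented, str::parse works directly, which is also what CLI argument parsers typically rely on. A minimal sketch follows; the derives on CallFormat and the format_from_flag helper are assumptions added for illustration:

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallFormat {
    Pretty,
    Binary,
}

impl FromStr for CallFormat {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pretty" => Ok(Self::Pretty),
            "bin" => Ok(Self::Binary),
            _ => Err("supported formats: {pretty, bin}".to_string()),
        }
    }
}

/// Hypothetical helper: parse a format flag, falling back to Pretty when absent.
pub fn format_from_flag(flag: Option<&str>) -> Result<CallFormat, String> {
    match flag {
        // str::parse dispatches to the FromStr impl above
        Some(s) => s.parse(),
        None => Ok(CallFormat::Pretty),
    }
}
```

Using String as the error type keeps the message printable as-is when the parse fails.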
Constants & Configuration
Module-Level Constants
Use SCREAMING_SNAKE_CASE for constants at module level:
const NUM_CLIENTS: usize = 20;
const NUM_REQUESTS: usize = 100;
const LINES_PER_REQUEST: usize = 10000;
const TEST_DURATION_SECS: u64 = 5;
Benchmarking & Timing
Timing Structures
#[derive(Debug)]
pub struct Timing {
num_runs: usize,
total_duration: Duration,
}
impl Timing {
pub fn avg_duration(&self) -> Duration {
self.total_duration / self.num_runs as u32
}
}
Warm-up Before Measurement
// warm up with 10 runs
for _ in 0..10 {
run_once();
}
// now run the actual benchmark
let mut total_duration = Duration::from_secs(0);
for _ in 0..self.num_runs {
let duration = run_once();
total_duration += duration;
}
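The Timing struct and the warm-up loop can be combined into one small harness. This is a sketch under stated assumptions: run_once is taken as a closure and timed externally with Instant (in the snippet above it returns its own Duration), and the benchmark function name is hypothetical:

```rust
use std::time::{Duration, Instant};

#[derive(Debug)]
pub struct Timing {
    num_runs: usize,
    total_duration: Duration,
}

impl Timing {
    pub fn avg_duration(&self) -> Duration {
        // Duration division takes a u32 and panics on zero
        self.total_duration / self.num_runs as u32
    }
}

/// Hypothetical harness: warm up, then time `num_runs` calls of `run_once`.
pub fn benchmark(num_runs: usize, mut run_once: impl FnMut()) -> Timing {
    assert!(num_runs > 0, "num_runs must be non-zero");
    // warm up with 10 runs that are not measured
    for _ in 0..10 {
        run_once();
    }
    // now run the actual benchmark
    let mut total_duration = Duration::from_secs(0);
    for _ in 0..num_runs {
        let start = Instant::now();
        run_once();
        total_duration += start.elapsed();
    }
    Timing { num_runs, total_duration }
}
```

The assertion guards the division in avg_duration, treating a zero run count as a programming error rather than a recoverable condition.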
RAII Timer Pattern
use std::time::Instant;

struct RAIITimer {
    start: Instant,
}
impl Default for RAIITimer {
    fn default() -> Self {
        Self { start: Instant::now() }
    }
}
impl RAIITimer {
    /// Consumes the timer and returns the elapsed time as a string
    fn done(self) -> String {
        format!("{:?}", self.start.elapsed())
    }
}
Macros
Helper Macros for Repetitive Patterns
macro_rules! push_range {
($decoder:expr, $range:expr, $bytes:expr) => {
$decoder
.push_ranges(vec![$range.clone()], vec![$bytes.clone()])
.unwrap();
};
}
Function Signatures
Flexible Input with impl Into<T>
pub fn new(description: impl Into<String>, file_path: PathBuf) -> Self {
Self {
description: description.into(),
file_path,
// ...
}
}
pub fn try_new(file_name: impl Into<PathBuf>) -> Result<Self, String> {
let file_name = file_name.into();
// ...
}
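The benefit shows up at the call site, where string literals and owned values are both accepted. A minimal sketch, assuming a hypothetical Benchmark struct with just the two fields shown and accepting impl Into<PathBuf> for the path as well:

```rust
use std::path::PathBuf;

/// Hypothetical type standing in for the struct constructed above.
#[derive(Debug)]
pub struct Benchmark {
    description: String,
    file_path: PathBuf,
}

impl Benchmark {
    pub fn new(description: impl Into<String>, file_path: impl Into<PathBuf>) -> Self {
        Self {
            // Each `.into()` is a no-op move when the caller already owns the target type
            description: description.into(),
            file_path: file_path.into(),
        }
    }
}
```

Both Benchmark::new("footer parse", "data/a.parquet") and a call passing a String and a PathBuf compile against this signature, so callers avoid sprinkling .to_string() or PathBuf::from at every use.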
Returning impl Trait
pub fn header() -> impl Display {
struct Header {}
impl Display for Header {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(f, "col1\tcol2\tcol3")
}
}
Header {}
}
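Callers of such a function only rely on the Display bound, so the concrete type stays private to the function body. The sketch below repeats header() so it compiles on its own; render is a hypothetical helper that writes the header followed by tab-separated rows:

```rust
use std::fmt::{Display, Formatter};

pub fn header() -> impl Display {
    struct Header {}
    impl Display for Header {
        fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
            write!(f, "col1\tcol2\tcol3")
        }
    }
    Header {}
}

/// Hypothetical helper: renders the header followed by tab-separated rows.
pub fn render(rows: &[[u32; 3]]) -> String {
    // Only the Display bound is used here; Header itself is not nameable
    let mut out = header().to_string();
    for [a, b, c] in rows {
        out.push_str(&format!("\n{a}\t{b}\t{c}"));
    }
    out
}
```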
Code Style
Import Organization
Group imports in this order:
- Standard library imports
- External crate imports
- Internal module imports (use crate::...)
use std::fmt::{Display, Formatter};
use std::path::PathBuf;
use std::sync::Arc;
use arrow::array::BooleanArray;
use tokio::task::JoinSet;
use crate::benchmark::MetadataParseBenchmark;
use crate::file_type::FileType;
Minimal Comments
- Code should be self-documenting
- Use doc comments (///) for public APIs
- Brief inline comments only when logic is non-obvious
Assertions for Programming Errors
Use assert! and panic! for conditions that indicate bugs:
assert!(self.method_name.is_none(), "Already have method name: {:?}", self.method_name);
let DecodeResult::Data(_metadata) = decoder.try_decode().unwrap() else {
panic!("Expected to be done with parsing");
};
Early Returns
Prefer early returns for error conditions:
if filters.is_empty() {
println!("No filters found");
return;
}
Arrow-Specific Patterns
RecordBatch Processing
let mut batches = vec![];
for maybe_record_batch in record_batch_reader {
if batches.len() > 100 {
println!("{}", pretty_format_batches(&batches)?);
batches.clear();
}
batches.push(maybe_record_batch?);
}
Schema Building
let fields: Vec<Field> = (0..columns)
.map(|i| Field::new(format!("col_{i}"), DataType::Float32, true))
.collect();
Arc::new(Schema::new(fields))
Array Creation with Iterators
let array: Int64Array = (0..size)
.map(|_| {
if rng.random::<f32>() < null_density {
None
} else {
Some(rng.random())
}
})
.collect();
Testing Approaches
File-Based Tests
For code that acts on files, create transient test files:
let mut temp_file = tempfile::NamedTempFile::new().unwrap();
// write test data
// run tests
// temp_file is automatically cleaned up
Benchmark Output
Provide both human-readable and CSV output:
println!("CSV output:");
println!("{}", headers.join(","));
for result in &results {
println!("{}", result.to_csv_row().join(","));
}
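The snippet above assumes each result can produce its own CSV row. One way that could look is sketched here; BenchResult, its fields, and csv_lines are hypothetical names, and the sketch does no quoting or escaping, so it only suits values without commas:

```rust
/// Hypothetical result row; the field names are illustrative only.
pub struct BenchResult {
    pub name: String,
    pub avg_micros: u128,
}

impl BenchResult {
    pub fn to_csv_row(&self) -> Vec<String> {
        vec![self.name.clone(), self.avg_micros.to_string()]
    }
}

/// Builds the CSV lines (header first) so they can be printed or written to a file.
pub fn csv_lines(headers: &[&str], results: &[BenchResult]) -> Vec<String> {
    let mut lines = vec![headers.join(",")];
    for result in results {
        lines.push(result.to_csv_row().join(","));
    }
    lines
}
```

Returning the lines instead of printing them keeps the formatting testable; the caller can still loop over them with println!.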
Anti-Patterns to Avoid
- Don't over-abstract - Three similar lines of code are better than a premature abstraction
- Don't add unnecessary error handling - Trust internal code and framework guarantees
- Don't add comments stating the obvious - Let the code speak for itself
- Don't create unused helpers - Only add utilities when they're needed more than once
- Don't guess at future requirements - Build for what's needed now
Domain-Specific Notes
This codebase focuses on:
- Apache Arrow: In-memory columnar data format
- Parquet: Columnar storage file format
- DataFusion: SQL query engine
- InfluxDB IOx: Time-series database
- Performance benchmarking: Measuring and comparing implementations
When working in this domain, prefer:
- Streaming/iterator patterns for large data
- Parallel processing with Tokio
- Zero-copy operations where possible
- Explicit type annotations for complex generic code