Code Like alamb: Rust Programming Style Guide

This document captures the coding patterns, design preferences, and stylistic conventions observed across alamb's Rust projects. These projects focus heavily on Apache Arrow, Parquet, DataFusion, and high-performance data processing.

Project Structure & Organization

Module Organization

•Place mod declarations at the top of main.rs or lib.rs
•Split modules into logical units by responsibility (e.g., error.rs, query.rs, replay.rs)
•Use pub mod only when external visibility is needed
•Keep main.rs focused on CLI parsing and orchestration; delegate logic to modules

rust

mod benchmark;
mod datagen;
mod file_type;
mod parquet_file;

use crate::benchmark::{MetadataParseBenchmark, MetadataParseResult};
use crate::file_type::FileType;

Workspace Projects

•Use Cargo workspaces for related crates that need to be developed together
•Useful when comparing different library versions (e.g., parquet9, parquetnext)

Error Handling

Custom Error Types

Prefer simple custom error types with Display and From implementations:

rust

use std::fmt::Display;

#[derive(Debug, Clone)]
pub struct Error {
    msg: String,
}

impl Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Error: {}", self.msg)
    }
}

pub type Result<T> = std::result::Result<T, Error>;

impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Self { msg: format!("std::io::Error: {}", e) }
    }
}

impl From<String> for Error {
    fn from(msg: String) -> Self {
        Self { msg }
    }
}
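As a minimal sketch of why the From implementations matter: with them in place, the `?` operator and `.into()` convert `std::io::Error` and `String` values into `Error` without explicit mapping. The helpers `read_config` and `check_len` are hypothetical names used only for illustration, and the error type is repeated so the block compiles on its own.

```rust
use std::fmt::Display;
use std::path::Path;

#[derive(Debug, Clone)]
pub struct Error {
    msg: String,
}

impl Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Error: {}", self.msg)
    }
}

impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Self { msg: format!("std::io::Error: {}", e) }
    }
}

impl From<String> for Error {
    fn from(msg: String) -> Self {
        Self { msg }
    }
}

/// Hypothetical helper: `?` converts the `std::io::Error` via `From`.
pub fn read_config(path: &Path) -> Result<String, Error> {
    let contents = std::fs::read_to_string(path)?;
    Ok(contents)
}

/// Hypothetical helper: `.into()` converts a `String` message via `From`.
pub fn check_len(contents: &str, max: usize) -> Result<usize, Error> {
    if contents.len() > max {
        return Err(format!("{} bytes exceeds limit of {}", contents.len(), max).into());
    }
    Ok(contents.len())
}
```

Callers see a single error type regardless of where the failure originated, which keeps function signatures short.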

StringifyError Trait

For quick prototyping or when structured errors are overkill:

rust

pub trait StringifyError<T> {
    fn stringify(self) -> Result<T, String>;
    fn context(self, msg: &str) -> Result<T, String>;
}

impl<T, E: std::fmt::Display> StringifyError<T> for Result<T, E> {
    fn stringify(self) -> Result<T, String> {
        self.map_err(|e| e.to_string())
    }

    fn context(self, msg: &str) -> Result<T, String> {
        self.map_err(|e| format!("{}: {}", msg, e))
    }
}
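A short usage sketch may help here. The trait is repeated so the block is self-contained; `parse_port` and `parse_count` are hypothetical helpers, not part of any real project.

```rust
pub trait StringifyError<T> {
    fn stringify(self) -> Result<T, String>;
    fn context(self, msg: &str) -> Result<T, String>;
}

impl<T, E: std::fmt::Display> StringifyError<T> for Result<T, E> {
    fn stringify(self) -> Result<T, String> {
        self.map_err(|e| e.to_string())
    }

    fn context(self, msg: &str) -> Result<T, String> {
        self.map_err(|e| format!("{}: {}", msg, e))
    }
}

/// Hypothetical helper: the parse error is prefixed with a short context message.
pub fn parse_port(s: &str) -> Result<u16, String> {
    s.trim().parse::<u16>().context("invalid port")
}

/// Hypothetical helper: the parse error is flattened to its Display text.
pub fn parse_count(s: &str) -> Result<usize, String> {
    s.parse::<usize>().stringify()
}
```

Because the blanket impl covers any `Result<T, E>` whose error implements Display, the same two methods work for I/O, parsing, and library errors alike.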

Result Type Aliases

Define module-level Result type aliases for cleaner signatures:

rust

pub type Result<T, E = String> = std::result::Result<T, E>;
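The default type parameter means most signatures can omit the error type while a few can still override it. A sketch under assumed names (the `query` module and both functions are hypothetical):

```rust
pub mod query {
    /// Module-level alias: the error type defaults to `String`.
    pub type Result<T, E = String> = std::result::Result<T, E>;

    /// Uses the default error type (`String`).
    pub fn parse_limit(s: &str) -> Result<usize> {
        s.parse::<usize>()
            .map_err(|e| format!("bad limit {s:?}: {e}"))
    }

    /// Overrides the error type where a structured error is more useful.
    pub fn parse_ratio(s: &str) -> Result<f64, std::num::ParseFloatError> {
        s.parse::<f64>()
    }
}
```

Keeping the alias inside the module avoids shadowing `std::result::Result` for the rest of the crate.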

CLI & Configuration

Clap with Derive

Use clap's derive macros for CLI parsing:

rust

use std::path::PathBuf;

use clap::Parser;

/// Command line program description goes here
#[derive(Parser, Debug)]
#[clap(author, version, about)]
struct Args {
    // `parse(from_os_str)` is clap 3 syntax; clap 4 parses PathBuf without it
    #[clap(long, parse(from_os_str))]
    /// Search path for files
    path: PathBuf,

    #[clap(long, default_value = "")]
    /// Optional filter
    filter: String,
}

Subcommand Pattern

Use enums for subcommands:

rust

#[derive(Parser, Debug)]
#[clap(author, version, about)]
enum MyTool {
    /// Dump raw entries
    DumpEntries(DumpEntries),
    /// Process calls
    DumpCalls(DumpCalls),
}

fn main() {
    let args = MyTool::parse();
    match args {
        MyTool::DumpEntries(dump) => { /* ... */ }
        MyTool::DumpCalls(dump) => { /* ... */ }
    }
}

Environment Variable Support

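Support environment variables for configuration. The example below uses structopt attributes; clap's derive API accepts the same env key when its env feature is enabled: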

rust

#[structopt(
    short,
    long,
    global = true,
    env = "IOX_ADDR",
    default_value = "http://127.0.0.1:8082"
)]
host: String,

Async/Tokio Patterns

Main Function

Use #[tokio::main] with the flavor that suits the workload:

rust

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    // ...
}

JoinSet for Concurrent Tasks

Use JoinSet for managing multiple concurrent tasks:

rust

use tokio::task::JoinSet;

let mut join_set = JoinSet::new();
for item in items {
    join_set.spawn(async move {
        process(item).await
    });
}

let results = join_set.join_all().await;

Channel-based Task Coordination

Use mpsc channels for producer-consumer patterns:

rust

let (tx, mut rx) = tokio::sync::mpsc::channel(buffer_size);

// Consumer task: drains the channel until every sender is dropped
join_set.spawn(async move {
    while let Some(item) = rx.recv().await {
        process(item);
    }
});

// Producer side: send work
tx.send(work).await?;

FuturesUnordered for Concurrent Operations

rust

use futures::{stream::FuturesUnordered, StreamExt};

let tasks = items
    .into_iter()
    .map(|item| tokio::task::spawn(process(item)))
    .collect::<FuturesUnordered<_>>();

// Results arrive in completion order; each entry is a Result<_, JoinError>
let results = tasks.collect::<Vec<_>>().await;

Builder Pattern

Standard Builder with `with_` Methods

Use the consuming-self pattern for builder methods, so calls can be chained on a temporary:

rust

#[derive(Debug, Default)]
pub struct ConfigBuilder {
    path: Option<PathBuf>,
    columns: Option<usize>,
}

impl ConfigBuilder {
    pub fn new() -> Self {
        Default::default()
    }

    pub fn with_path(mut self, path: PathBuf) -> Self {
        self.path = Some(path);
        self
    }

    pub fn with_columns(mut self, columns: usize) -> Self {
        self.columns = Some(columns);
        self
    }

    pub fn build(self) -> Config {
        let Self { path, columns } = self;
        Config {
            path: path.expect("path is required"),
            columns: columns.expect("columns is required"),
        }
    }
}
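A call site for such a builder might look like the sketch below. The `Config` struct and the `example_config` function are assumptions added so the block is self-contained; the builder itself mirrors the one above.

```rust
use std::path::PathBuf;

/// Assumed target type; the original does not show its definition.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub path: PathBuf,
    pub columns: usize,
}

#[derive(Debug, Default)]
pub struct ConfigBuilder {
    path: Option<PathBuf>,
    columns: Option<usize>,
}

impl ConfigBuilder {
    pub fn new() -> Self {
        Default::default()
    }

    pub fn with_path(mut self, path: PathBuf) -> Self {
        self.path = Some(path);
        self
    }

    pub fn with_columns(mut self, columns: usize) -> Self {
        self.columns = Some(columns);
        self
    }

    /// Panics if a required field was never set.
    pub fn build(self) -> Config {
        let Self { path, columns } = self;
        Config {
            path: path.expect("path is required"),
            columns: columns.expect("columns is required"),
        }
    }
}

/// Hypothetical call site: each `with_` call consumes and returns the builder.
pub fn example_config() -> Config {
    ConfigBuilder::new()
        .with_path(PathBuf::from("data/example.parquet"))
        .with_columns(10)
        .build()
}
```

Destructuring `Self` in `build` means the compiler flags any field that is added to the builder but forgotten in `build`.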

Mutable Builder with `&mut self` Return

For builders that accumulate state incrementally:

rust

impl Call {
    pub fn with_timestamp(&mut self, timestamp: Option<DateTime<Utc>>) -> &mut Self {
        if let Some(ts) = timestamp {
            self.start_time = self.start_time.take()
                .map(|existing| existing.min(ts))
                .or(Some(ts));
        }
        self
    }
}
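To illustrate how the `&mut self` form accumulates state across calls, here is a sketch with a stand-in `Call` type. It uses `u64` timestamps instead of `DateTime<Utc>` so no external crate is needed; the `method_name` field, `with_method_name`, and `example_call` are hypothetical.

```rust
/// Stand-in for the `Call` type above, with `u64` seconds as timestamps.
#[derive(Debug, Default)]
pub struct Call {
    pub start_time: Option<u64>,
    pub method_name: Option<String>,
}

impl Call {
    /// Keeps the earliest timestamp seen so far; `None` changes nothing.
    pub fn with_timestamp(&mut self, timestamp: Option<u64>) -> &mut Self {
        if let Some(ts) = timestamp {
            self.start_time = self
                .start_time
                .take()
                .map(|existing| existing.min(ts))
                .or(Some(ts));
        }
        self
    }

    /// Hypothetical second setter, included to show chaining.
    pub fn with_method_name(&mut self, name: impl Into<String>) -> &mut Self {
        self.method_name = Some(name.into());
        self
    }
}

/// Hypothetical call site: all calls mutate the same value in place.
pub fn example_call() -> Call {
    let mut call = Call::default();
    call.with_timestamp(Some(30))
        .with_timestamp(None)
        .with_timestamp(Some(10))
        .with_method_name("query");
    call
}
```

Unlike the consuming form, the value stays owned by the caller, so it can be updated again later as more input arrives.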

Display & Debug Traits

Custom Display for User-Facing Output

rust

impl Display for Config {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, " {:?} {} cols {} row groups", self.file_type, self.columns, self.row_groups)
    }
}

Custom Debug with `debug_struct`

rust

impl Debug for Benchmark {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Benchmark")
            .field("file_path", &self.file_path)
            .field("file_len", &self.file_len)
            .field("footer_bytes", &self.footer_bytes.len())  // Show len, not contents
            .finish()
    }
}

Type Annotations

Explicit Closure Types

Keep type annotations in closures for clarity:

rust

let total_rows: usize = filters.iter().map(|f: &Filter| f.total_rows()).sum();

// `Output` stands for whatever `process` yields on success
let results: Vec<Output> = items
    .iter()
    .filter_map(|item: &Item| -> Option<Output> {
        process(item)
    })
    .collect();

Enum Patterns

Simple Enums with Display

rust

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Float,
    String,
}

impl Display for FileType {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            FileType::Float => write!(f, "Float32"),
            FileType::String => write!(f, "String"),
        }
    }
}

FromStr for CLI Parsing

rust

impl FromStr for CallFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pretty" => Ok(Self::Pretty),
            "bin" => Ok(Self::Binary),
            _ => Err("supported formats: {pretty, bin}".to_string()),
        }
    }
}
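Implementing FromStr is what makes `str::parse` work for the type. The sketch below assumes a two-variant `CallFormat` enum (its definition is not shown above) and adds a hypothetical `resolve_format` helper that applies a default.

```rust
use std::str::FromStr;

/// Assumed definition of the enum parsed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallFormat {
    Pretty,
    Binary,
}

impl FromStr for CallFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pretty" => Ok(Self::Pretty),
            "bin" => Ok(Self::Binary),
            _ => Err("supported formats: {pretty, bin}".to_string()),
        }
    }
}

/// Hypothetical helper: resolve an optional format argument, defaulting to `Pretty`.
pub fn resolve_format(arg: Option<&str>) -> Result<CallFormat, String> {
    match arg {
        Some(s) => s.parse(),
        None => Ok(CallFormat::Pretty),
    }
}
```

Returning the list of supported values in the error gives users an actionable message when they mistype a flag.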

Constants & Configuration

Module-Level Constants

Use SCREAMING_SNAKE_CASE for constants at module level:

rust

const NUM_CLIENTS: usize = 20;
const NUM_REQUESTS: usize = 100;
const LINES_PER_REQUEST: usize = 10000;
const TEST_DURATION_SECS: u64 = 5;

Benchmarking & Timing

Timing Structures

rust

#[derive(Debug)]
pub struct Timing {
    num_runs: usize,
    total_duration: Duration,
}

impl Timing {
    pub fn avg_duration(&self) -> Duration {
        self.total_duration / self.num_runs as u32
    }
}
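Dividing a Duration by zero panics, so `avg_duration` above assumes at least one run. One possible variant, sketched here, returns an Option instead. The `measure` constructor and `num_runs` accessor are hypothetical additions.

```rust
use std::time::{Duration, Instant};

#[derive(Debug)]
pub struct Timing {
    num_runs: usize,
    total_duration: Duration,
}

impl Timing {
    /// Hypothetical constructor: times `num_runs` calls of `f` as one span.
    pub fn measure(num_runs: usize, mut f: impl FnMut()) -> Self {
        let start = Instant::now();
        for _ in 0..num_runs {
            f();
        }
        Self {
            num_runs,
            total_duration: start.elapsed(),
        }
    }

    pub fn num_runs(&self) -> usize {
        self.num_runs
    }

    /// Average per run, or `None` when the divisor is zero.
    pub fn avg_duration(&self) -> Option<Duration> {
        self.total_duration.checked_div(self.num_runs as u32)
    }
}
```

Whether to panic or return None is a design choice; for a benchmark tool where zero runs is a programming error, the panicking form above is equally defensible.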

Warm-up Before Measurement

rust

// warm up with 10 runs
for _ in 0..10 {
    run_once();
}

// now run the actual benchmark
let mut total_duration = Duration::from_secs(0);
for _ in 0..self.num_runs {
    let duration = run_once();
    total_duration += duration;
}

RAII Timer Pattern

rust

use std::time::Instant;

struct RAIITimer {
    start: Instant,
}

impl Default for RAIITimer {
    fn default() -> Self {
        Self { start: Instant::now() }
    }
}

impl RAIITimer {
    /// Consumes the timer and returns the elapsed time, formatted for display
    fn done(self) -> String {
        format!("{:?}", self.start.elapsed())
    }
}
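The timer above reports only when `done` is called explicitly. A variant that reports from Drop, which is what RAII usually implies, could look like the following sketch. The name `ScopedTimer`, the `label` field, and the `on_done` callback are all assumptions, not taken from the original code.

```rust
use std::time::{Duration, Instant};

/// Invokes `on_done` with the label and elapsed time when dropped.
pub struct ScopedTimer<F: FnMut(&str, Duration)> {
    label: String,
    start: Instant,
    on_done: F,
}

impl<F: FnMut(&str, Duration)> ScopedTimer<F> {
    pub fn new(label: impl Into<String>, on_done: F) -> Self {
        Self {
            label: label.into(),
            start: Instant::now(),
            on_done,
        }
    }
}

impl<F: FnMut(&str, Duration)> Drop for ScopedTimer<F> {
    fn drop(&mut self) {
        // Runs on every exit path from the enclosing scope, including early returns
        (self.on_done)(&self.label, self.start.elapsed());
    }
}
```

When using it, bind the timer to a named variable such as `_timer` rather than `_`; a bare `_` pattern drops the value immediately, so nothing would be measured.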

Macros

Helper Macros for Repetitive Patterns

rust

macro_rules! push_range {
    ($decoder:expr, $range:expr, $bytes:expr) => {
        $decoder
            .push_ranges(vec![$range.clone()], vec![$bytes.clone()])
            .unwrap();
    };
}
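To show the macro at a call site, the sketch below pairs it with a stand-in decoder. `RangeDecoder` and `example_pushes` are hypothetical; only the shape of `push_ranges` (a list of ranges plus a matching list of buffers) is assumed from the macro body.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a push-style decoder.
#[derive(Debug, Default)]
pub struct RangeDecoder {
    pub pushed: Vec<(Range<u64>, Vec<u8>)>,
}

impl RangeDecoder {
    pub fn push_ranges(
        &mut self,
        ranges: Vec<Range<u64>>,
        buffers: Vec<Vec<u8>>,
    ) -> Result<(), String> {
        if ranges.len() != buffers.len() {
            return Err(format!("{} ranges but {} buffers", ranges.len(), buffers.len()));
        }
        self.pushed.extend(ranges.into_iter().zip(buffers));
        Ok(())
    }
}

macro_rules! push_range {
    ($decoder:expr, $range:expr, $bytes:expr) => {
        $decoder
            .push_ranges(vec![$range.clone()], vec![$bytes.clone()])
            .unwrap();
    };
}

/// The macro clones its arguments, so `bytes` can be reused across calls.
pub fn example_pushes() -> RangeDecoder {
    let mut decoder = RangeDecoder::default();
    let header = 0..4u64;
    let body = 4..8u64;
    let bytes = vec![1u8, 2, 3, 4];
    push_range!(decoder, header, bytes);
    push_range!(decoder, body, bytes);
    decoder
}
```

The `unwrap` inside the macro suits tests and benchmarks, where a failed push is a bug rather than a recoverable condition.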

Function Signatures

Flexible Input with `impl Into<T>`

rust

pub fn new(description: impl Into<String>, file_path: PathBuf) -> Self {
    Self {
        description: description.into(),
        file_path,
        // ...
    }
}

pub fn try_new(file_name: impl Into<PathBuf>) -> Result<Self, String> {
    let file_name = file_name.into();
    // ...
}

Returning `impl Trait`

rust

pub fn header() -> impl Display {
    struct Header {}
    impl Display for Header {
        fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
            write!(f, "col1\tcol2\tcol3")
        }
    }
    Header {}
}
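A caller only needs to know that the return value implements Display. The following sketch repeats `header` and adds a hypothetical `render_table` helper to show it used with `format!`.

```rust
use std::fmt::{Display, Formatter};

pub fn header() -> impl Display {
    struct Header {}
    impl Display for Header {
        fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
            write!(f, "col1\tcol2\tcol3")
        }
    }
    Header {}
}

/// Hypothetical helper: prints the header followed by tab-separated rows.
pub fn render_table(rows: &[[u32; 3]]) -> String {
    let mut out = format!("{}\n", header());
    for [a, b, c] in rows {
        out.push_str(&format!("{a}\t{b}\t{c}\n"));
    }
    out
}
```

Since `Header` is declared inside the function, it cannot be named or constructed anywhere else, which keeps the public surface to a single function.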

Code Style

Import Organization

Group imports in this order, separating the groups with a blank line:

•Standard library imports
•External crate imports
•Internal module imports (use crate::...)

rust

use std::fmt::{Display, Formatter};
use std::path::PathBuf;
use std::sync::Arc;

use arrow::array::BooleanArray;
use tokio::task::JoinSet;

use crate::benchmark::MetadataParseBenchmark;
use crate::file_type::FileType;

Minimal Comments

•Code should be self-documenting
•Use doc comments (///) for public APIs
•Brief inline comments only when logic is non-obvious

Assertions for Programming Errors

Use assert! and panic! for conditions that indicate bugs:

rust

assert!(self.method_name.is_none(), "Already have method name: {:?}", self.method_name);

let DecodeResult::Data(_metadata) = decoder.try_decode().unwrap() else {
    panic!("Expected to be done with parsing");
};

Early Returns

Prefer early returns for error conditions:

rust

if filters.is_empty() {
    println!("No filters found");
    return;
}

Arrow-Specific Patterns

RecordBatch Processing

rust

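let mut batches = vec![];
for maybe_record_batch in record_batch_reader {
    if batches.len() > 100 {
        println!("{}", pretty_format_batches(&batches)?);
        batches.clear();
    }
    batches.push(maybe_record_batch?);
}
// Print whatever is left after the last full chunk
if !batches.is_empty() {
    println!("{}", pretty_format_batches(&batches)?);
}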

Schema Building

rust

let fields: Vec<Field> = (0..columns)
    .map(|i| Field::new(format!("col_{i}"), DataType::Float32, true))
    .collect();
Arc::new(Schema::new(fields))

Array Creation with Iterators

rust

let array: Int64Array = (0..size)
    .map(|_| {
        if rng.random::<f32>() < null_density {
            None
        } else {
            Some(rng.random())
        }
    })
    .collect();

Testing Approaches

File-Based Tests

For code that acts on files, create transient test files:

rust

let mut temp_file = tempfile::NamedTempFile::new().unwrap();
// write test data
// run tests
// temp_file is automatically cleaned up

Benchmark Output

Provide both human-readable and CSV output:

rust

println!("CSV output:");
println!("{}", headers.join(","));
for result in &results {
    println!("{}", result.to_csv_row().join(","));
}
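One way to back both output forms with a single type is sketched below. `BenchResult`, its fields, and `render_csv` are hypothetical; only the `to_csv_row` method name comes from the snippet above. Fields are joined without quoting, so the sketch assumes values contain no commas.

```rust
use std::fmt::{self, Display, Formatter};
use std::time::Duration;

/// Hypothetical result row; field names are illustrative.
#[derive(Debug, Clone)]
pub struct BenchResult {
    pub name: String,
    pub columns: usize,
    pub avg: Duration,
}

impl BenchResult {
    pub fn csv_headers() -> Vec<&'static str> {
        vec!["name", "columns", "avg_micros"]
    }

    pub fn to_csv_row(&self) -> Vec<String> {
        vec![
            self.name.clone(),
            self.columns.to_string(),
            self.avg.as_micros().to_string(),
        ]
    }
}

/// Human-readable form for terminal output.
impl Display for BenchResult {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "{} ({} cols): {:?} per run", self.name, self.columns, self.avg)
    }
}

/// Machine-readable form: header line followed by one line per result.
pub fn render_csv(results: &[BenchResult]) -> String {
    let mut out = BenchResult::csv_headers().join(",");
    out.push('\n');
    for result in results {
        out.push_str(&result.to_csv_row().join(","));
        out.push('\n');
    }
    out
}
```

Emitting a fixed unit (microseconds here) in the CSV avoids the mixed units that Duration's Debug formatting produces.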

Anti-Patterns to Avoid

•Don't over-abstract - Three similar lines of code are better than a premature abstraction
•Don't add unnecessary error handling - Trust internal code and framework guarantees
•Don't add comments stating the obvious - Let the code speak for itself
•Don't create unused helpers - Only add utilities when they're needed more than once
•Don't guess at future requirements - Build for what's needed now

Domain-Specific Notes

This codebase focuses on:

•Apache Arrow: In-memory columnar data format
•Parquet: Columnar storage file format
•DataFusion: SQL query engine
•InfluxDB IOx: Time-series database
•Performance benchmarking: Measuring and comparing implementations

When working in this domain, prefer:

•Streaming/iterator patterns for large data
•Parallel processing with Tokio
•Zero-copy operations where possible
•Explicit type annotations for complex generic code
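As a small illustration of the streaming preference, the sketch below processes input in bounded batches rather than loading it all at once. It uses only the standard library; `sum_in_batches` and its line-per-integer input format are assumptions made for the example.

```rust
use std::io::BufRead;

/// Sums one integer per line, holding at most `batch_size` values in memory.
pub fn sum_in_batches(reader: impl BufRead, batch_size: usize) -> Result<i64, String> {
    let mut total = 0i64;
    let mut batch: Vec<i64> = Vec::with_capacity(batch_size);
    for line in reader.lines() {
        let line = line.map_err(|e| e.to_string())?;
        let value = line
            .trim()
            .parse::<i64>()
            .map_err(|e| format!("bad value {line:?}: {e}"))?;
        batch.push(value);
        if batch.len() >= batch_size {
            // Process the full batch, then reuse its allocation
            total += batch.iter().sum::<i64>();
            batch.clear();
        }
    }
    // Process the final partial batch
    total += batch.iter().sum::<i64>();
    Ok(total)
}
```

Memory use depends on `batch_size`, not on the size of the input, which is the property that matters for large files.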
