AgentSkillsCN

Code Like Alamb

像 Alamb 一样写代码

SKILL.md

Code Like alamb: Rust Programming Style Guide

This document captures the coding patterns, design preferences, and stylistic conventions observed across alamb's Rust projects. These projects focus heavily on Apache Arrow, Parquet, DataFusion, and high-performance data processing.

Project Structure & Organization

Module Organization

  • Place mod declarations at the top of main.rs or lib.rs
  • Split modules into logical units by responsibility (e.g., error.rs, query.rs, replay.rs)
  • Use pub mod only when external visibility is needed
  • Keep main.rs focused on CLI parsing and orchestration; delegate logic to modules
rust
mod benchmark;
mod datagen;
mod file_type;
mod parquet_file;

use crate::benchmark::{MetadataParseBenchmark, MetadataParseResult};
use crate::file_type::FileType;

Workspace Projects

  • Use Cargo workspaces for related crates that need to be developed together
  • Useful when comparing different library versions (e.g., parquet9, parquetnext)

Error Handling

Custom Error Types

Prefer simple custom error types with Display and From implementations:

rust
use std::fmt::Display;

#[derive(Debug, Clone)]
pub struct Error {
    msg: String,
}

impl Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Error: {}", self.msg)
    }
}

pub type Result<T> = std::result::Result<T, Error>;

impl From<std::io::Error> for Error {
    fn from(e: std::io::Error) -> Self {
        Self { msg: format!("std::io::Error: {}", e) }
    }
}

impl From<String> for Error {
    fn from(msg: String) -> Self {
        Self { msg }
    }
}

StringifyError Trait

For quick prototyping or when structured errors are overkill:

rust
pub trait StringifyError<T> {
    fn stringify(self) -> Result<T, String>;
    fn context(self, msg: &str) -> Result<T, String>;
}

impl<T, E: std::fmt::Display> StringifyError<T> for Result<T, E> {
    fn stringify(self) -> Result<T, String> {
        self.map_err(|e| e.to_string())
    }

    fn context(self, msg: &str) -> Result<T, String> {
        self.map_err(|e| format!("{}: {}", msg, e))
    }
}

Result Type Aliases

Define module-level Result type aliases for cleaner signatures:

rust
pub type Result<T, E = String> = std::result::Result<T, E>;

CLI & Configuration

Clap with Derive

Use clap's derive macros for CLI parsing:

rust
use clap::Parser;

/// Command line program description goes here
#[derive(Parser, Debug)]
#[clap(author, version, about)]
struct Args {
    #[clap(long, parse(from_os_str))]
    /// Search path for files
    path: PathBuf,

    #[clap(long, default_value = "")]
    /// Optional filter
    filter: String,
}

Subcommand Pattern

Use enums for subcommands:

rust
#[derive(Parser, Debug)]
#[clap(author, version, about)]
enum MyTool {
    /// Dump raw entries
    DumpEntries(DumpEntries),
    /// Process calls
    DumpCalls(DumpCalls),
}

fn main() {
    let args = MyTool::parse();
    match args {
        MyTool::DumpEntries(dump) => { /* ... */ }
        MyTool::DumpCalls(dump) => { /* ... */ }
    }
}

Environment Variable Support

Support environment variables for configuration:

rust
#[structopt(
    short,
    long,
    global = true,
    env = "IOX_ADDR",
    default_value = "http://127.0.0.1:8082"
)]
host: String,

Async/Tokio Patterns

Main Function

Use #[tokio::main] with appropriate flavor:

rust
#[tokio::main(flavor = "multi_thread")]
async fn main() {
    // ...
}

JoinSet for Concurrent Tasks

Use JoinSet for managing multiple concurrent tasks:

rust
use tokio::task::JoinSet;

let mut join_set = JoinSet::new();
for item in items {
    join_set.spawn(async move {
        process(item).await
    });
}

let results = join_set.join_all().await;

Channel-based Task Coordination

Use mpsc channels for producer-consumer patterns:

rust
let (tx, mut rx) = tokio::sync::mpsc::channel(buffer_size);

// Producer task
join_set.spawn(async move {
    while let Some(item) = rx.recv().await {
        process(item);
    }
});

// Send work
tx.send(work).await?;

FuturesUnordered for Concurrent Operations

rust
use futures::{stream::FuturesUnordered, StreamExt};

let tasks = items
    .map(|item| tokio::task::spawn(process(item)))
    .collect::<FuturesUnordered<_>>();

let results = tasks.collect::<Vec<_>>().await;

Builder Pattern

Standard Builder with with_ Methods

Use consuming self pattern for builder methods:

rust
#[derive(Debug, Default)]
pub struct ConfigBuilder {
    path: Option<PathBuf>,
    columns: Option<usize>,
}

impl ConfigBuilder {
    pub fn new() -> Self {
        Default::default()
    }

    pub fn with_path(mut self, path: PathBuf) -> Self {
        self.path = Some(path);
        self
    }

    pub fn with_columns(mut self, columns: usize) -> Self {
        self.columns = Some(columns);
        self
    }

    pub fn build(self) -> Config {
        let Self { path, columns } = self;
        Config {
            path: path.expect("path is required"),
            columns: columns.expect("columns is required"),
        }
    }
}

Mutable Builder with &mut self Return

For builders that accumulate state incrementally:

rust
impl Call {
    pub fn with_timestamp(&mut self, timestamp: Option<DateTime<Utc>>) -> &mut Self {
        if let Some(ts) = timestamp {
            self.start_time = self.start_time.take()
                .map(|existing| existing.min(ts))
                .or(Some(ts));
        }
        self
    }
}

Display & Debug Traits

Custom Display for User-Facing Output

rust
impl Display for Config {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, " {:?} {} cols {} row groups", self.file_type, self.columns, self.row_groups)
    }
}

Custom Debug with debug_struct

rust
impl Debug for Benchmark {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Benchmark")
            .field("file_path", &self.file_path)
            .field("file_len", &self.file_len)
            .field("footer_bytes", &self.footer_bytes.len())  // Show len, not contents
            .finish()
    }
}

Type Annotations

Explicit Closure Types

Keep type annotations in closures for clarity:

rust
let total_rows: usize = filters.iter().map(|f: &Filter| f.total_rows()).sum();

let results: Vec<Result> = items
    .iter()
    .filter_map(|item: &Item| -> Option<Result> {
        process(item)
    })
    .collect();

Enum Patterns

Simple Enums with Display

rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Float,
    String,
}

impl Display for FileType {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            FileType::Float => write!(f, "Float32"),
            FileType::String => write!(f, "String"),
        }
    }
}

FromStr for CLI Parsing

rust
impl FromStr for CallFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pretty" => Ok(Self::Pretty),
            "bin" => Ok(Self::Binary),
            _ => Err("supported formats: {pretty, bin}".to_string()),
        }
    }
}

Constants & Configuration

Module-Level Constants

Use SCREAMING_SNAKE_CASE for constants at module level:

rust
const NUM_CLIENTS: usize = 20;
const NUM_REQUESTS: usize = 100;
const LINES_PER_REQUEST: usize = 10000;
const TEST_DURATION_SECS: u64 = 5;

Benchmarking & Timing

Timing Structures

rust
#[derive(Debug)]
pub struct Timing {
    num_runs: usize,
    total_duration: Duration,
}

impl Timing {
    pub fn avg_duration(&self) -> Duration {
        self.total_duration / self.num_runs as u32
    }
}

Warm-up Before Measurement

rust
// warm up with 10 runs
for _ in 0..10 {
    run_once();
}

// now run the actual benchmark
let mut total_duration = Duration::from_secs(0);
for _ in 0..self.num_runs {
    let duration = run_once();
    total_duration += duration;
}

RAII Timer Pattern

rust
struct RAAITimer {
    start: std::time::Instant,
}

impl Default for RAAITimer {
    fn default() -> Self {
        Self { start: Instant::now() }
    }
}

impl RAAITimer {
    fn done(self) -> String {
        format!("{:?}", self.start.elapsed())
    }
}

Macros

Helper Macros for Repetitive Patterns

rust
macro_rules! push_range {
    ($decoder:expr, $range:expr, $bytes:expr) => {
        $decoder
            .push_ranges(vec![$range.clone()], vec![$bytes.clone()])
            .unwrap();
    };
}

Function Signatures

Flexible Input with impl Into<T>

rust
pub fn new(description: impl Into<String>, file_path: PathBuf) -> Self {
    Self {
        description: description.into(),
        file_path,
        // ...
    }
}

pub fn try_new(file_name: impl Into<PathBuf>) -> Result<Self, String> {
    let file_name = file_name.into();
    // ...
}

Returning impl Trait

rust
pub fn header() -> impl Display {
    struct Header {}
    impl Display for Header {
        fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
            write!(f, "col1\tcol2\tcol3")
        }
    }
    Header {}
}

Code Style

Import Organization

  1. Standard library imports
  2. External crate imports
  3. Internal module imports (use crate::...)
rust
use std::fmt::{Display, Formatter};
use std::path::PathBuf;
use std::sync::Arc;

use arrow::array::BooleanArray;
use tokio::task::JoinSet;

use crate::benchmark::MetadataParseBenchmark;
use crate::file_type::FileType;

Minimal Comments

  • Code should be self-documenting
  • Use doc comments (///) for public APIs
  • Brief inline comments only when logic is non-obvious

Assertions for Programming Errors

Use assert! and panic! for conditions that indicate bugs:

rust
assert!(self.method_name.is_none(), "Already have method name: {:?}", self.method_name);

let DecodeResult::Data(_metadata) = decoder.try_decode().unwrap() else {
    panic!("Expected to be done with parsing");
};

Early Returns

Prefer early returns for error conditions:

rust
if filters.is_empty() {
    println!("No filters found");
    return;
}

Arrow-Specific Patterns

RecordBatch Processing

rust
let mut batches = vec![];
for maybe_record_batch in record_batch_reader {
    if batches.len() > 100 {
        println!("{}", pretty_format_batches(&batches)?);
        batches.clear();
    }
    batches.push(maybe_record_batch?);
}

Schema Building

rust
let fields: Vec<Field> = (0..columns)
    .map(|i| Field::new(format!("col_{i}"), DataType::Float32, true))
    .collect();
Arc::new(Schema::new(fields))

Array Creation with Iterators

rust
let array: Int64Array = (0..size)
    .map(|_| {
        if rng.random::<f32>() < null_density {
            None
        } else {
            Some(rng.random())
        }
    })
    .collect();

Testing Approaches

File-Based Tests

For code that acts on files, create transient test files:

rust
let mut temp_file = tempfile::NamedTempFile::new().unwrap();
// write test data
// run tests
// temp_file is automatically cleaned up

Benchmark Output

Provide both human-readable and CSV output:

rust
println!("CSV output:");
println!("{}", headers.join(","));
for result in &results {
    println!("{}", result.to_csv_row().join(","));
}

Anti-Patterns to Avoid

  1. Don't over-abstract - Three similar lines of code is better than a premature abstraction
  2. Don't add unnecessary error handling - Trust internal code and framework guarantees
  3. Don't add comments stating the obvious - Let the code speak for itself
  4. Don't create unused helpers - Only add utilities when they're needed more than once
  5. Don't guess at future requirements - Build for what's needed now

Domain-Specific Notes

This codebase focuses on:

  • Apache Arrow: In-memory columnar data format
  • Parquet: Columnar storage file format
  • DataFusion: SQL query engine
  • InfluxDB IOx: Time-series database
  • Performance benchmarking: Measuring and comparing implementations

When working in this domain, prefer:

  • Streaming/iterator patterns for large data
  • Parallel processing with Tokio
  • Zero-copy operations where possible
  • Explicit type annotations for complex generic code