
storage-layout

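SignalDB storage layout: WAL directory structure, Iceberg catalog, object store paths, table types, segment lifecycle, and per-dataset storage configuration. Use when working with WAL logs, Iceberg tables, Parquet files, or storage configuration.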

SKILL.md
--- frontmatter
name: storage-layout
description: SignalDB storage layout - WAL directory structure, Iceberg catalog, object store paths, table types, segment lifecycle, and per-dataset storage overrides. Use when working with WAL, Iceberg tables, Parquet files, or storage configuration.
user-invocable: false

SignalDB Storage Layout Reference

Three-Tier Storage Model

code
WAL (local disk) -> Iceberg SQL Catalog (SQLite metadata) -> Object Store (Parquet data)

Object Store Layout

Path structure: {storage_base}/{tenant_slug}/{dataset_slug}/{table_name}/

code
.data/storage/
  acme/
    prod/
      traces/
        metadata/v1.metadata.json
        data/00000-0-{uuid}.parquet
      logs/
      metrics_gauge/
      metrics_sum/
      metrics_histogram/
    archive/
      traces/
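The path structure above can be sketched as a small helper. This is a minimal illustration, not the project's code: the function name `table_prefix` is hypothetical, and the real logic presumably lives in the naming utilities listed under Key Implementation Files.

```rust
/// Build the object store prefix for one table.
/// Hypothetical helper: mirrors {storage_base}/{tenant_slug}/{dataset_slug}/{table_name}/.
fn table_prefix(
    storage_base: &str,
    tenant_slug: &str,
    dataset_slug: &str,
    table_name: &str,
) -> String {
    // Trim a trailing slash on the base so the result never contains "//".
    format!(
        "{}/{}/{}/{}/",
        storage_base.trim_end_matches('/'),
        tenant_slug,
        dataset_slug,
        table_name
    )
}
```

Under these assumptions, `table_prefix(".data/storage", "acme", "prod", "traces")` yields `.data/storage/acme/prod/traces/`, the directory that holds `metadata/` and `data/` in the tree above.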

Storage Backends

| Scheme | Backend | Example |
| --- | --- | --- |
| file:// | Local filesystem | file:///.data/storage |
| memory:// | In-memory (testing) | memory:// |
| s3:// | S3-compatible | s3://bucket/prefix |

Path resolution in src/common/src/storage.rs (storage_dsn_to_path()):

  • file:///.data/storage -> .data/storage
  • file:///tmp/data -> /tmp/data
  • s3://bucket/prefix -> kept as-is
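The three mappings above can be reproduced with a short sketch. It is an illustrative re-creation, not the body of `storage_dsn_to_path()`: the name `dsn_to_path_sketch` is hypothetical, and the rule "a `file://` path beginning with `/.` is treated as relative" is an assumption inferred from the first example.

```rust
/// Illustrative mapping from a storage DSN to a path string.
/// Not the actual code in storage.rs; behaviour is inferred from the listed examples.
fn dsn_to_path_sketch(dsn: &str) -> String {
    match dsn.strip_prefix("file://") {
        // Assumption: "/.data/storage" means the relative path ".data/storage".
        Some(rest) if rest.starts_with("/.") => rest[1..].to_string(),
        // Any other file:// DSN keeps its absolute path.
        Some(rest) => rest.to_string(),
        // Non-file schemes (for example s3://) are returned unchanged.
        None => dsn.to_string(),
    }
}
```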

Per-Dataset Storage Override

A dataset can override the global storage configuration with its own DSN:

toml
[[auth.tenants.datasets]]
id = "archive"
slug = "archive"
[auth.tenants.datasets.storage]
dsn = "s3://acme-archive/signals"

Resolution chain in Configuration::get_dataset_storage_config():

  1. Check dataset.storage -- if Some, use it
  2. Fall back to global config.storage
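The fallback amounts to an `Option` lookup with a default. The sketch below uses hypothetical stand-in types (`StorageConfig`, `DatasetConfig`) and a free function instead of the real `Configuration` method, whose exact signature is not shown here.

```rust
/// Stand-in for the storage section of the configuration (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct StorageConfig {
    dsn: String,
}

/// Stand-in for a dataset entry; `storage` is the optional override.
struct DatasetConfig {
    storage: Option<StorageConfig>,
}

/// Step 1: use the dataset override if present. Step 2: fall back to the global config.
fn resolve_storage<'a>(
    dataset: &'a DatasetConfig,
    global: &'a StorageConfig,
) -> &'a StorageConfig {
    dataset.storage.as_ref().unwrap_or(global)
}
```

With the TOML above, the `archive` dataset would resolve to `s3://acme-archive/signals`, while a dataset without a `storage` table would resolve to the global DSN.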

WAL Layout

Path: {wal_dir}/{tenant_id}/{dataset_id}/{signal_type}/

code
.data/wal/
  acme/
    production/
      traces/
        wal-0000000000.log    # Entry metadata (bincode)
        wal-0000000000.data   # Raw data (Arrow IPC StreamWriter)
        wal-0000000000.index  # Processed entry tracking (UUID list)
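As a rough sketch, the three files of a segment can be derived from the WAL path and a segment number. The helper name `segment_files` is hypothetical, and treating the numeric part of the file name as a zero-padded, 10-digit segment sequence number is an assumption based on the listing above.

```rust
use std::path::{Path, PathBuf};

/// Return the .log, .data and .index paths of one WAL segment.
/// Hypothetical helper; assumes the file name carries a zero-padded segment number.
fn segment_files(
    wal_dir: &Path,
    tenant_id: &str,
    dataset_id: &str,
    signal_type: &str,
    segment: u64,
) -> [PathBuf; 3] {
    // {wal_dir}/{tenant_id}/{dataset_id}/{signal_type}/
    let dir = wal_dir.join(tenant_id).join(dataset_id).join(signal_type);
    let stem = format!("wal-{:010}", segment);
    [
        dir.join(format!("{stem}.log")),   // entry metadata
        dir.join(format!("{stem}.data")),  // raw Arrow IPC data
        dir.join(format!("{stem}.index")), // processed entry tracking
    ]
}
```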

WAL Entry Structure

rust
pub struct WalEntry {
    pub id: Uuid,
    pub timestamp: u64,
    pub operation: WalOperation,    // WriteTraces | WriteLogs | WriteMetrics | Flush
    pub data_size: u64,
    pub data_offset: u64,
    pub processed: bool,
    pub tenant_id: String,
    pub dataset_id: String,
    pub metadata: Option<String>,   // JSON with schema_version, signal_type, target_table
}
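The `data_offset` and `data_size` fields suggest that each entry points at a byte range in the segment's `.data` file. The sketch below shows how such a range could be read back; it assumes `data_offset` is a byte offset from the start of the file, and `read_payload` is a hypothetical helper, not part of the WAL API.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Read one entry's payload from a seekable .data source.
/// Assumption: data_offset counts bytes from the start of the file.
fn read_payload<R: Read + Seek>(
    data: &mut R,
    data_offset: u64,
    data_size: u64,
) -> io::Result<Vec<u8>> {
    data.seek(SeekFrom::Start(data_offset))?;
    let mut buf = vec![0u8; data_size as usize];
    // read_exact fails with UnexpectedEof if the range runs past the end.
    data.read_exact(&mut buf)?;
    Ok(buf)
}
```

The generic bound lets the same function run against a `std::fs::File` or an in-memory `std::io::Cursor` in tests.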

WAL Config

rust
pub struct WalConfig {
    pub wal_dir: PathBuf,
    pub max_segment_size: u64,       // Default: 64 MB
    pub max_buffer_entries: usize,   // Default: 1000
    pub flush_interval_secs: u64,    // Default: 30s
    pub max_buffer_size: usize,      // Default: 128 MB
    pub tenant_id: String,           // Required, non-empty
    pub dataset_id: String,          // Required, non-empty
}
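The documented defaults and the non-empty requirement can be captured in a constructor. This is a sketch under stated assumptions: `with_defaults` is a hypothetical name, the error type is a plain `String`, and 1 MB is taken to mean 1024 * 1024 bytes.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone)]
pub struct WalConfig {
    pub wal_dir: PathBuf,
    pub max_segment_size: u64,
    pub max_buffer_entries: usize,
    pub flush_interval_secs: u64,
    pub max_buffer_size: usize,
    pub tenant_id: String,
    pub dataset_id: String,
}

impl WalConfig {
    /// Hypothetical constructor: applies the documented defaults and
    /// rejects empty tenant or dataset identifiers.
    pub fn with_defaults(
        wal_dir: impl Into<PathBuf>,
        tenant_id: &str,
        dataset_id: &str,
    ) -> Result<Self, String> {
        if tenant_id.is_empty() {
            return Err("tenant_id must not be empty".to_string());
        }
        if dataset_id.is_empty() {
            return Err("dataset_id must not be empty".to_string());
        }
        Ok(Self {
            wal_dir: wal_dir.into(),
            max_segment_size: 64 * 1024 * 1024, // 64 MB (assuming binary megabytes)
            max_buffer_entries: 1000,
            flush_interval_secs: 30,
            max_buffer_size: 128 * 1024 * 1024, // 128 MB (same assumption)
            tenant_id: tenant_id.to_string(),
            dataset_id: dataset_id.to_string(),
        })
    }
}
```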

Segment Lifecycle

  1. Write: Entries are appended to the current segment's .log and .data files
  2. Rotation: When the current segment exceeds max_segment_size, a new segment is created
  3. Processing: WalProcessor reads unprocessed entries, writes them to Iceberg, and marks them in .index
  4. Cleanup: Fully processed segments are deleted; partially processed segments are compacted
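The rotation and cleanup decisions in this lifecycle can be sketched as two pure functions. The names `should_rotate`, `cleanup_action`, and `CleanupAction` are hypothetical, and the sketch ignores details the list does not specify, such as how the currently active segment is protected from cleanup.

```rust
/// What cleanup does with a segment (hypothetical enum for illustration).
#[derive(Debug, PartialEq)]
enum CleanupAction {
    Keep,    // nothing processed yet
    Compact, // some entries processed, some pending
    Delete,  // every entry processed
}

/// Step 2: rotate once the segment has grown past the configured limit.
fn should_rotate(segment_bytes: u64, max_segment_size: u64) -> bool {
    segment_bytes > max_segment_size
}

/// Step 4: decide the cleanup action from the processed-entry count.
fn cleanup_action(total_entries: usize, processed_entries: usize) -> CleanupAction {
    if total_entries > 0 && processed_entries >= total_entries {
        CleanupAction::Delete
    } else if processed_entries > 0 {
        CleanupAction::Compact
    } else {
        CleanupAction::Keep
    }
}
```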

Iceberg Catalog

  • SqlCatalog named "signaldb", backed by SQLite only (PostgreSQL is not supported for the Iceberg catalog)
  • Namespace: [tenant_slug, dataset_slug]
  • Tables created lazily on first write
  • Config: [schema] catalog_type = "sql", catalog_uri = "sqlite::memory:"
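Lazy creation under a `[tenant_slug, dataset_slug]` namespace can be illustrated with an in-memory stand-in. `CatalogSketch` and `ensure_table` are hypothetical and do not use the Iceberg catalog API; they only show the "create on first write, reuse afterwards" behaviour.

```rust
use std::collections::HashSet;

/// In-memory stand-in for a catalog, keyed by (tenant_slug, dataset_slug, table_name).
#[derive(Default)]
struct CatalogSketch {
    tables: HashSet<(String, String, String)>,
}

impl CatalogSketch {
    /// Register the table if it does not exist yet.
    /// Returns true only when this call created it (the first write).
    fn ensure_table(&mut self, tenant_slug: &str, dataset_slug: &str, table: &str) -> bool {
        self.tables.insert((
            tenant_slug.to_string(),
            dataset_slug.to_string(),
            table.to_string(),
        ))
    }
}
```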

Table Types (up to 7 per tenant-dataset)

| Signal | Table Name | Schema Source |
| --- | --- | --- |
| Traces | traces | schemas.toml (v2, inherits v1) |
| Logs | logs | schemas.toml (v1) |
| Metrics | metrics_gauge, metrics_sum, metrics_histogram, metrics_exponential_histogram, metrics_summary | iceberg_schemas.rs (hardcoded) |

All tables are partitioned by Hour(timestamp), exposed as the partition field timestamp_hour.
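Iceberg's hour transform maps a timestamp to the number of whole hours since 1970-01-01 00:00:00 UTC. A minimal sketch of that computation follows; it assumes the timestamp is expressed in microseconds (the unit of Iceberg's `timestamp` type), which may differ from the unit SignalDB stores, and `timestamp_hour` here is a hypothetical function, not the partition spec code.

```rust
const MICROS_PER_HOUR: i64 = 3_600_000_000;

/// Hours since the Unix epoch for a microsecond timestamp.
/// div_euclid floors toward negative infinity for a positive divisor,
/// so timestamps before 1970 land in negative hour buckets.
fn timestamp_hour(ts_micros: i64) -> i64 {
    ts_micros.div_euclid(MICROS_PER_HOUR)
}
```

All rows whose timestamps fall in the same hour share one partition value, so a query filtered to a narrow time range only has to touch the matching hourly partitions.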

Key Implementation Files

| File | Purpose |
| --- | --- |
| schemas.toml | Schema definitions with versioning |
| src/common/src/iceberg/mod.rs | Iceberg catalog creation, object store builders |
| src/common/src/iceberg/schemas.rs | Schema creation functions for traces/logs/metrics, partition specs |
| src/common/src/iceberg/names.rs | Naming utilities for table identifiers, namespaces, locations |
| src/common/src/iceberg/table_manager.rs | IcebergTableManager for table operations |
| src/common/src/schema/mod.rs | Schema registry, re-exports iceberg modules |
| src/common/src/schema/schema_parser.rs | TOML schema parser |
| src/common/src/catalog_manager.rs | CatalogManager singleton |
| src/common/src/storage.rs | Object store creation from DSN |
| src/common/src/wal/mod.rs | WAL implementation |
| src/writer/src/storage/iceberg.rs | IcebergTableWriter |
| src/writer/src/processor.rs | WalProcessor |
| src/writer/src/schema_transform.rs | v1->v2 schema transformation |