Local ClickHouse
Install, configure, and validate local ClickHouse as an alternative to ClickHouse Cloud for development and backtesting (ADR-0044, ADR-0045).
Purpose
Enable local ClickHouse deployment for:
- •Backtesting: 50-100x faster queries (no network round-trip)
- •Development: Offline work without Cloud credentials
- •Evaluation: Try package before Cloud commitment
- •Cost savings: Avoid Cloud costs during experimentation
When to Use
Use this skill when:
- •Local development: Setting up development environment without Cloud
- •Backtesting optimization: Need faster query performance
- •Offline mode: Working without network access
- •Package evaluation: Trying gapless-crypto-clickhouse before Cloud setup
Triggers: User mentions "local ClickHouse", "install clickhouse", "backtesting setup", "offline mode", "GCCH_MODE=local"
Prerequisites
System Requirements:
- •macOS (Homebrew) or Linux (apt/installer)
- •2-4GB RAM minimum
- •10GB+ disk space for data
Network: Internet required for installation only (offline after setup)
Workflow
Step 1: Install ClickHouse
macOS (Homebrew):
# Install ClickHouse brew install clickhouse # Verify installation clickhouse --version
Linux (Ubuntu/Debian):
# Quick installer (recommended) curl https://clickhouse.com/ | sh ./clickhouse --version # Or via apt sudo apt-get install -y apt-transport-https ca-certificates curl -fsSL https://packages.clickhouse.com/deb/lts/clickhouse.gpg | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list sudo apt-get update sudo apt-get install -y clickhouse-server clickhouse-client
Step 2: Start ClickHouse Server
macOS (Homebrew):
# Start server in background clickhouse server --daemon # Verify server is running clickhouse client --query "SELECT 1"
Linux (systemd):
# Start and enable service sudo service clickhouse-server start sudo systemctl enable clickhouse-server # Verify server is running clickhouse-client --query "SELECT 1"
Step 3: Configure Environment
Set Local Mode:
# Explicit local mode (recommended) export GCCH_MODE=local # Or rely on auto-detection (localhost triggers local mode) export CLICKHOUSE_HOST=localhost
Environment Variables:
# Full local configuration export GCCH_MODE=local export CLICKHOUSE_HOST=localhost export CLICKHOUSE_HTTP_PORT=8123 export CLICKHOUSE_DATABASE=default export CLICKHOUSE_USER=default export CLICKHOUSE_PASSWORD=
Step 4: Verify Connection
Using Python:
from gapless_crypto_clickhouse import probe
# Check local ClickHouse status
status = probe.check_local_clickhouse()
print(f"Installed: {status['installed']}")
print(f"Running: {status['running']}")
print(f"Version: {status['version']}")
# Get current mode
mode = probe.get_current_mode()
print(f"Mode: {mode}") # Should be "local"
Using CLI:
# Quick connection test
GCCH_MODE=local python -c "
from gapless_crypto_clickhouse.clickhouse.config import ClickHouseConfig
config = ClickHouseConfig.from_env()
print(f'Host: {config.host}')
print(f'Port: {config.http_port}')
print(f'Secure: {config.secure}')
"
Step 5: Test with Real Data
Query with Auto-Ingestion:
import os
os.environ["GCCH_MODE"] = "local"
from gapless_crypto_clickhouse import query_ohlcv
# First query downloads and ingests data automatically
df = query_ohlcv(
"BTCUSDT",
"1h",
"2024-01-01",
"2024-01-07"
)
print(f"Rows: {len(df)}")
print(df.head())
Mode Detection Logic
The package auto-detects deployment mode based on environment:
GCCH_MODE=local → Local mode (explicit) GCCH_MODE=cloud → Cloud mode (requires CLICKHOUSE_HOST) GCCH_MODE=auto → Auto-detect: CLICKHOUSE_HOST="" → Local mode CLICKHOUSE_HOST="localhost" → Local mode CLICKHOUSE_HOST="127.0.0.1" → Local mode CLICKHOUSE_HOST="*.cloud" → Cloud mode
Introspection:
from gapless_crypto_clickhouse import probe # Get available modes modes = probe.get_deployment_modes() print(modes["available_modes"]) # ["local", "cloud"] # Get current mode current = probe.get_current_mode() # "local" or "cloud" # Get installation guide guide = probe.get_local_installation_guide() print(guide["macos"]["commands"])
Success Criteria
- •ClickHouse server running on localhost:8123
- •
probe.check_local_clickhouse()returns{"installed": True, "running": True} - •
probe.get_current_mode()returns"local" - •
query_ohlcv()executes without Cloud credentials
Troubleshooting
Issue: "Connection refused" on port 8123
- •
Check: Is ClickHouse server running?
bash# macOS ps aux | grep clickhouse # Linux sudo service clickhouse-server status
- •
Action: Start server with
clickhouse server --daemon
Issue: "Command not found: clickhouse"
- •Check: Was ClickHouse installed correctly?
- •macOS: Run
brew install clickhouse - •Linux: Run
curl https://clickhouse.com/ | sh
Issue: Mode detected as "cloud" instead of "local"
- •Check: Is
CLICKHOUSE_HOSTset to a remote hostname? - •Action: Set
export GCCH_MODE=localexplicitly
Issue: Server crashes with memory error
- •Check: System has 2-4GB RAM available
- •Action: Increase available memory or reduce concurrent queries
Issue: Permission denied errors
- •Check: ClickHouse data directory permissions
- •Action: Ensure user has write access to
/var/lib/clickhouse(Linux)
Port Reference
| Mode | HTTP Port | Native Port | Secure |
|---|---|---|---|
| Local | 8123 | 9000 | False |
| Cloud | 8443 | 9440 | True |
Performance Comparison
| Metric | Local | Cloud |
|---|---|---|
| Query latency | 50ms | 100-500ms |
| First query (cold) | 50ms | 5-10s (idle resume) |
| Network dependency | None | Required |
| Credentials | None | Required |
| Best for | Backtesting | Production |
Server Management
Start Server:
# macOS clickhouse server --daemon # Linux sudo service clickhouse-server start
Stop Server:
# macOS pkill -f clickhouse-server # Linux sudo service clickhouse-server stop
Check Logs:
# macOS (Homebrew) tail -f /opt/homebrew/var/log/clickhouse-server/clickhouse-server.log # Linux sudo tail -f /var/log/clickhouse-server/clickhouse-server.log
E2E Validation Workflow (ADR-0045)
This skill includes executable scripts for end-to-end validation of local ClickHouse.
Scripts Directory
| Script | Purpose |
|---|---|
scripts/start-clickhouse.sh | Start mise-installed ClickHouse server |
scripts/deploy-schema.sh | Deploy production schema (calls existing script) |
scripts/ingest-sample-data.py | Ingest real Binance data via query_ohlcv() |
scripts/take-screenshot.py | Capture Play UI screenshot via Playwright |
scripts/validate-data.py | Validate data integrity, output JSON evidence |
Running E2E Validation
Full workflow (manual):
# 1. Start ClickHouse ./skills/local-clickhouse/scripts/start-clickhouse.sh # 2. Deploy schema ./skills/local-clickhouse/scripts/deploy-schema.sh # 3. Ingest sample data uv run python skills/local-clickhouse/scripts/ingest-sample-data.py # 4. Capture screenshot uv run python skills/local-clickhouse/scripts/take-screenshot.py # 5. Validate data uv run python skills/local-clickhouse/scripts/validate-data.py
Via pytest (automated):
# Run E2E tests (requires mise ClickHouse installed) uv run pytest tests/test_local_clickhouse_e2e.py -v # Tests will FAIL (not skip) if ClickHouse not available
Evidence Output
Scripts output evidence to tests/screenshots/ (gitignored):
- •
play-ui-{timestamp}.png- Playwright screenshots - •
validation-{timestamp}.json- Structured validation results
References
- •ADR-0044: Local ClickHouse Option
- •ADR-0045: Local ClickHouse E2E Validation
- •Plan 0044: Implementation Plan
- •Plan 0045: E2E Validation Plan
- •llms.txt: AI agent documentation with deployment modes
- •probe.py:
get_deployment_modes(),check_local_clickhouse()
Next Steps
After successful local setup:
- •Schema Creation: Tables created automatically on first query
- •Data Ingestion: Use
query_ohlcv()withauto_ingest=True - •E2E Validation: Run
uv run pytest tests/test_local_clickhouse_e2e.py -v - •Backtesting: Run analysis with 50-100x faster queries
- •Production Migration: Switch to Cloud mode with
export GCCH_MODE=cloud