ClickHouse Best Practices

Name: clickhouse-best-practices
Rating: 92
Author: ClickHouse

Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.

Official docs: ClickHouse Best Practices

IMPORTANT: How to Apply This Skill

Before answering ClickHouse questions, follow this priority order:

•Check for applicable rules in the rules/ directory
•If rules exist: Apply them and cite them in your response using "Per rule-name..."
•If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
•If uncertain: Use web search for current best practices
•Always cite your source: rule name, "general ClickHouse guidance", or URL

Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.

For Formal Reviews

When performing a formal review of schemas, queries, or data ingestion:

Review Procedures

For Schema Reviews (CREATE TABLE, ALTER TABLE)

Read these rule files in order:

•rules/schema-pk-plan-before-creation.md - ORDER BY is immutable
•rules/schema-pk-cardinality-order.md - Column ordering in keys
•rules/schema-pk-prioritize-filters.md - Filter column inclusion
•rules/schema-types-native-types.md - Proper type selection
•rules/schema-types-minimize-bitwidth.md - Numeric type sizing
•rules/schema-types-lowcardinality.md - LowCardinality usage
•rules/schema-types-avoid-nullable.md - Nullable vs DEFAULT
•rules/schema-partition-low-cardinality.md - Partition count limits
•rules/schema-partition-lifecycle.md - Partitioning purpose

Check for:

• PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
• Data types match actual data ranges
• LowCardinality applied to appropriate string columns
• Partition key cardinality bounded (100-1,000 values)
• ReplacingMergeTree has version column if used

For Query Reviews (SELECT, JOIN, aggregations)

Read these rule files:

•rules/query-join-choose-algorithm.md - Algorithm selection
•rules/query-join-filter-before.md - Pre-join filtering
•rules/query-join-use-any.md - ANY vs regular JOIN
•rules/query-index-skipping-indices.md - Secondary index usage
•rules/schema-pk-filter-on-orderby.md - Filter alignment with ORDER BY

Check for:

• Filters use ORDER BY prefix columns
• JOINs filter tables before joining (not after)
• Correct JOIN algorithm for table sizes
• Skipping indices for non-ORDER BY filter columns

For Insert Strategy Reviews (data ingestion, updates, deletes)

Read these rule files:

•rules/insert-batch-size.md - Batch sizing requirements
•rules/insert-mutation-avoid-update.md - UPDATE alternatives
•rules/insert-mutation-avoid-delete.md - DELETE alternatives
•rules/insert-async-small-batches.md - Async insert usage
•rules/insert-optimize-avoid-final.md - OPTIMIZE TABLE risks

Check for:

• Batch size 10K-100K rows per INSERT
• No ALTER TABLE UPDATE for frequent changes
• ReplacingMergeTree or CollapsingMergeTree for update patterns
• Async inserts enabled for high-frequency small batches

Output Format

Structure your response as follows:

code

## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...

## Findings

### Violations
- **`rule-name`**: Description of the issue
  - Current: [what the code does]
  - Required: [what it should do]
  - Fix: [specific correction]

### Compliant
- `rule-name`: Brief note on why it's correct

## Recommendations
[Prioritized list of changes, citing rules]

Rule Categories by Priority

Priority	Category	Impact	Prefix	Rule Count
1	Primary Key Selection	CRITICAL	`schema-pk-`	4
2	Data Type Selection	CRITICAL	`schema-types-`	5
3	JOIN Optimization	CRITICAL	`query-join-`	5
4	Insert Batching	CRITICAL	`insert-batch-`	1
5	Mutation Avoidance	CRITICAL	`insert-mutation-`	2
6	Partitioning Strategy	HIGH	`schema-partition-`	4
7	Skipping Indices	HIGH	`query-index-`	1
8	Materialized Views	HIGH	`query-mv-`	2
9	Async Inserts	HIGH	`insert-async-`	2
10	OPTIMIZE Avoidance	HIGH	`insert-optimize-`	1
11	JSON Usage	MEDIUM	`schema-json-`	1

Quick Reference

Schema Design - Primary Key (CRITICAL)

•schema-pk-plan-before-creation - Plan ORDER BY before table creation (immutable)
•schema-pk-cardinality-order - Order columns low-to-high cardinality
•schema-pk-prioritize-filters - Include frequently filtered columns
•schema-pk-filter-on-orderby - Query filters must use ORDER BY prefix

Schema Design - Data Types (CRITICAL)

•schema-types-native-types - Use native types, not String for everything
•schema-types-minimize-bitwidth - Use smallest numeric type that fits
•schema-types-lowcardinality - LowCardinality for <10K unique strings
•schema-types-enum - Enum for finite value sets with validation
•schema-types-avoid-nullable - Avoid Nullable; use DEFAULT instead

Schema Design - Partitioning (HIGH)

•schema-partition-low-cardinality - Keep partition count 100-1,000
•schema-partition-lifecycle - Use partitioning for data lifecycle, not queries
•schema-partition-query-tradeoffs - Understand partition pruning trade-offs
•schema-partition-start-without - Consider starting without partitioning

Schema Design - JSON (MEDIUM)

•schema-json-when-to-use - JSON for dynamic schemas; typed columns for known

Query Optimization - JOINs (CRITICAL)

•query-join-choose-algorithm - Select algorithm based on table sizes
•query-join-use-any - ANY JOIN when only one match needed
•query-join-filter-before - Filter tables before joining
•query-join-consider-alternatives - Dictionaries/denormalization vs JOIN
•query-join-null-handling - join_use_nulls=0 for default values

Query Optimization - Indices (HIGH)

•query-index-skipping-indices - Skipping indices for non-ORDER BY filters

Query Optimization - Materialized Views (HIGH)

•query-mv-incremental - Incremental MVs for real-time aggregations
•query-mv-refreshable - Refreshable MVs for complex joins

Insert Strategy - Batching (CRITICAL)

•insert-batch-size - Batch 10K-100K rows per INSERT

Insert Strategy - Async (HIGH)

•insert-async-small-batches - Async inserts for high-frequency small batches
•insert-format-native - Native format for best performance

Insert Strategy - Mutations (CRITICAL)

•insert-mutation-avoid-update - ReplacingMergeTree instead of ALTER UPDATE
•insert-mutation-avoid-delete - Lightweight DELETE or DROP PARTITION

Insert Strategy - Optimization (HIGH)

•insert-optimize-avoid-final - Let background merges work

When to Apply

This skill activates when you encounter:

•CREATE TABLE statements
•ALTER TABLE modifications
•ORDER BY or PRIMARY KEY discussions
•Data type selection questions
•Slow query troubleshooting
•JOIN optimization requests
•Data ingestion pipeline design
•Update/delete strategy questions
•ReplacingMergeTree or other specialized engine usage
•Partitioning strategy decisions

Rule File Structure

Each rule file in rules/ contains:

•YAML frontmatter: title, impact level, tags
•Brief explanation: Why this rule matters
•Incorrect example: Anti-pattern with explanation
•Correct example: Best practice with explanation
•Additional context: Trade-offs, when to apply, references

Full Compiled Document

For the complete guide with all rules expanded inline: AGENTS.md

Use AGENTS.md when you need to check multiple rules quickly without reading individual files.