ClickHouse Best Practices
Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.
Official docs: ClickHouse Best Practices
IMPORTANT: How to Apply This Skill
Before answering ClickHouse questions, follow this priority order:
- •Check for applicable rules in the
rules/directory - •If rules exist: Apply them and cite them in your response using "Per
rule-name..." - •If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
- •If uncertain: Use web search for current best practices
- •Always cite your source: rule name, "general ClickHouse guidance", or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
For Formal Reviews
When performing a formal review of schemas, queries, or data ingestion:
Review Procedures
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
- •
rules/schema-pk-plan-before-creation.md- ORDER BY is immutable - •
rules/schema-pk-cardinality-order.md- Column ordering in keys - •
rules/schema-pk-prioritize-filters.md- Filter column inclusion - •
rules/schema-types-native-types.md- Proper type selection - •
rules/schema-types-minimize-bitwidth.md- Numeric type sizing - •
rules/schema-types-lowcardinality.md- LowCardinality usage - •
rules/schema-types-avoid-nullable.md- Nullable vs DEFAULT - •
rules/schema-partition-low-cardinality.md- Partition count limits - •
rules/schema-partition-lifecycle.md- Partitioning purpose
Check for:
- • PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- • Data types match actual data ranges
- • LowCardinality applied to appropriate string columns
- • Partition key cardinality bounded (100-1,000 values)
- • ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
- •
rules/query-join-choose-algorithm.md- Algorithm selection - •
rules/query-join-filter-before.md- Pre-join filtering - •
rules/query-join-use-any.md- ANY vs regular JOIN - •
rules/query-index-skipping-indices.md- Secondary index usage - •
rules/schema-pk-filter-on-orderby.md- Filter alignment with ORDER BY
Check for:
- • Filters use ORDER BY prefix columns
- • JOINs filter tables before joining (not after)
- • Correct JOIN algorithm for table sizes
- • Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
- •
rules/insert-batch-size.md- Batch sizing requirements - •
rules/insert-mutation-avoid-update.md- UPDATE alternatives - •
rules/insert-mutation-avoid-delete.md- DELETE alternatives - •
rules/insert-async-small-batches.md- Async insert usage - •
rules/insert-optimize-avoid-final.md- OPTIMIZE TABLE risks
Check for:
- • Batch size 10K-100K rows per INSERT
- • No ALTER TABLE UPDATE for frequent changes
- • ReplacingMergeTree or CollapsingMergeTree for update patterns
- • Async inserts enabled for high-frequency small batches
Output Format
Structure your response as follows:
## Rules Checked - `rule-name-1` - Compliant / Violation found - `rule-name-2` - Compliant / Violation found ... ## Findings ### Violations - **`rule-name`**: Description of the issue - Current: [what the code does] - Required: [what it should do] - Fix: [specific correction] ### Compliant - `rule-name`: Brief note on why it's correct ## Recommendations [Prioritized list of changes, citing rules]
Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rule Count |
|---|---|---|---|---|
| 1 | Primary Key Selection | CRITICAL | schema-pk- | 4 |
| 2 | Data Type Selection | CRITICAL | schema-types- | 5 |
| 3 | JOIN Optimization | CRITICAL | query-join- | 5 |
| 4 | Insert Batching | CRITICAL | insert-batch- | 1 |
| 5 | Mutation Avoidance | CRITICAL | insert-mutation- | 2 |
| 6 | Partitioning Strategy | HIGH | schema-partition- | 4 |
| 7 | Skipping Indices | HIGH | query-index- | 1 |
| 8 | Materialized Views | HIGH | query-mv- | 2 |
| 9 | Async Inserts | HIGH | insert-async- | 2 |
| 10 | OPTIMIZE Avoidance | HIGH | insert-optimize- | 1 |
| 11 | JSON Usage | MEDIUM | schema-json- | 1 |
Quick Reference
Schema Design - Primary Key (CRITICAL)
- •
schema-pk-plan-before-creation- Plan ORDER BY before table creation (immutable) - •
schema-pk-cardinality-order- Order columns low-to-high cardinality - •
schema-pk-prioritize-filters- Include frequently filtered columns - •
schema-pk-filter-on-orderby- Query filters must use ORDER BY prefix
Schema Design - Data Types (CRITICAL)
- •
schema-types-native-types- Use native types, not String for everything - •
schema-types-minimize-bitwidth- Use smallest numeric type that fits - •
schema-types-lowcardinality- LowCardinality for <10K unique strings - •
schema-types-enum- Enum for finite value sets with validation - •
schema-types-avoid-nullable- Avoid Nullable; use DEFAULT instead
Schema Design - Partitioning (HIGH)
- •
schema-partition-low-cardinality- Keep partition count 100-1,000 - •
schema-partition-lifecycle- Use partitioning for data lifecycle, not queries - •
schema-partition-query-tradeoffs- Understand partition pruning trade-offs - •
schema-partition-start-without- Consider starting without partitioning
Schema Design - JSON (MEDIUM)
- •
schema-json-when-to-use- JSON for dynamic schemas; typed columns for known
Query Optimization - JOINs (CRITICAL)
- •
query-join-choose-algorithm- Select algorithm based on table sizes - •
query-join-use-any- ANY JOIN when only one match needed - •
query-join-filter-before- Filter tables before joining - •
query-join-consider-alternatives- Dictionaries/denormalization vs JOIN - •
query-join-null-handling- join_use_nulls=0 for default values
Query Optimization - Indices (HIGH)
- •
query-index-skipping-indices- Skipping indices for non-ORDER BY filters
Query Optimization - Materialized Views (HIGH)
- •
query-mv-incremental- Incremental MVs for real-time aggregations - •
query-mv-refreshable- Refreshable MVs for complex joins
Insert Strategy - Batching (CRITICAL)
- •
insert-batch-size- Batch 10K-100K rows per INSERT
Insert Strategy - Async (HIGH)
- •
insert-async-small-batches- Async inserts for high-frequency small batches - •
insert-format-native- Native format for best performance
Insert Strategy - Mutations (CRITICAL)
- •
insert-mutation-avoid-update- ReplacingMergeTree instead of ALTER UPDATE - •
insert-mutation-avoid-delete- Lightweight DELETE or DROP PARTITION
Insert Strategy - Optimization (HIGH)
- •
insert-optimize-avoid-final- Let background merges work
When to Apply
This skill activates when you encounter:
- •
CREATE TABLEstatements - •
ALTER TABLEmodifications - •
ORDER BYorPRIMARY KEYdiscussions - •Data type selection questions
- •Slow query troubleshooting
- •JOIN optimization requests
- •Data ingestion pipeline design
- •Update/delete strategy questions
- •ReplacingMergeTree or other specialized engine usage
- •Partitioning strategy decisions
Rule File Structure
Each rule file in rules/ contains:
- •YAML frontmatter: title, impact level, tags
- •Brief explanation: Why this rule matters
- •Incorrect example: Anti-pattern with explanation
- •Correct example: Best practice with explanation
- •Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline: AGENTS.md
Use AGENTS.md when you need to check multiple rules quickly without reading individual files.