ClickHouse Query Analyzer Skill
This skill provides a structured workflow for analyzing and optimizing ClickHouse queries. When a user provides a query for analysis, follow these steps:
Step 1: Schema Assessment
Review the table definition to understand the physical data layout.
- •Tools:
SHOW CREATE TABLE {table}orDESCRIBE TABLE {table}. - •Checklist:
- •Identify Partition Key: Is the query filtering by it?
- •Identify Sorting Key (Primary Key): Is the query leveraging the prefix of the sorting key in
WHERE/PREWHERE? - •Check Data Types: Are there
LowCardinalitystrings orNullablecolumns that could be optimized?
Step 2: Query Plan Analysis
Use ClickHouse's built-in EXPLAIN to see how the engine intends to execute the query.
- •Commands:
- •
EXPLAIN indexes=1 {query}: Check if any marks/parts are being skipped by indexes. - •
EXPLAIN actions=1 {query}: See the detailed execution steps. - •
EXPLAIN PIPELINE {query}: Check the level of parallelism.
- •
Step 3: Runtime Performance Audit
If the query has been run, analyze its actual resource consumption.
- •Source:
system.query_log. - •Key Metrics to Inspect:
- •
read_rowsvsresult_rows: High ratio indicates inefficient filtering. - •
read_bytes: Total I/O overhead. - •
memory_usage: Peak memory consumed (crucial for large JOINs or Aggregations). - •
query_duration_ms: Total latency.
- •
Step 4: Common Optimization Strategies
Apply these patterns to improve performance:
- •Leverage PREWHERE: Move filters on primary key columns or small columns to
PREWHEREto prune data before reading large columns. - •**Avoid SELECT ***: Specify only necessary columns to minimize I/O in the columnar storage.
- •Optimize Joins: ClickHouse prefers
JOINs where the right-side table fits in memory. Consider usingDictionariesfor high-performance lookups. - •Partition Pruning: Ensure filters on partition keys (usually time-based) are present to avoid scanning all data parts.
- •Function Pushdown: Avoid wrapping columns in functions in the
WHEREclause (e.g., usedate >= '2023-01-01'instead oftoYear(date) = 2023).
Analysis Report Format
When providing your analysis, structure it as follows:
- •Summary: High-level assessment (e.g., "I/O bound", "Memory intensive").
- •Schema Audit: Insights from table definition.
- •Execution Plan: Insights from
EXPLAIN. - •Bottlenecks: Specific causes of slowness.
- •Recommendations: Numbered list of actionable SQL changes or schema improvements.