AgentSkillsCN

altinity-expert-clickhouse-merges

诊断 ClickHouse 的合并性能、分区积压以及“分区过多”错误。适用于处理合并问题和分区管理难题。

SKILL.md
--- frontmatter
name: altinity-expert-clickhouse-merges
description: Diagnose ClickHouse merge performance, part backlog, and 'too many parts' errors. Use for merge issues and part management problems.

Merge Performance and Part Management

Diagnose merge performance, backlog issues, and part management problems.


Diagnostics

Run all queries from the file checks.sql and analyze the results.


Problem-Specific Investigation

"Too Many Parts" Error Investigation

For deep investigation of a specific table, use these ad-hoc queries:

sql
-- Check part creation rate (should be < 1/second)
select
    toStartOfMinute(event_time) as minute,
    count() as new_parts,
    round(avg(rows)) as avg_rows_per_part
from system.part_log
where event_type = 'NewPart'
  and database = '{database}'
  and table = '{table}'
  and event_time > now() - interval 1 hour
group by minute
order by minute desc
limit 30
sql
-- Check if merges are keeping up
select
    toStartOfMinute(event_time) as minute,
    countIf(event_type = 'NewPart') as new_parts,
    countIf(event_type = 'MergeParts') as merges,
    countIf(event_type = 'MergeParts') - countIf(event_type = 'NewPart') as net_reduction
from system.part_log
where database = '{database}'
  and table = '{table}'
  and event_time > now() - interval 1 hour
group by minute
order by minute desc
limit 30

If net_reduction is negative consistently → Inserts outpace merges. Solutions:

  • Increase batch size
  • Check max_parts_to_merge_at_once setting
  • Verify sufficient CPU for background merges

Ad-Hoc Query Guidelines

Required Safeguards

sql
-- Always include LIMIT
limit 100

-- Always time-bound historical queries
where event_date >= today() - 7

-- For part_log, always filter event_type
where event_type in ('NewPart', 'MergeParts', 'MutatePart')

Safe Exploration Patterns

sql
-- Discover available merge_reason values
select distinct merge_reason
from system.part_log
where event_type = 'MergeParts'
  and event_date = today()
limit 100

-- Check table engine
select
    database,
    name,
    engine,
    partition_key,
    sorting_key
from system.tables
where database = '{database}'
  and name = '{table}'

Avoid

  • select * from system.part_log → Huge, crashes context
  • Queries without time bounds on *_log tables
  • Joining large result sets in context (do aggregation in SQL)

Cross-Module Triggers

FindingLoad ModuleReason
Slow merges, normal diskaltinity-expert-clickhouse-schemaCheck ORDER BY, partitioning
Slow merges, high disk IOaltinity-expert-clickhouse-storageStorage bottleneck analysis
Merges blocked by mutationsaltinity-expert-clickhouse-mutationsStuck mutation investigation
High memory during mergesaltinity-expert-clickhouse-memoryMemory limits, settings
Replication lag + merge issuesaltinity-expert-clickhouse-replicationReplica queue analysis

Key Settings Reference

SettingDefaultImpact
max_parts_to_merge_at_once100Max parts in single merge
number_of_free_entries_in_pool_to_lower_max_size_of_merge8Throttles large merges when busy
background_pool_size16Merge threads
parts_to_throw_insert300Error threshold
parts_to_delay_insert150Delay threshold
max_bytes_to_merge_at_max_space_in_pool150GBMax merge size