AgentSkillsCN

update-clickhouse-m3u

在远程 M3 Ultra 机器(m3u)上更新 ClickHouse 数据库。当用户提出“更新 ClickHouse”“刷新 ClickHouse 数据”“同步 ClickHouse 数据”“更新 ClickHouse 中的交易数据”或需要将新交易日推送到 ClickHouse 数据库时,可使用此技能。该技能可自动建立与 m3u:22222 的 SSH 连接,并针对指定日期范围执行增量更新。

SKILL.md
--- frontmatter
name: update-clickhouse-m3u
description: Update ClickHouse database on the remote M3 Ultra machine (m3u). Use when user asks to "update clickhouse", "refresh clickhouse data", "sync clickhouse", "update trading data in clickhouse", or needs to push new trading days to the ClickHouse database. Handles SSH connection to m3u:22222 and runs incremental updates for specified date ranges.

Update ClickHouse on M3 Ultra

Update the ClickHouse transactions database on the remote M3 Ultra machine with trading data from df_trans pickle.

Quick Start

Run the update script on the remote:

bash
ssh -p 22222 sotola@m3u \
  "PYTHONHASHSEED=0 PYTHONPATH=/Users/sotola/PycharmProjects/remote_to_m3_ultra_from_m3:\$PYTHONPATH \
   /Users/sotola/anaconda3/bin/python -u \
   /Users/sotola/PycharmProjects/remote_to_m3_ultra_from_m3/gen_data/update_clickhouse_since_date.py \
   --start-date 2025_12_01"

Parameters

  • --start-date or -s: Start date in YYYY_MM_DD format (default: 2025_12_01)

What It Does

  1. Loads df_trans from /db23/pickles/df_trans_numerical.pickle on remote
  2. Filters days >= start_date
  3. For each day: drops partition, inserts fresh data (upsert mode)
  4. Generates summary report

Connection Details

  • Host: sotola@m3u
  • Port: 22222
  • Remote project: /Users/sotola/PycharmProjects/remote_to_m3_ultra_from_m3
  • ClickHouse: localhost on remote (default db, transactions table)

Architecture Overview

code
LOCAL (mac_local_m4)
│
├─► run_remote_on_tmux_v2.py
│   └─► Defines command: gen_clickhouse_data.py
│       └─► Creates tmux pane, calls lib/run_remote_with_report.py
│
├─► lib/run_remote_with_report.py
│   └─► RemoteCommandRunner class
│       └─► SSH to sotola@m3u:22222
│           └─► Executes Python command on remote
│
▼
REMOTE (m3u: ~/PycharmProjects/remote_to_m3_ultra_from_m3)
│
├─► gen_data/gen_clickhouse_data.py
│   └─► Imports DBWorkFlows from workflows/db_workflows.py
│   └─► Reads df_trans pickle file
│   └─► Calls workflows.update_days()
│
├─► workflows/db_workflows.py
│   └─► DbOperations class
│       └─► clickhouse_driver.Client(host='localhost', port=9000)
│
▼
DOCKER on m3u
│
└─► Container: clickhouse-server
    └─► Ports: 0.0.0.0:9000 → 9000 (native protocol)
              0.0.0.0:8123 → 8123 (HTTP)
    └─► Volume: clickhouse-data
    └─► Database: default
    └─► Table: transactions

The Update Script

Located at: /Users/sotola/PycharmProjects/remote_to_m3_ultra_from_m3/gen_data/update_clickhouse_since_date.py

python
"""[Created by Opus: abc0185a-ae8d-4ec7-b35f-db0859dd58fe]"""
from pathlib import Path
try:
    project_root = Path(__file__).resolve().parent.parent
except NameError:
    print("Running in an interactive session. "
          "Using current working directory as project root.")
    project_root = Path.cwd()

import sys
sys.path.insert(0, str(project_root))

from workflows.db_workflows import DBWorkFlows
from resources.adapters import Adapters
from resources.resources import DB_RESOURCES
from lib.utilities import U
import pandas as pd
import argparse


# =============================================================================
""" CONFIGURATION """
# =============================================================================

DEFAULT_START_DATE = '2025_12_01'


# =============================================================================
""" TERMINAL COLORS """
# =============================================================================

class Tc:
    PURP = '\x1b[95m'
    CYAN = '\x1b[96m'
    GRN = '\x1b[92m'
    YEL = '\x1b[93m'
    RED = '\x1b[91m'
    BOLD = '\x1b[1m'
    END = '\x1b[0m'
    GRAY = '\x1b[90m'


# =============================================================================
""" MAIN LOGIC """
# =============================================================================

def init():
    U.set_pd_display_options()
    pd.options.display.float_format = '{:.1f}'.format
    adapters = Adapters()
    df_trans = adapters.read_pickle(
        DB_RESOURCES.DF_TRANS_NUMERICAL,
        label='update_clickhouse_since_date.init')
    return adapters, df_trans


def main(start_date: str):
    label = f"{Tc.GRAY}update_clickhouse_since_date{Tc.END}.{Tc.CYAN}main{Tc.END}:"

    # Convert start_date to int format (YYYYMMDD)
    start_date_int = int(start_date.replace('_', ''))

    print(f"{label} {Tc.BOLD}Updating ClickHouse for days >= {start_date}{Tc.END}")
    print("=" * 80)

    # Initialize
    adapters, df_trans = init()

    # Get all days from df_trans
    all_days = sorted(df_trans['day'].unique().tolist())

    # Filter days >= start_date
    days_to_update = [d for d in all_days if d >= start_date_int]

    if not days_to_update:
        print(f"{label} {Tc.YEL}⚠{Tc.END} No days found >= {start_date}")
        return

    print(f"{label} Found {Tc.BOLD}{len(days_to_update)}{Tc.END} days to update")
    print(f"{label} Date range: {Tc.CYAN}{days_to_update[0]} to {days_to_update[-1]}{Tc.END}")

    # Format days for display
    formatted_days = []
    for d in days_to_update:
        d_str = str(d)
        formatted_days.append(f"{d_str[:4]}_{d_str[4:6]}_{d_str[6:8]}")

    print(f"{label} Days: {Tc.CYAN}{', '.join(formatted_days)}{Tc.END}")
    print()

    # Initialize workflows
    workflows = DBWorkFlows(adapters=adapters)

    # Update all days at once (upsert mode)
    workflows.update_days(days=days_to_update, df_trans=df_trans)

    # Generate report
    workflows.generate_report()

    print(f"\n{label} {Tc.GRN}✓ Update complete!{Tc.END}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Update ClickHouse database for days since a specific date'
    )
    parser.add_argument(
        '--start-date', '-s',
        type=str,
        default=DEFAULT_START_DATE,
        help=f'Start date in YYYY_MM_DD format (default: {DEFAULT_START_DATE})'
    )

    args = parser.parse_args()
    main(args.start_date)

Alternative: Update Last N Days

Use the original script with MODE='incremental':

bash
ssh -p 22222 sotola@m3u \
  "PYTHONHASHSEED=0 PYTHONPATH=/Users/sotola/PycharmProjects/remote_to_m3_ultra_from_m3:\$PYTHONPATH \
   /Users/sotola/anaconda3/bin/python -u \
   /Users/sotola/PycharmProjects/remote_to_m3_ultra_from_m3/gen_data/gen_clickhouse_data.py"

This updates the last 5 days by default (configurable via NUM_RECENT_DAYS_TO_UPDATE).

Database Schema

Table: default.transactions, partitioned by day (YYYYMMDD int).

Key columns: id, stk, address, name, day, price, percentile, buy, buyOrder, sell, sellOrder, stock, group, type, insider.

<!-- [Edited by Claude: dd2ce000-7bcb-4dfa-a684-ef61f2c92ca7] -->