AgentSkillsCN

analyzing-data

查询数据仓库,解答关于数据的业务问题。无论是“谁在使用X?”、“Y的数量是多少?”、“给我看看Z”、“查找客户”、“统计数量是多少?”等需要数据库/仓库查询的问题,还是数据查询、指标分析、趋势研判,亦或是SQL分析,此技能都能轻松应对。

SKILL.md
--- frontmatter
name: analyzing-data
description: Queries data warehouse and answers business questions about data. Handles questions requiring database/warehouse queries including "who uses X", "how many Y", "show me Z", "find customers", "what is the count", data lookups, metrics, trends, or SQL analysis.
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "uv run ./scripts/cli.py ensure"
          once: true
  Stop:
    - hooks:
        - type: command
          command: "uv run ./scripts/cli.py stop"

Data Analysis

Answer business questions by querying the data warehouse. The kernel starts automatically on first use.

Prerequisites

uv must be installed:

bash
curl -LsSf https://astral.sh/uv/install.sh | sh

Scripts are located relative to this skill file.

MANDATORY FIRST STEP

Before any other action, check for cached patterns:

bash
uv run scripts/cli.py pattern lookup "<user's question>"

This is NON-NEGOTIABLE. Patterns contain proven strategies that save time and avoid failed queries.


Workflow

code
Analysis Progress:
- [ ] Step 1: pattern lookup (check for cached strategy)
- [ ] Step 2: concept lookup (check for known tables)
- [ ] Step 3: Search codebase for table definitions (Grep)
- [ ] Step 4: Read SQL file to get table/column names
- [ ] Step 5: Execute query via kernel (run_sql)
- [ ] Step 6: learn_concept (ALWAYS before presenting results)
- [ ] Step 7: learn_pattern (ALWAYS if discovery required)
- [ ] Step 8: record_pattern_outcome (if you used a pattern in Step 1)
- [ ] Step 9: Present findings to user

CLI Commands

Kernel Management

bash
uv run scripts/cli.py warehouse list  # List available warehouses
uv run scripts/cli.py start           # Start kernel with default warehouse
uv run scripts/cli.py start -w my_pg  # Start with specific warehouse
uv run scripts/cli.py exec "..."      # Execute Python code
uv run scripts/cli.py status          # Check kernel status
uv run scripts/cli.py restart         # Restart kernel
uv run scripts/cli.py stop            # Stop kernel
uv run scripts/cli.py install plotly  # Install additional packages

Concept Cache (concept -> table mappings)

bash
# Look up a concept
uv run scripts/cli.py concept lookup customers

# Learn a new concept
uv run scripts/cli.py concept learn customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS -k ACCT_ID

# List all concepts
uv run scripts/cli.py concept list

# Import concepts from warehouse.md
uv run scripts/cli.py concept import -p /path/to/warehouse.md

Pattern Cache (query strategies)

bash
# Look up patterns for a question
uv run scripts/cli.py pattern lookup "who uses operator X"

# Learn a new pattern
uv run scripts/cli.py pattern learn operator_usage \
    -q "who uses X operator" \
    -q "which customers use X" \
    -s "1. Query TASK_RUNS for operator_class" \
    -s "2. Join with ORGS on org_id" \
    -t "HQ.MODEL_ASTRO.TASK_RUNS" \
    -t "HQ.MODEL_ASTRO.ORGANIZATIONS" \
    -g "TASK_RUNS is huge - always filter by date"

# Record pattern outcome
uv run scripts/cli.py pattern record operator_usage --success

# List all patterns
uv run scripts/cli.py pattern list

# Delete a pattern
uv run scripts/cli.py pattern delete operator_usage

Table Schema Cache

bash
# Look up cached table schema
uv run scripts/cli.py table lookup HQ.MART_CUST.CURRENT_ASTRO_CUSTS

# Cache a table schema
uv run scripts/cli.py table cache DB.SCHEMA.TABLE -c '[{"name":"id","type":"INT"}]'

# List all cached tables
uv run scripts/cli.py table list

# Delete from cache
uv run scripts/cli.py table delete DB.SCHEMA.TABLE

Cache Management

bash
# View cache statistics
uv run scripts/cli.py cache status

# Clear all caches
uv run scripts/cli.py cache clear

# Clear only stale entries (older than 90 days)
uv run scripts/cli.py cache clear --stale-only

Quick Start Example

bash
# 1. Check for existing patterns
uv run scripts/cli.py pattern lookup "how many customers"

# 2. Check for known concepts
uv run scripts/cli.py concept lookup customers

# 3. Execute query
uv run scripts/cli.py exec "df = run_sql('SELECT COUNT(*) FROM HQ.MART_CUST.CURRENT_ASTRO_CUSTS')"
uv run scripts/cli.py exec "print(df)"

# 4. Cache what we learned
uv run scripts/cli.py concept learn customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS -k ACCT_ID

Available Functions in Kernel

Once kernel starts, these are available:

FunctionDescription
run_sql(query, limit=100)Execute SQL, return Polars DataFrame
run_sql_pandas(query, limit=100)Execute SQL, return Pandas DataFrame
plPolars library (imported)
pdPandas library (imported)

Table Discovery via Codebase

If concept/pattern cache miss, search the codebase:

code
Grep pattern="<concept>" glob="**/*.sql"
Repo TypeWhere to Look
Gustydags/declarative/04_metric/, 06_reporting/, 05_mart/
dbtmodels/marts/, models/staging/

Known Tables Quick Reference

ConceptTableKey ColumnDate Column
customersHQ.MART_CUST.CURRENT_ASTRO_CUSTSACCT_ID-
organizationsHQ.MODEL_ASTRO.ORGANIZATIONSORG_IDCREATED_TS
deploymentsHQ.MODEL_ASTRO.DEPLOYMENTSDEPLOYMENT_IDCREATED_TS
task_runsHQ.MODEL_ASTRO.TASK_RUNS-START_TS
dag_runsHQ.MODEL_ASTRO.DAG_RUNS-START_TS
usersHQ.MODEL_ASTRO.USERSUSER_ID-
accountsHQ.MODEL_CRM.SF_ACCOUNTSACCT_ID-

Large tables (always filter by date): TASK_RUNS (6B rows), DAG_RUNS (500M rows)


Query Tips

  • Use LIMIT during exploration
  • Filter early with WHERE clauses
  • Prefer pre-aggregated tables (METRICS_*, MART_*, AGG_*)
  • For 100M+ row tables: no JOINs or GROUP BY on first query

SQL Dialect Differences:

OperationSnowflakePostgreSQLBigQuery
Date subtractDATEADD(day, -7, x)x - INTERVAL '7 days'DATE_SUB(x, INTERVAL 7 DAY)
Case-insensitiveILIKEILIKELOWER(x) LIKE LOWER(y)

Reference