AgentSkillsCN

bigquery-analytics

当用户提及“BigQuery”“分析查询”“数据仓库”“分区”“聚簇”“SQL 查询”或任何与 BigQuery 相关的工作时,可使用此技能。该技能将为您提供 BigQuery 表结构设计、查询优化以及 Node.js 集成的最佳实践。

SKILL.md
--- frontmatter
name: bigquery-analytics
description: Use this skill when the user asks about "BigQuery", "analytics queries", "data warehouse", "partitioning", "clustering", "SQL queries", or any BigQuery-related work. Provides BigQuery table design, query optimization, and Node.js integration patterns.

BigQuery Best Practices

Table Design

Partitioning (REQUIRED for tables > 1GB)

sql
CREATE TABLE `project.dataset.events` (
  event_id STRING,
  shop_id STRING,
  event_type STRING,
  created_at TIMESTAMP,
  data JSON
)
PARTITION BY DATE(created_at)
CLUSTER BY shop_id, event_type;
Data SizePartition By
< 1GBNot needed
1GB - 1TBDATE/TIMESTAMP
> 1TBDATE + consider sharding

Query Patterns

Always Use Partition Filter

sql
-- ❌ BAD: No partition filter (full scan)
SELECT * FROM events WHERE shop_id = 'shop_123';

-- ✅ GOOD: Partition filter included
SELECT * FROM events
WHERE created_at >= '2024-01-01'
  AND created_at < '2024-02-01'
  AND shop_id = 'shop_123';

Select Only Needed Columns

sql
-- ❌ BAD: SELECT *
SELECT * FROM events;

-- ✅ GOOD: Select specific columns
SELECT event_id, event_type, created_at FROM events;

Node.js Integration

Always Batch Inserts

javascript
// ✅ GOOD: Single batch insert
await table.insert(batch.map(row => ({
  ...row,
  time: new Date()
})));

// ❌ BAD: Insert one row at a time
for (const row of batch) {
  await table.insert([row]);
}
ScenarioMax Batch Size
Streaming inserts500-1000 rows
High throughputUp to 10,000 rows

Cost Control

javascript
// Dry run before expensive queries
const [job] = await bigquery.createQueryJob({
  query: sql,
  dryRun: true
});
const estimatedCost = (job.statistics.totalBytesProcessed / 1e12) * 5;

Checklist

code
□ Large tables (>1GB) have partitioning
□ Queries include partition column in WHERE
□ Tables clustered by frequently filtered columns
□ No SELECT * - select specific columns
□ Using parameterized queries
□ Batch inserts (not row-by-row)