SourceMedium BigQuery Analyst
Use this skill to help end users work with SourceMedium BigQuery data from setup to analysis.
Workflow
- •Verify environment (run these before any analysis)
- •Confirm project and dataset/table visibility
- •Use docs-first guidance for definitions and table discovery
- •Answer analytical questions with reproducible SQL receipts
- •Call out assumptions and caveats explicitly
Setup Verification
Run these commands in order before writing analysis SQL:
bash
# 1. Check CLI tools are installed gcloud --version && bq version # 2. Check authenticated account gcloud auth list # 3. Check active project gcloud config get-value project # 4. Validate BigQuery API access (dry-run) bq query --use_legacy_sql=false --dry_run 'SELECT 1 AS ok' # 5. Test table access (your project is named sm-<tenant_id>) # Example: if your tenant is "acme-corp", your project is sm-acme-corp bq query --use_legacy_sql=false --dry_run " SELECT 1 FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\` LIMIT 1 " # 6. Confirm you can actually read data (not just dry-run) bq query --use_legacy_sql=false " SELECT 1 FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\` WHERE is_order_sm_valid = TRUE LIMIT 1 "
If any step fails, see references/TROUBLESHOOTING.md and guide the user to request access.
Safety Rules
These are hard constraints. Do not bypass.
Query Safety
- •SELECT-only — deny: INSERT, UPDATE, DELETE, MERGE, CREATE, DROP, EXPORT, COPY
- •Dry-run first when iterating on new queries:
bash
bq query --use_legacy_sql=false --dry_run '<SQL>'
- •Enforce cost limit with maximum bytes billed:
(1GB = 1073741824 bytes. If it fails due to bytes billed, tighten filters or ask for approval.)bash
bq query --use_legacy_sql=false --maximum_bytes_billed=1073741824 '<SQL>'
- •Always bound queries:
- •Add
LIMITclause (max 100 rows for exploratory) - •Use date/partition filters when querying partitioned tables
- •Prefer
WHEREfilters on partition columns
- •Add
Data Safety
- •Default to aggregates — avoid outputting raw rows unless explicitly requested
- •PII handling:
- •Do not output columns likely containing PII (email, phone, address, name) without explicit confirmation
- •If PII is requested, confirm scope and purpose before proceeding
- •Prefer anonymization. Example:
sql
-- Hash PII instead of exposing raw values SELECT TO_HEX(SHA256(LOWER(email))) AS email_hash, ...
Cost Guardrails
sql
-- Good: bounded scan SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders` WHERE DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) LIMIT 100 -- Bad: full table scan SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders` -- no filters
Output Contract
For analytical questions, always return:
- •Answer — concise plain-English conclusion
- •SQL (copy/paste) — BigQuery Standard SQL used for the result
- •Notes — timeframe, metric definitions, grain, scope, timezone, attribution lens
- •Verify —
bq query --use_legacy_sql=false --dry_run '<SQL>'command - •Bytes scanned — if >1GB, note this and ask for approval before running
If access/setup fails, do not fabricate results. Return:
- •Exact failing step
- •Exact project/dataset that failed
- •Direct user to
assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md
Query Guardrails
- •Fully qualify tables as
`sm-<tenant_id>.dataset.table` - •For order analyses, default to
WHERE is_order_sm_valid = TRUE - •Use
sm_store_id(notsmcid— that name does not exist in customer tables) - •Use
SAFE_DIVIDEfor ratio math - •Handle DATE/TIMESTAMP typing explicitly (
DATE(ts_col)when comparing to dates) - •Use
order_net_revenuefor revenue metrics (notorder_gross_revenueunless explicitly asked) - •Prefer
*_local_datetimecolumns when available for date-based reporting; otherwise be explicit about UTC vs local - •For enumerations (channel, platform, status), discover values with
SELECT DISTINCTfirst, then use exact match. ReserveLIKE/REGEXPfor free-text fields (utm_campaign,product_title,page_path) - •LTV tables (
rpt_cohort_ltv_*): always filtersm_order_line_typeto exactly ONE value
References
- •
references/SCHEMA.md— key tables, grains, columns, and naming conventions - •
references/QUERY_PATTERNS.md— common SQL patterns and LTV/cohort rules - •
references/TROUBLESHOOTING.md— auth, permission, and API issues - •
assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md— copy/paste request for users without access