BigQuery Data Export
Use this skill when exporting data from BigQuery to Cloud Storage or local files.
Basic Extract Command
bq extract \ --location=LOCATION \ --destination_format=FORMAT \ --compression=COMPRESSION \ PROJECT:DATASET.TABLE \ gs://bucket/file.ext
Extract to CSV
Basic CSV Export
bq extract \ --destination_format=CSV \ --print_header=true \ dataset.table \ gs://bucket/export.csv
CSV with Options
bq extract \ --destination_format=CSV \ --compression=GZIP \ --field_delimiter=',' \ --print_header=true \ dataset.table \ gs://bucket/export.csv.gz
CSV flags:
- •
--field_delimiter=','- Column separator - •
--print_header=true/false- Include header row
Tab-Delimited Export
bq extract \ --destination_format=CSV \ --field_delimiter=$'\t' \ dataset.table \ gs://bucket/export.tsv
Extract to JSON
Newline-Delimited JSON
bq extract \ --destination_format=NEWLINE_DELIMITED_JSON \ --compression=GZIP \ dataset.table \ gs://bucket/export.json.gz
Output format:
{"id": 1, "name": "Alice", "amount": 100.50}
{"id": 2, "name": "Bob", "amount": 250.75}
Extract to Avro
bq extract \ --destination_format=AVRO \ --compression=SNAPPY \ dataset.table \ gs://bucket/export.avro
Benefits:
- •Preserves schema
- •Efficient binary format
- •Fast re-import to BigQuery
Extract to Parquet
bq extract \ --destination_format=PARQUET \ dataset.table \ gs://bucket/export.parquet
Benefits:
- •Columnar format
- •Good compression
- •Compatible with many analytics tools
Compression Options
Available compression:
- •
GZIP- Good compression, slower (CSV, JSON, Avro) - •
SNAPPY- Fast, moderate compression (Avro, Parquet) - •
DEFLATE- Similar to GZIP (Avro) - •
NONE- No compression (fastest)
Example:
bq extract \ --destination_format=CSV \ --compression=GZIP \ dataset.table \ gs://bucket/export.csv.gz
Large Table Exports
Using Wildcards (>1 GB)
BigQuery limitation: 1 GB per file
Solution: Use wildcard in destination
bq extract \ --destination_format=CSV \ dataset.large_table \ 'gs://bucket/export-*.csv'
Output:
gs://bucket/export-000000000000.csv gs://bucket/export-000000000001.csv gs://bucket/export-000000000002.csv ...
Shard Pattern
# Create sharded exports bq extract \ --destination_format=AVRO \ dataset.large_table \ 'gs://bucket/shard/data-*.avro'
Note: Number of files depends on data size, not configurable.
Export Specific Partitions
Single Partition
# Export 2024-01-15 partition only bq extract \ --destination_format=CSV \ dataset.partitioned_table\$20240115 \ gs://bucket/export_20240115.csv
Date Range (use WHERE in EXPORT DATA)
See EXPORT DATA section below.
EXPORT DATA SQL Statement
Basic EXPORT DATA
EXPORT DATA OPTIONS( uri='gs://bucket/export-*.csv', format='CSV', overwrite=true, header=true, field_delimiter=',' ) AS SELECT * FROM `project.dataset.table` WHERE date >= '2024-01-01';
Export Query Results
EXPORT DATA OPTIONS( uri='gs://bucket/aggregated-*.parquet', format='PARQUET', overwrite=true ) AS SELECT customer_id, DATE(order_timestamp) as order_date, SUM(amount) as total_amount, COUNT(*) as order_count FROM `project.dataset.orders` WHERE DATE(order_timestamp) >= '2024-01-01' GROUP BY customer_id, order_date;
Format Options
CSV:
EXPORT DATA OPTIONS( uri='gs://bucket/*.csv', format='CSV', header=true, field_delimiter=',', compression='GZIP' ) AS SELECT ...;
JSON:
EXPORT DATA OPTIONS( uri='gs://bucket/*.json', format='JSON', compression='GZIP' ) AS SELECT ...;
Avro:
EXPORT DATA OPTIONS( uri='gs://bucket/*.avro', format='AVRO', compression='SNAPPY' ) AS SELECT ...;
Parquet:
EXPORT DATA OPTIONS( uri='gs://bucket/*.parquet', format='PARQUET' ) AS SELECT ...;
Export to Local Files (Not Recommended)
Small Results via Query
# For small datasets only bq query \ --format=csv \ --max_rows=10000 \ --use_legacy_sql=false \ 'SELECT * FROM `project.dataset.table` LIMIT 10000' \ > local_export.csv
Limitation: Not suitable for large datasets. Use GCS for production.
Export Scheduled (Automation)
Using Cloud Scheduler + EXPORT DATA
# Create scheduled query
bq mk --transfer_config \
--target_dataset=dataset \
--display_name='Daily Export' \
--schedule='every 24 hours' \
--params='{"query":"EXPORT DATA OPTIONS(uri='\''gs://bucket/daily-*.csv'\'', format='\''CSV'\'') AS SELECT * FROM dataset.table WHERE date = CURRENT_DATE()"}' \
--data_source=scheduled_query
Using Cloud Composer (Airflow)
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
export_task = BigQueryInsertJobOperator(
task_id='export_to_gcs',
configuration={
'extract': {
'sourceTable': {
'projectId': 'project',
'datasetId': 'dataset',
'tableId': 'table'
},
'destinationUris': ['gs://bucket/export-*.csv'],
'destinationFormat': 'CSV'
}
}
)
Monitoring Exports
Check Extract Jobs
bq ls --jobs --max_results=10
Job Details
bq show -j JOB_ID
Failed Exports
SELECT job_id, user_email, error_result.message as error_message, creation_time FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE job_type = 'EXTRACT' AND state = 'DONE' AND error_result IS NOT NULL AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR) ORDER BY creation_time DESC;
Export Best Practices
Format Selection
CSV:
- •✅ Human-readable
- •✅ Universal compatibility
- •❌ Larger file size
- •❌ No schema preservation
JSON:
- •✅ Human-readable
- •✅ Preserves nested structures
- •❌ Larger file size
Avro:
- •✅ Preserves schema
- •✅ Efficient binary format
- •✅ Fast BigQuery re-import
- •❌ Not human-readable
Parquet:
- •✅ Columnar format
- •✅ Good compression
- •✅ Analytics tool compatible
- •❌ Not human-readable
Compression Recommendations
For long-term storage: GZIP (best compression) For processing pipelines: SNAPPY (fast) For network transfer: GZIP (smaller size) For speed: NONE (no compression overhead)
Wildcards for Large Exports
Always use wildcards for:
- •Tables >500 MB
- •Unknown data size
- •Distributed processing
Example:
bq extract dataset.large_table 'gs://bucket/export-*.avro'
Cost Considerations
Export Costs
- •BigQuery extract: FREE
- •GCS storage: Standard GCS pricing
- •Network egress: Free within same region
Optimization
Reduce costs:
- •Export only needed columns (use EXPORT DATA with SELECT)
- •Filter rows before export (WHERE clause)
- •Use compression (smaller files)
- •Export to GCS in same region as BigQuery
Example - filtered export:
EXPORT DATA OPTIONS( uri='gs://bucket/*.parquet', format='PARQUET' ) AS SELECT customer_id, order_date, amount -- Only needed columns FROM `project.dataset.orders` WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) -- Last 30 days only AND amount > 0; -- Filter out zero amounts
Common Patterns
Daily Export
#!/bin/bash DATE=$(date +%Y%m%d) bq extract \ --destination_format=CSV \ --compression=GZIP \ dataset.table\$$DATE \ gs://bucket/exports/daily_export_$DATE.csv.gz
Incremental Export
-- Create temp table with new data CREATE TEMP TABLE new_data AS SELECT * FROM `project.dataset.table` WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR); -- Export only new data EXPORT DATA OPTIONS( uri='gs://bucket/incremental/data-*.parquet', format='PARQUET' ) AS SELECT * FROM new_data;
Export with Transformation
EXPORT DATA OPTIONS(
uri='gs://bucket/transformed-*.csv',
format='CSV'
) AS
SELECT
customer_id,
UPPER(customer_name) as customer_name,
ROUND(amount, 2) as amount,
FORMAT_DATE('%Y-%m-%d', order_date) as order_date
FROM `project.dataset.orders`
WHERE order_date >= '2024-01-01';
Troubleshooting
"Permission denied"
Problem: No write access to GCS bucket Solution: Grant BigQuery service account Storage Object Creator role
"Table too large"
Problem: Export exceeds 1GB without wildcard
Solution: Use wildcard pattern gs://bucket/export-*.csv
"Invalid URI"
Problem: Incorrect GCS path format
Solution: Use gs://bucket/path/file format, not https://
"Quota exceeded"
Problem: Too many extract jobs Solution: Batch exports or increase quota
Quick Reference
Format recommendations:
- •Re-import to BigQuery → Avro
- •Analytics tools → Parquet
- •Data exchange → CSV
- •API consumption → JSON
Compression guide:
- •Best ratio → GZIP
- •Fastest → SNAPPY or NONE
- •Balance → SNAPPY
Size limits:
- •1 GB per file (use wildcards)
- •10 TB per extract job
- •50,000 URIs per export
Syntax patterns:
# Single file gs://bucket/file.csv # Wildcard (recommended) 'gs://bucket/prefix-*.csv' # Sharded with path 'gs://bucket/path/to/shard-*.parquet'