Goldsky Dataset Discovery
Explore and discover available blockchain datasets for Turbo pipelines.
Triggers
Invoke this skill when the user:
- Asks "what data does Goldsky have?" or "what chains are supported?"
- Wants to know if a specific chain or dataset is available
- Needs help finding the right dataset name format
- Says "help me find a dataset" or "what datasets are available?"
- Is unsure which dataset to use for their pipeline
- Asks about ERC-20 transfers, NFT data, logs, or other blockchain data types
- Mentions /goldsky-datasets
Agent Instructions
IMPORTANT - Avoid goldsky dataset list: this command is slow (30-60+ seconds) and often times out. Use the reference tables below instead. Only run the command if you need to verify that a specific dataset exists or to find an exact version.
When this skill is invoked:
Step 1: Use the Reference Tables First
Do NOT run goldsky dataset list upfront. Instead:
- Check the Verified Dataset Reference table below for exact dataset names
- Use goldsky turbo validate on a test YAML to verify a dataset exists (fast, ~3 seconds)
- Only run goldsky dataset list | grep "specific_name" if you need version info
Step 2: Match User Need to Dataset
Based on what they're building, recommend directly from the reference:
| User Need | Dataset | Version |
|---|---|---|
| Token transfers (ERC-20) | <chain>.erc20_transfers | 1.0.0 (eth), 1.2.0 (base) |
| NFT transfers (ERC-721) | <chain>.erc721_transfers | 1.0.0 |
| All transactions | <chain>.raw_transactions | 1.0.0 |
| Event logs | <chain>.logs or <chain>.raw_logs | 1.0.0 |
| Block data | <chain>.blocks or <chain>.raw_blocks | 1.0.0 |
Step 3: Validate Before Presenting
ALWAYS validate the dataset exists before giving it to the user:
# Quick validation test (fast, ~3 seconds)
goldsky turbo validate - <<'EOF'
name: test
resource_size: s
sources:
  test:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: 1.0.0
    start_at: latest
transforms: {}
sinks:
  out:
    type: blackhole
    from: test
EOF
If validation fails with "Dataset not found", try alternate naming (see Verified Dataset Reference).
Dataset Reference Files
Detailed dataset and chain information is in the data/ folder.
| File | Contents |
|---|---|
| verified-datasets.json | All validated datasets with versions, schemas, and use cases |
| chain-prefixes.json | All chain prefixes, chain IDs, and common mistakes |
Data location: data/ (relative to this skill's directory)
Quick Reference
| Action | Command | Notes |
|---|---|---|
| Validate dataset | goldsky turbo validate file.yaml | Preferred - fast (3s) |
| Search for dataset | goldsky dataset list \| grep "name" | Slow (30-60s), use sparingly |
| List all datasets | goldsky dataset list | Very slow - avoid |
Common Datasets
| What You Need | Dataset | Example |
|---|---|---|
| Token transfers (ERC-20) | <chain>.erc20_transfers | base.erc20_transfers (v1.2.0) |
| NFT transfers (ERC-721) | <chain>.erc721_transfers | ethereum.erc721_transfers (v1.0.0) |
| Transactions | <chain>.raw_transactions | ethereum.raw_transactions (v1.0.0) |
| Event logs | <chain>.logs | base.logs (v1.0.0) |
| Solana tokens | solana.token_transfers | v1.0.0 |
Important: Use raw_transactions, NOT transactions.
Popular Chain Prefixes
| Chain | Prefix | Note |
|---|---|---|
| Ethereum | ethereum | |
| Base | base | |
| Polygon | matic | NOT polygon |
| Arbitrum | arbitrum | |
| Optimism | optimism | |
| BSC | bsc | |
| Avalanche | avalanche | |
| Solana | solana | Uses start_block not start_at |
See data/chain-prefixes.json for complete list with chain IDs.
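As a rough illustration of the Solana note in the table above, a source block might look like the sketch below. The source name sol_transfers is hypothetical, and <slot_number> is a placeholder: this document does not say which values start_block accepts, so check the result with goldsky turbo validate before relying on it.

```yaml
# Sketch only: assumes Solana sources take start_block in place of start_at,
# as the Popular Chain Prefixes table notes.
sources:
  sol_transfers:                  # hypothetical source name
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0                # version listed under Common Datasets
    start_block: <slot_number>    # placeholder; accepted values are not documented here
```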
Common Dataset Types
EVM Chains
| Dataset Type | Description | Use Case |
|---|---|---|
| blocks | Block headers with metadata | Block explorers, timing analysis |
| raw_transactions | Transaction data | Wallet activity, gas analysis |
| logs | Raw event logs | Custom event filtering |
| traces | Internal transactions/calls | MEV analysis, contract interactions |
| erc20_transfers | Fungible token transfers | Token tracking, DeFi analytics |
| erc721_transfers | NFT transfers | NFT marketplaces, ownership tracking |
| decoded_logs | ABI-decoded event logs | Specific contract events |
Solana
| Dataset Type | Description | Use Case |
|---|---|---|
| token_transfers | SPL token transfers | Token tracking |
| transactions | Transaction data | Wallet activity |
| blocks | Block/slot data | Chain analysis |
Dataset Name Format
All datasets follow the pattern: <chain_prefix>.<dataset_type>
Examples:
- ethereum.erc20_transfers - ERC-20 transfers on Ethereum mainnet
- base.logs - All event logs on Base
- matic.blocks - Block data on Polygon
- solana.token_transfers - SPL token transfers on Solana
Finding Dataset Versions
Datasets are versioned. To find available versions:
goldsky dataset list | grep "base.erc20"
Common versions:
- 1.0.0 - Initial version
- 1.2.0 - Enhanced schema (common for ERC-20 transfers)
When in doubt, use the latest version shown in goldsky dataset list.
Common Discovery Patterns
"I want to track USDC transfers on Base"
- Dataset: base.erc20_transfers
- Filter by contract address in your pipeline transform:
transforms:
  usdc_only:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM source_name
      WHERE address = lower('0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913')
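Combined with a source and a throwaway sink, the USDC filter could be checked end to end with goldsky turbo validate. The sketch below reuses only fields that appear elsewhere in this document. The pipeline name usdc-base-test and the source name base_transfers are hypothetical, and it assumes a SQL transform refers to a source by its key.

```yaml
# Sketch only: pipeline and source names are illustrative.
name: usdc-base-test
resource_size: s
sources:
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest
transforms:
  usdc_only:
    type: sql
    primary_key: id
    sql: |
      -- assumes the source key doubles as the table name
      SELECT * FROM base_transfers
      WHERE address = lower('0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913')
sinks:
  out:
    type: blackhole   # discards output; useful for validation runs
    from: usdc_only
```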
"I want all NFT activity on Ethereum"
Dataset: ethereum.erc721_transfers
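A minimal source block for this case might look like the following. The source name eth_nft_transfers is hypothetical, and the version is the one listed for ethereum.erc721_transfers in the Common Datasets table.

```yaml
# Sketch only: the source name is illustrative.
sources:
  eth_nft_transfers:
    type: dataset
    dataset_name: ethereum.erc721_transfers
    version: 1.0.0
    start_at: latest
```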
"I want to monitor a specific smart contract"
- Dataset: <chain>.logs for raw events, or <chain>.decoded_logs for decoded events
- Filter by contract address in your transform
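One way this could look for raw logs is sketched below. It assumes the logs dataset exposes id and address columns like the ERC-20 transfer example in this document, which should be confirmed against the schema in data/verified-datasets.json. The names contract_logs and my_contract_only are hypothetical, and <contract_address> is a placeholder.

```yaml
# Sketch only: column names (id, address) are assumed, not confirmed for logs.
sources:
  contract_logs:
    type: dataset
    dataset_name: <chain>.logs
    version: 1.0.0
    start_at: latest
transforms:
  my_contract_only:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM contract_logs
      WHERE address = lower('<contract_address>')
```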
"I need multi-chain data"
Use multiple sources in your pipeline:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest
Troubleshooting
Dataset not found
Error: Source 'my_source' references unknown dataset 'invalid.dataset'
Fix:
- Check the chain prefix is correct (e.g., matic not polygon)
- Check the dataset type exists (e.g., erc20_transfers not erc20)
- Run goldsky dataset list to see all available options
Chain not listed
If you can't find a chain in the tables above:
goldsky dataset list | grep -i "<chain_name>"
Some chains use non-obvious prefixes (e.g., Polygon uses matic).
Version mismatch
Error: Version '2.0.0' not found for dataset 'base.erc20_transfers'
Fix: Check available versions:
goldsky dataset list | grep "base.erc20_transfers"
Use a version that exists in the output.
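For the error shown above, the fix is usually a one-line change to the source's version field. A sketch, assuming 1.2.0 (the version this document lists for base.erc20_transfers) still appears in the command output; the source name base_transfers is hypothetical.

```yaml
# Sketch only: replace the version with one that the list command reports.
sources:
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0   # previously 2.0.0, which does not exist
    start_at: latest
```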
Related Skills
- /turbo-pipelines - Create pipelines using discovered datasets
- /goldsky-auth-setup - Set up CLI authentication first