goldsky-datasets

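Explore the blockchain datasets available for Turbo pipelines. This skill is useful when you want to learn what data Goldsky offers, find chain prefixes, or choose the right dataset for a pipeline.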

SKILL.md
--- frontmatter
name: goldsky-datasets
description: Discover available blockchain datasets for Turbo pipelines. Use when exploring what data Goldsky offers, finding chain prefixes, or selecting the right dataset for a pipeline.

Goldsky Dataset Discovery

Explore and discover available blockchain datasets for Turbo pipelines.

Triggers

Invoke this skill when the user:

  • Asks "what data does Goldsky have?" or "what chains are supported?"
  • Wants to know if a specific chain or dataset is available
  • Needs help finding the right dataset name format
  • Says "help me find a dataset" or "what datasets are available?"
  • Is unsure which dataset to use for their pipeline
  • Asks about ERC-20 transfers, NFT data, logs, or other blockchain data types
  • Mentions /goldsky-datasets

Agent Instructions

IMPORTANT - Avoid goldsky dataset list: This command is slow (30-60+ seconds) and often times out. Use the reference tables below instead. Only run the command if you need to verify a specific dataset exists or find an exact version.

When this skill is invoked:

Step 1: Use the Reference Tables First

Do NOT run goldsky dataset list upfront. Instead:

  1. Check the reference tables below (and data/verified-datasets.json) for exact dataset names
  2. Use goldsky turbo validate on a test YAML to verify a dataset exists (fast, ~3 seconds)
  3. Only run goldsky dataset list | grep "specific_name" if you need version info

Step 2: Match User Need to Dataset

Based on what they're building, recommend directly from the reference:

| User Need | Dataset | Version |
| --- | --- | --- |
| Token transfers (ERC-20) | <chain>.erc20_transfers | 1.0.0 (eth), 1.2.0 (base) |
| NFT transfers (ERC-721) | <chain>.erc721_transfers | 1.0.0 |
| All transactions | <chain>.raw_transactions | 1.0.0 |
| Event logs | <chain>.logs or <chain>.raw_logs | 1.0.0 |
| Block data | <chain>.blocks or <chain>.raw_blocks | 1.0.0 |

Step 3: Validate Before Presenting

ALWAYS validate the dataset exists before giving it to the user:

bash
# Quick validation test (fast, ~3 seconds)
goldsky turbo validate - <<'EOF'
name: test
resource_size: s
sources:
  test:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: 1.0.0
    start_at: latest
transforms: {}
sinks:
  out:
    type: blackhole
    from: test
EOF

If validation fails with "Dataset not found", try the alternate naming (for example <chain>.raw_logs instead of <chain>.logs) and check data/verified-datasets.json.
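The validation step above can be wrapped in a small helper so that the same throwaway pipeline is generated for any dataset. This is a minimal sketch: `render_test_pipeline` is a hypothetical function name, the YAML fields are copied from the example above, and it does not handle Solana sources, which use start_block instead of start_at.

```shell
# Hypothetical helper: print a throwaway pipeline definition for one dataset.
# Arguments: <dataset_name> [version], e.g. base.erc20_transfers 1.2.0
render_test_pipeline() {
  local dataset="$1"
  local version="${2:-1.0.0}"   # default mirrors the example above
  cat <<EOF
name: test
resource_size: s
sources:
  test:
    type: dataset
    dataset_name: ${dataset}
    version: ${version}
    start_at: latest
transforms: {}
sinks:
  out:
    type: blackhole
    from: test
EOF
}

# Usage (needs an authenticated goldsky CLI, so it is left commented out):
# render_test_pipeline base.erc20_transfers 1.2.0 > /tmp/test.yaml
# goldsky turbo validate /tmp/test.yaml
```

Writing the YAML to a file first keeps the generated definition around for inspection if validation fails.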


Dataset Reference Files

Detailed dataset and chain information is in the data/ folder.

| File | Contents |
| --- | --- |
| verified-datasets.json | All validated datasets with versions, schemas, and use cases |
| chain-prefixes.json | All chain prefixes, chain IDs, and common mistakes |

Data location: data/ (relative to this skill's directory)


Quick Reference

| Action | Command | Notes |
| --- | --- | --- |
| Validate dataset | goldsky turbo validate file.yaml | Preferred - fast (3s) |
| Search for dataset | goldsky dataset list \| grep "name" | Slow (30-60s), use sparingly |
| List all datasets | goldsky dataset list | Very slow - avoid |

Common Datasets

| What You Need | Dataset | Example |
| --- | --- | --- |
| Token transfers (ERC-20) | <chain>.erc20_transfers | base.erc20_transfers (v1.2.0) |
| NFT transfers (ERC-721) | <chain>.erc721_transfers | ethereum.erc721_transfers (v1.0.0) |
| Transactions | <chain>.raw_transactions | ethereum.raw_transactions (v1.0.0) |
| Event logs | <chain>.logs | base.logs (v1.0.0) |
| Solana tokens | solana.token_transfers | v1.0.0 |

Important: Use raw_transactions, NOT transactions


Popular Chain Prefixes

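| Chain | Prefix | Note |
| --- | --- | --- |
| Ethereum | ethereum | |
| Base | base | |
| Polygon | matic | NOT polygon |
| Arbitrum | arbitrum | |
| Optimism | optimism | |
| BSC | bsc | |
| Avalanche | avalanche | |
| Solana | solana | Uses start_block not start_at |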

See data/chain-prefixes.json for complete list with chain IDs.
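The prefix table above can also be expressed as a lookup, which helps when a user names a chain informally. The sketch below covers only the eight chains listed in the table; `chain_prefix` is a hypothetical helper name, and any other chain still has to be looked up in data/chain-prefixes.json.

```shell
# Hypothetical helper: map a chain name to its dataset prefix.
# Only the chains from the table above are covered.
chain_prefix() {
  local chain
  # Lowercase the input so "Polygon" and "polygon" behave the same.
  chain="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$chain" in
    ethereum)      echo ethereum ;;
    base)          echo base ;;
    polygon|matic) echo matic ;;     # Polygon uses matic, not polygon
    arbitrum)      echo arbitrum ;;
    optimism)      echo optimism ;;
    bsc)           echo bsc ;;
    avalanche)     echo avalanche ;;
    solana)        echo solana ;;
    *)
      echo "unknown chain: $1 (check data/chain-prefixes.json)" >&2
      return 1
      ;;
  esac
}

# Example: build a dataset name from a user-supplied chain name.
# echo "$(chain_prefix Polygon).blocks"
```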


Common Dataset Types

EVM Chains

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| blocks | Block headers with metadata | Block explorers, timing analysis |
| raw_transactions | Transaction data | Wallet activity, gas analysis |
| logs | Raw event logs | Custom event filtering |
| traces | Internal transactions/calls | MEV analysis, contract interactions |
| erc20_transfers | Fungible token transfers | Token tracking, DeFi analytics |
| erc721_transfers | NFT transfers | NFT marketplaces, ownership tracking |
| decoded_logs | ABI-decoded event logs | Specific contract events |

Solana

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| token_transfers | SPL token transfers | Token tracking |
| transactions | Transaction data | Wallet activity |
| blocks | Block/slot data | Chain analysis |

Dataset Name Format

All datasets follow the pattern: <chain_prefix>.<dataset_type>

Examples:

  • ethereum.erc20_transfers - ERC-20 transfers on Ethereum mainnet
  • base.logs - All event logs on Base
  • matic.blocks - Block data on Polygon
  • solana.token_transfers - SPL token transfers on Solana

Finding Dataset Versions

Datasets are versioned. To find available versions:

bash
goldsky dataset list | grep "base.erc20"

Common versions:

  • 1.0.0 - Initial version
  • 1.2.0 - Enhanced schema (common for ERC-20 transfers)

When in doubt, use the latest version shown in goldsky dataset list.
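Picking the latest version by eye is error-prone once minor numbers reach two digits (1.10.0 sorts before 1.2.0 alphabetically). The sketch below selects the highest MAJOR.MINOR.PATCH value from a list of version strings, one per line. The output format of goldsky dataset list is not shown here, so extracting the version strings from it is left as an assumption; `latest_version` is a hypothetical helper name.

```shell
# Hypothetical helper: read version strings on stdin (one per line)
# and print the highest one.
latest_version() {
  # Keep only lines shaped like MAJOR.MINOR.PATCH, then sort each
  # dot-separated field numerically and take the last line.
  grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | tail -n 1
}

# Example with hand-written input:
# printf '1.0.0\n1.2.0\n' | latest_version
```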


Common Discovery Patterns

"I want to track USDC transfers on Base"

  1. Dataset: base.erc20_transfers
  2. Filter by contract address in your pipeline transform:
yaml
transforms:
  usdc_only:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM source_name
      WHERE address = lower('0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913')

"I want all NFT activity on Ethereum"

Dataset: ethereum.erc721_transfers

"I want to monitor a specific smart contract"

  1. Dataset: <chain>.logs for raw events, or <chain>.decoded_logs for decoded events
  2. Filter by contract address in your transform
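A contract filter like the ones described above can be generated rather than typed by hand, which avoids casing mistakes in the address. This is a sketch under stated assumptions: `render_address_filter` is a hypothetical helper, the `address` column and `id` primary key are taken from the USDC example, and whether the logs datasets use the same column names should be checked against data/verified-datasets.json.

```shell
# Hypothetical helper: print a SQL transform that keeps rows for one contract.
# Arguments: <transform_name> <source_name> <contract_address>
render_address_filter() {
  local transform="$1" source="$2" address
  # Reject anything that is not a 20-byte hex address.
  if ! printf '%s' "$3" | grep -Eq '^0x[0-9a-fA-F]{40}$'; then
    echo "invalid contract address: $3" >&2
    return 1
  fi
  # Lowercase the address up front instead of calling lower() in SQL.
  address="$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')"
  cat <<EOF
transforms:
  ${transform}:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM ${source}
      WHERE address = '${address}'
EOF
}

# Example:
# render_address_filter usdc_only base_transfers 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
```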

"I need multi-chain data"

Use multiple sources in your pipeline:

yaml
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest
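When the same dataset type is needed on several chains, the sources block can be generated from a list of chain and version pairs, since versions differ per chain. The following is a rough sketch; `render_sources` and the `<prefix>_<dataset_type>` source naming are illustrative choices, not a Goldsky convention.

```shell
# Hypothetical helper: print a sources block for one dataset type on many chains.
# Arguments: <dataset_type> <prefix:version>...
render_sources() {
  local dataset_type="$1"
  shift
  local pair chain version
  echo "sources:"
  for pair in "$@"; do
    chain="${pair%%:*}"     # text before the first colon
    version="${pair#*:}"    # text after the first colon
    cat <<EOF
  ${chain}_${dataset_type}:
    type: dataset
    dataset_name: ${chain}.${dataset_type}
    version: ${version}
    start_at: latest
EOF
  done
}

# Example matching the two sources above:
# render_sources erc20_transfers ethereum:1.0.0 base:1.2.0
```

Each generated source should still be checked with goldsky turbo validate before use.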

Troubleshooting

Dataset not found

code
Error: Source 'my_source' references unknown dataset 'invalid.dataset'

Fix:

  1. Check the chain prefix is correct (e.g., matic not polygon)
  2. Check the dataset type exists (e.g., erc20_transfers not erc20)
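  3. If both look right, run goldsky dataset list | grep -i "<chain_name>" as a last resort; the unfiltered command is slow (30-60+ seconds) and often times out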
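The first two checks can be automated before any CLI call is made. The sketch below flags only the naming mistakes mentioned in this document (polygon instead of matic, erc20 instead of erc20_transfers, transactions instead of raw_transactions). `lint_dataset_name` is a hypothetical helper, and the Solana exemption is an assumption based on the Solana table listing transactions as a dataset type.

```shell
# Hypothetical helper: flag common naming mistakes in <chain_prefix>.<dataset_type>.
# Prints one message per problem and returns non-zero if any were found.
lint_dataset_name() {
  local name="$1" prefix type status=0
  case "$name" in
    *.*) ;;
    *) echo "expected <chain_prefix>.<dataset_type>"; return 1 ;;
  esac
  prefix="${name%%.*}"   # text before the first dot
  type="${name#*.}"      # text after the first dot
  if [ "$prefix" = "polygon" ]; then
    echo "use prefix 'matic' instead of 'polygon'"
    status=1
  fi
  if [ "$type" = "erc20" ]; then
    echo "use 'erc20_transfers' instead of 'erc20'"
    status=1
  fi
  # Assumption: Solana is exempt because its table lists 'transactions'.
  if [ "$type" = "transactions" ] && [ "$prefix" != "solana" ]; then
    echo "use 'raw_transactions' instead of 'transactions'"
    status=1
  fi
  return "$status"
}

# Example:
# lint_dataset_name polygon.erc20
```

A clean result only means none of these known mistakes were found; the dataset still needs to pass goldsky turbo validate.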

Chain not listed

If you can't find a chain in the tables above:

bash
goldsky dataset list | grep -i "<chain_name>"

Some chains use non-obvious prefixes (e.g., Polygon uses matic).

Version mismatch

code
Error: Version '2.0.0' not found for dataset 'base.erc20_transfers'

Fix: Check available versions:

bash
goldsky dataset list | grep "base.erc20_transfers"

Use a version that exists in the output.


Related Skills

  • /turbo-pipelines - Create pipelines using discovered datasets
  • /goldsky-auth-setup - Set up CLI authentication first