
pooch

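Downloads and caches data files for geoscience applications. Supports automatic caching, checksum verification, and downloads from multiple sources, making it easy to fetch sample datasets. Use when Claude needs to: (1) download datasets from URLs or DOIs; (2) cache files locally with automatic verification; (3) verify file integrity with SHA256/MD5 hashes; (4) extract compressed archives (ZIP, TAR, GZIP); (5) create data registries for reproducible workflows; (6) fetch data from Zenodo or other repositories.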

SKILL.md
---
name: pooch
description: |
  Data file fetching and caching for geoscience applications. Download sample
  datasets with automatic caching, checksum verification, and multiple download
  sources. Use when Claude needs to: (1) Download datasets from URLs or DOIs,
  (2) Cache files locally with automatic verification, (3) Verify file integrity
  with SHA256/MD5 hashes, (4) Extract compressed archives (ZIP, TAR, GZIP),
  (5) Create data registries for reproducible workflows, (6) Fetch from Zenodo
  or other repositories.
---

Pooch - Data File Fetching

Quick Reference

```python
import pooch

# Download a single file
file_path = pooch.retrieve(
    url="https://example.com/data.csv",
    known_hash="sha256:abc123...",  # None to skip verification
    fname="data.csv",
    path=pooch.os_cache("myproject")
)

# Create a registry for multiple files
REGISTRY = pooch.create(
    path=pooch.os_cache("myproject"),
    base_url="https://example.com/data/",
    registry={"data.csv": "sha256:abc123...", "model.nc": "sha256:def456..."}
)
data_file = REGISTRY.fetch("data.csv")

# Compute the SHA256 hash of a local file (hex digest only, without the "sha256:" prefix)
file_hash = pooch.file_hash("/path/to/file.csv")
```

Key Functions

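| Function | Purpose |
|----------|---------|
| `pooch.retrieve()` | Download a single file with caching |
| `pooch.create()` | Create a custom data registry |
| `pooch.file_hash()` | Compute a file's hash (SHA256 by default; MD5 and other algorithms via `alg`) |
| `pooch.os_cache()` | Get the OS-specific cache directory |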

Essential Operations

Download Files

```python
# With hash verification
file_path = pooch.retrieve(
    url="https://example.com/data.nc",
    known_hash="sha256:abc123..."
)

# Without verification (development only)
file_path = pooch.retrieve(url="https://example.com/data.nc", known_hash=None)

# From a DOI, written as "doi:<DOI>/<file name>" (Zenodo, figshare, and Dataverse are supported)
file_path = pooch.retrieve(
    url="doi:10.5281/zenodo.1234567/data.zip",
    known_hash="sha256:abc123..."
)
```

Extract Archives

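```python
# ZIP archive: returns a list of paths to the extracted files
files = pooch.retrieve(
    url="https://example.com/data.zip",
    known_hash="sha256:abc123...",  # hash of the archive itself, not its contents
    processor=pooch.Unzip()
)

# Decompress a single gzip file: writes data.csv next to the download and returns its path
file_path = pooch.retrieve(
    url="https://example.com/data.csv.gz",
    known_hash="sha256:abc123...",
    processor=pooch.Decompress(name="data.csv")
)
```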

Additional Options

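```python
# Progress bar for large downloads (requires the optional tqdm package)
file_path = pooch.retrieve(
    url="https://example.com/data.nc", known_hash="sha256:abc123...", progressbar=True
)

# HTTP authentication: HTTPDownloader passes extra keyword arguments to requests.get.
# With a custom downloader, set progressbar=True on the downloader instead.
file_path = pooch.retrieve(
    url="https://example.com/protected/data.csv",
    known_hash=None,
    downloader=pooch.HTTPDownloader(auth=("user", "pass"))
)
```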

Processor Options

| Processor | Purpose |
|-----------|---------|
| `Unzip()` | Extract ZIP archives |
| `Untar()` | Extract TAR/TAR.GZ archives |
| `Decompress()` | Decompress gzip, bz2, lzma, xz |

Cache Locations

| OS | Default path |
|----|--------------|
| Linux | `~/.cache/<project>` |
| macOS | `~/Library/Caches/<project>` |
| Windows | `C:\Users\<user>\AppData\Local\<project>\<project>\Cache` |

Error Handling

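```python
import pooch
import requests

try:
    file_path = pooch.retrieve(url="https://example.com/data.nc", known_hash="sha256:abc123...")
except requests.exceptions.HTTPError:  # error status such as 404 or 500
    print("Download failed - check URL")
except requests.exceptions.RequestException:  # connection errors, timeouts
    print("Network issue")
except ValueError:  # downloaded file does not match known_hash
    print("Hash mismatch - file changed or is corrupted")
```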
