wayback-list

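Lists the Wayback Machine snapshots for a URL. Use it when the user wants to see archive history, view all snapshots, find older versions, or browse archived copies of a webpage.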

SKILL.md
--- frontmatter
name: wayback-list
description: List Wayback Machine snapshots for a URL. Use when the user wants to see archive history, view all snapshots, find older versions, or browse archived copies of a webpage.

List Wayback Machine Snapshots

Retrieve a list of archived snapshots for a URL from the Wayback Machine CDX API.

Usage

bash
npx tsx scripts/list.ts <url> [limit] [options]

Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| url | Yes | The URL to search for |
| limit | No | Max number of results (default: unlimited) |

Options

| Option | Description |
|--------|-------------|
| --no-raw | Include Wayback toolbar in URLs |
| --with-screenshots | Cross-reference to show which captures have screenshots (📷) |
| --no-cache | Bypass cache and fetch fresh data from API |

Output

code
January 1, 2024 (3 days ago)
  https://web.archive.org/web/20240101120000id_/https://example.com

December 15, 2023 (20 days ago)
  https://web.archive.org/web/20231215100000id_/https://example.com

Total: 2 snapshot(s)

Script Execution (Preferred)

bash
npx tsx scripts/list.ts <url> [limit] [options]

Options:

  • --no-raw - Include Wayback toolbar in URLs
  • --with-screenshots - Cross-reference to show which captures have screenshots (📷)
  • --no-cache - Bypass cache and fetch fresh data from API

Run from the wayback plugin directory: ~/.claude/plugins/cache/wayback/

CDX API Endpoint

code
https://web.archive.org/cdx/search/cdx?url={URL}&output=json&limit={N}

Authentication (Optional)

Most CDX queries don't require authentication. For restricted data access:

bash
# Cookie-based auth for restricted content
curl "https://web.archive.org/cdx/search/cdx?url=..." \
  --cookie "cdx-auth-token=YOUR_TOKEN"

Get API keys at https://archive.org/account/s3.php

Parameters

| Parameter | Description |
|-----------|-------------|
| url | The URL to search for (required) |
| output | Response format: json (recommended) |
| matchType | exact (default), prefix, host, or domain |
| limit | Max results. Use -N for last N results |
| offset | Skip first N records |
| from | Start date (YYYYMMDD or partial like "2020") |
| to | End date (YYYYMMDD or partial) |
| filter | Field filter: [!]field:regex (e.g., statuscode:200, !mimetype:image.*) |
| collapse | Dedupe: field or field:N (e.g., timestamp:8 = daily) |
| fl | Fields to return: comma-separated (urlkey, timestamp, original, mimetype, statuscode, digest, length) |
| fastLatest | true for efficient recent results |
| showResumeKey | true to get pagination token |
| resumeKey | Continue from previous query |
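
The parameters above can be assembled into a query URL programmatically. The following is a minimal sketch, not the code in scripts/list.ts: the `CdxQuery` interface and `buildCdxUrl` function are hypothetical names introduced for illustration, and the sketch assumes that several filters are sent as repeated `filter=` parameters.

```typescript
// Hypothetical helper for illustration; not taken from scripts/list.ts.
interface CdxQuery {
  url: string;
  matchType?: "exact" | "prefix" | "host" | "domain";
  limit?: number; // negative values request the last N results
  offset?: number;
  from?: string; // YYYYMMDD or a partial date such as "2020"
  to?: string;
  filter?: string[]; // assumption: one filter= parameter per entry
  collapse?: string; // e.g. "timestamp:8"
  fl?: string[];
  fastLatest?: boolean;
  showResumeKey?: boolean;
  resumeKey?: string;
}

const CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx";

function buildCdxUrl(query: CdxQuery): string {
  const params = new URLSearchParams();
  params.set("url", query.url);
  params.set("output", "json");
  if (query.matchType) params.set("matchType", query.matchType);
  if (query.limit !== undefined) params.set("limit", String(query.limit));
  if (query.offset !== undefined) params.set("offset", String(query.offset));
  if (query.from) params.set("from", query.from);
  if (query.to) params.set("to", query.to);
  for (const f of query.filter ?? []) params.append("filter", f);
  if (query.collapse) params.set("collapse", query.collapse);
  if (query.fl && query.fl.length > 0) params.set("fl", query.fl.join(","));
  if (query.fastLatest) params.set("fastLatest", "true");
  if (query.showResumeKey) params.set("showResumeKey", "true");
  if (query.resumeKey) params.set("resumeKey", query.resumeKey);
  // URLSearchParams percent-encodes reserved characters in each value.
  return `${CDX_ENDPOINT}?${params.toString()}`;
}
```

For example, `buildCdxUrl({ url: "example.com", limit: -5, fastLatest: true })` yields a query for the last five snapshots.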

How to List Snapshots

Use WebFetch to query the CDX API:

code
https://web.archive.org/cdx/search/cdx?url=https://example.com&output=json&limit=10

Response Format

The response is a JSON array of arrays. The first row holds the field names; every later row is one capture:

json
[
  ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
  ["com,example)/", "20240101120000", "https://example.com/", "text/html", "200", "ABC123", "1234"]
]
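
Because the first row carries the field names, each remaining row can be turned into a keyed record. The sketch below shows one way to do that; `parseCdxJson` and `CdxRecord` are hypothetical names, not identifiers from the plugin's scripts.

```typescript
// Hypothetical helper for illustration.
type CdxRecord = Record<string, string>;

function parseCdxJson(body: string): CdxRecord[] {
  const rows = JSON.parse(body) as string[][];
  if (rows.length === 0) return []; // no header row means no captures
  const [header, ...records] = rows;
  // Pair each header name with the value at the same column index.
  return records.map((fields) =>
    Object.fromEntries(header.map((name, i) => [name, fields[i] ?? ""]))
  );
}
```

Keying by header name rather than by column position keeps the code working when the `fl` parameter changes which fields are returned.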

Constructing Archive URLs

Combine a row's timestamp and original fields to build the archived URL:

code
https://web.archive.org/web/{timestamp}/{original_url}

For raw content (no Wayback toolbar):

code
https://web.archive.org/web/{timestamp}id_/{original_url}
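
Both URL forms can be produced by one small function. This is a sketch under the patterns shown above; the name `buildArchiveUrl` and its `raw` parameter are hypothetical.

```typescript
// Hypothetical helper for illustration.
function buildArchiveUrl(timestamp: string, originalUrl: string, raw = true): string {
  if (!/^\d{1,14}$/.test(timestamp)) {
    throw new Error(`Expected a numeric timestamp of up to 14 digits, got "${timestamp}"`);
  }
  // The id_ modifier requests the raw capture without the Wayback toolbar.
  const modifier = raw ? "id_" : "";
  return `https://web.archive.org/web/${timestamp}${modifier}/${originalUrl}`;
}
```

Here `raw` defaults to `true` to mirror the script's sample output, where URLs carry `id_` unless `--no-raw` is passed.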

Common Queries

code
# Only successful pages
&filter=statuscode:200

# Exclude images
&filter=!mimetype:image.*

# One snapshot per day (collapse on first 8 digits of timestamp)
&collapse=timestamp:8

# One snapshot per hour
&collapse=timestamp:10

# Date range (partial dates work)
&from=2023&to=2024

# All pages under a path (prefix match)
&url=example.com/blog/&matchType=prefix

# Entire domain including subdomains
&url=example.com&matchType=domain

# Get last 5 snapshots efficiently
&limit=-5&fastLatest=true

# Paginate large results
&showResumeKey=true&limit=1000
# Then continue with: &resumeKey={token_from_previous}
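
To make the collapse examples concrete, the sketch below reproduces locally the effect described for `collapse=timestamp:N`: of each run of adjacent rows whose timestamps share the same first N digits, only the first is kept. It illustrates the idea and is not the server's implementation; `collapseByTimestampPrefix` is a hypothetical name.

```typescript
// Hypothetical local emulation of collapse=timestamp:N, for illustration only.
function collapseByTimestampPrefix(timestamps: string[], digits: number): string[] {
  const kept: string[] = [];
  let previousPrefix: string | undefined;
  for (const ts of timestamps) {
    const prefix = ts.slice(0, digits);
    // Keep a row only when its prefix differs from the row just before it.
    if (prefix !== previousPrefix) kept.push(ts);
    previousPrefix = prefix;
  }
  return kept;
}
```

With 8 digits the prefix is YYYYMMDD, so at most one capture per day survives; with 10 digits the prefix also includes the hour.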

Checking for Screenshots

The CDX API doesn't include a screenshot field. To find captures with screenshots, cross-reference with:

code
https://web.archive.org/cdx/search/cdx?url=web.archive.org/screenshot/{URL}/*&output=json

The --with-screenshots flag in the script does this automatically, showing 📷 next to captures that have screenshots.

Caching

CDX API responses are cached for 1 hour in the OS temporary directory (os.tmpdir()). Cache keys are SHA-256 hashes of the URL and query parameters. Entries expire automatically, and an expired entry is deleted the next time it is accessed.
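
The caching scheme described above might look roughly like the sketch below. Only the 1-hour lifetime, the use of os.tmpdir(), and SHA-256 keys come from this document; the function names, the `wayback-cache` directory name, the `.json` extension, and the way the URL and parameters are joined before hashing are all assumptions.

```typescript
import { createHash } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour, as stated above

// Assumption: parameters are sorted so the same query always maps to the same key.
function cacheKey(url: string, params: Record<string, string>): string {
  const sorted = Object.keys(params)
    .sort()
    .map((name) => `${name}=${params[name]}`)
    .join("&");
  return createHash("sha256").update(`${url}?${sorted}`).digest("hex");
}

// Assumption: the directory name and file extension are hypothetical.
function cachePath(key: string): string {
  return join(tmpdir(), "wayback-cache", `${key}.json`);
}

// An entry is stale once more than one hour has passed since it was written.
function isExpired(writtenAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs - writtenAtMs > CACHE_TTL_MS;
}
```

Sorting the parameter names before hashing means two calls that differ only in parameter order share one cache entry.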

Use wayback-cache to manage cached data:

bash
npx tsx scripts/cache.ts clear    # Clear all cache
npx tsx scripts/cache.ts status   # Show cache status

See wayback-cache skill for complete cache management documentation.

Output Format (with --with-screenshots)

code
2024-01-15 12:34 (3 days ago) 📷
  https://web.archive.org/web/20240115123456id_/https://example.com/
  📷 https://web.archive.org/web/20240115123456im_/https://example.com/

2024-01-10 08:00 (8 days ago)
  https://web.archive.org/web/20240110080000id_/https://example.com/

Total: 2 snapshot(s)
Screenshots: 1 capture(s) have screenshots
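
A heading line in this format can be derived from a 14-digit CDX timestamp, which is expressed in UTC. The sketch below is one possible way to do it and is not taken from scripts/list.ts; `parseCdxTimestamp` and `formatSnapshotHeading` are hypothetical names.

```typescript
// Hypothetical helpers for illustration.
function parseCdxTimestamp(ts: string): Date {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) throw new Error(`Expected a 14-digit timestamp, got "${ts}"`);
  const [year, month, day, hour, minute, second] = m.slice(1).map(Number);
  // Date.UTC takes a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

function formatSnapshotHeading(ts: string, now: Date = new Date()): string {
  const captured = parseCdxTimestamp(ts);
  const days = Math.floor((now.getTime() - captured.getTime()) / 86_400_000);
  // "2024-01-15T12:34:56.000Z" becomes "2024-01-15 12:34"
  const stamp = captured.toISOString().slice(0, 16).replace("T", " ");
  return `${stamp} (${days} day${days === 1 ? "" : "s"} ago)`;
}
```

Passing `now` explicitly keeps the function deterministic, which makes the relative-age text easy to test.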