Web Reference Fetcher
Overview
This skill is designed to be called by other skills or workflows. It provides a three-step pipeline:
- •Fetch: Retrieve web content using jina.ai reader API
- •Analyze: Use subagents to extract specific topics and create detailed summaries
- •Save: Store the analyzed content as structured markdown files
When to Use This Skill
Use this skill when:
- •Another skill needs to fetch and analyze web documentation
- •You have URL(s) and need structured topic extraction
- •You need detailed summaries saved to specific file paths
- •Working with reference materials that require analysis and organization
Input Format
This skill expects:
- •URL(s): One or more web URLs to fetch
- •Experiment Context: Filename and entry ID to determine output directory
- •Output Path: Standardized path
workdir/<filename>_<entry_id>/references/ref_<N>.md
Directory naming convention:
- •Extract filename from JSONL path:
public_testfromdata/public_test.jsonl - •Combine with entry ID:
public_test_public_test_1 - •Create references subdirectory:
workdir/public_test_public_test_1/references/ - •Save each reference as:
ref_1.md,ref_2.md, etc.
Workflow
Step 1: Fetch Web Content
Use the provided script to fetch raw content from URL(s).
Execute:
python3 .claude/skills/web-reference-fetcher/scripts/fetch_url.py <url> --output workdir/<filename>_<entry_id>/references/ref_<N>.md
What it does:
- •Fetches content via
https://r.jina.ai/<url> - •Returns clean markdown content
- •Saves to standardized location
Script options:
- •
--output <path>: Save fetched content to file (required for standardized workflow) - •
--silent: Suppress progress messages
Standardized output location:
- •Reference files:
workdir/<filename>_<entry_id>/references/ref_<N>.md - •Example:
workdir/public_test_public_test_1/references/ref_1.md
Output: The script saves the fetched markdown content to the specified path.
Step 2: Analyze with Subagents
Pass the fetched content to a subagent along with extraction requirements.
Subagent invocation pattern:
Use the Task tool to launch a general-purpose subagent:
Prompt template:
以下のウェブコンテンツから、{トピック名}に関する詳細な情報を抽出してください。
コンテンツ:
---
{fetched_markdown_content}
---
抽出要件:
{extraction_requirements}
以下の形式でmarkdownとして出力してください:
# {トピック名}
## {セクション1}
[詳細な説明、テーブル、仕様など]
## {セクション2}
[詳細な説明、手順、コード例など]
...
Extraction requirements should specify:
- •What topics to extract
- •What level of detail is needed
- •What format to use (tables, lists, code blocks, etc.)
- •Any specific sections or keywords to focus on
Step 3: Save Structured Output
Save the subagent's output to the specified path.
File operations:
- •Create parent directories if they don't exist
- •Save with UTF-8 encoding
- •Verify file was written successfully
Complete Example Usage
Example 1: Fetch and Analyze a Single URL
# 1. Fetch content CONTENT=$(python3 .claude/skills/web-reference-fetcher/scripts/fetch_url.py \ "https://example.com/docs") # 2. Pass to subagent via Task tool with: # - Fetched content # - Topic: "API Authentication Methods" # - Extraction: "Extract all authentication methods, parameters, and examples" # 3. Save subagent output # Path: workdir/references/api_auth.md
Example 2: Multiple URLs for Different Topics
# Fetch URL 1 CONTENT1=$(python3 .claude/skills/web-reference-fetcher/scripts/fetch_url.py \ "https://example.com/spec") # Analyze with subagent for Topic 1 # Save to: workdir/references/specification.md # Fetch URL 2 CONTENT2=$(python3 .claude/skills/web-reference-fetcher/scripts/fetch_url.py \ "https://example.com/tutorial") # Analyze with subagent for Topic 2 # Save to: workdir/references/tutorial.md
Integration with Other Skills
This skill is designed to be called by other skills. Example integration:
# In another skill (e.g., test-case-handler skill):
## Step 3: Fetch Reference Documentation
For each reference URL in the test case:
1. Invoke web-reference-fetcher skill
2. Pass URL and topic extracted from test case instruction
3. Specify output path: `workdir/references/task_{N}/reference_{M}.md`
4. Use the saved markdown for further processing
Extraction Requirements Examples
For Technical Specifications
抽出要件: - すべての技術パラメータを表形式で抽出 - 測定手順をステップバイステップで記載 - 計算式と例を含める - 品質基準と許容範囲を明記
For API Documentation
抽出要件: - すべてのエンドポイントをリスト化 - リクエスト/レスポンス形式を示す - 認証方法を詳細に説明 - エラーコードと対処方法を含める
For Tutorials/Procedures
抽出要件: - 手順を番号付きリストで抽出 - 各手順の詳細な説明を含める - 必要なツールと前提条件を明記 - トラブルシューティング情報を追加
Error Handling
Fetch Errors
- •URL not accessible: Report error, suggest alternatives
- •Network issues: Check connectivity and jina.ai availability
- •Invalid URL format: Validate URL before attempting fetch
Subagent Errors
- •Content too large: Split into chunks and analyze separately
- •Unclear extraction requirements: Ask caller to specify details
- •Format issues: Validate subagent output before saving
File System Errors
- •Permission denied: Check write permissions
- •Path not found: Create parent directories automatically
- •Disk full: Report error with suggested cleanup
Resources
scripts/
- •fetch_url.py: Fetches web content via jina.ai
- •Input: URL string
- •Output: Clean markdown content to stdout
- •Options:
--outputto save raw content,--silentfor quiet mode
Advanced Usage
Batch Processing
For multiple URLs:
for url in "${urls[@]}"; do
# Fetch each URL
# Launch subagents in parallel if possible
# Save to respective paths
done
Custom Output Formatting
Provide detailed formatting instructions to subagents:
フォーマット要件: - 見出しレベル: H2を最上位とする - コードブロック: 言語を明示する - テーブル: Markdown形式、ヘッダー行を含む - リスト: 階層構造を保持する
Caching Fetched Content
To avoid re-fetching:
# Save raw content first python3 .claude/skills/web-reference-fetcher/scripts/fetch_url.py \ "https://example.com/docs" --output /tmp/cached_content.md # Use cached content for multiple analyses with different extraction requirements
Dependencies
- •Python 3.6+
- •
curlcommand-line tool - •Internet connectivity
- •Access to Task tool for subagent invocation
- •Write permissions to output directories
Quality Assurance
Before completing:
- • URL was fetched successfully
- • Subagent received correct content and extraction requirements
- • Output file is well-structured and comprehensive
- • File was saved to the correct path
- • Caller has been informed of the saved location
Communication Protocol
When called by other skills:
- •Receive: URL(s), extraction requirements, output path(s)
- •Execute: Fetch → Analyze → Save pipeline
- •Return: Success status and saved file path(s)
- •Report errors: Clear error messages with suggested fixes
Self-Contained Design
This skill is self-contained and can be invoked without knowledge of:
- •JSONL file structures
- •Test case formats
- •Specific project conventions
It only requires:
- •Valid URL(s)
- •Clear extraction requirements
- •Valid output path(s)