AgentSkillsCN

fetch-text

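Fetch all text from a URL or a base64-encoded PDF. Collection-aware (extracts the first item if given a Collection). Auto-detects the format (PDF/HTML/MD/TXT) and extracts the complete text content.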

SKILL.md
--- frontmatter
name: fetch-text
type: python
description: "Fetch all text from URL or base64 PDF. Collection-aware (extracts first item if given Collection). Auto-detects format (PDF/HTML/MD/TXT) and extracts complete text content"

fetch-text

Fetch complete text content from URLs or PDFs. Auto-detects format and extracts all text.

Input

  • target: URL string, base64-encoded PDF, Note ID, or Collection ID (for a Collection, the first item's content is used as the URL)
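The distinction between target kinds can be illustrated with a rough sketch. This is not the skill's actual implementation: the function name `classify_target` and the labels it returns are hypothetical, and anything that is neither an HTTP(S) URL nor a base64-encoded PDF is simply assumed to be a Note or Collection ID.

```python
import base64
import binascii
from urllib.parse import urlparse


def classify_target(target: str) -> str:
    """Guess the kind of a fetch-text target (hypothetical helper)."""
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(target, validate=True)
    except (binascii.Error, ValueError):
        return "id"  # assumption: treat everything else as a Note/Collection ID
    # Every PDF file begins with the magic bytes "%PDF-"
    return "pdf_base64" if decoded.startswith(b"%PDF-") else "id"
```

Decoding the whole string just to inspect its first bytes is wasteful for large PDFs; a real implementation would likely decode only a short prefix.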

Output

Success (status: "success"):

  • value: JSON string with:
    • text: Full extracted text
    • format: "pdf" | "html" | "markdown" | "text"
    • metadata: Source URL and format-specific metadata
    • page_count: Number of pages (PDF only)
    • char_count: Total character count

Failure (status: "failed"):

  • reason: Error description
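Because value is a JSON string rather than a nested object, a caller has to decode it before reading the fields. A minimal sketch, assuming the result arrives as a Python dict with the status, value, and reason keys described above (the helper name `unpack_fetch_result` is hypothetical):

```python
import json


def unpack_fetch_result(result: dict) -> dict:
    """Return the decoded payload of a fetch-text result, or raise on failure."""
    if result.get("status") != "success":
        # On failure the skill reports a human-readable reason
        raise RuntimeError(result.get("reason", "unknown error"))
    # value is a JSON string, so it needs one extra decoding step
    return json.loads(result["value"])


def summarize(payload: dict) -> str:
    """Build a one-line summary; page_count is present for PDFs only."""
    pages = payload.get("page_count")
    suffix = f", {pages} pages" if pages is not None else ""
    return f"{payload['format']}: {payload['char_count']} chars{suffix}"
```

Reading page_count with `.get` avoids a KeyError for HTML, Markdown, and plain-text sources, where that field is absent.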

Behavior

  • Auto-detects format from content
  • Extracts complete text without filtering
  • For Collections: uses the content field of the first Note as the URL
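The skill's detection logic is not documented here, but content-based detection can be sketched with a few heuristics. The function `detect_format` below is a hypothetical illustration: the PDF check relies on the standard `%PDF-` magic bytes, while the HTML and Markdown checks are rough assumptions that a real implementation would probably refine.

```python
import re


def detect_format(data: bytes) -> str:
    """Guess "pdf", "html", "markdown", or "text" from raw content (heuristic)."""
    # PDF files start with the magic bytes "%PDF-"
    if data.lstrip()[:5] == b"%PDF-":
        return "pdf"
    # Only inspect the first 2 KiB; that is enough for these markers
    head = data[:2048].decode("utf-8", errors="replace").lower()
    if "<!doctype html" in head or "<html" in head:
        return "html"
    # Assumed Markdown signals: an ATX heading or an inline link
    has_heading = re.search(r"^#{1,6} \S", head, flags=re.MULTILINE)
    has_link = re.search(r"\[[^\]]+\]\([^)]+\)", head)
    if has_heading or has_link:
        return "markdown"
    return "text"
```

Checking the content rather than the URL's file extension matters because URLs such as arXiv PDF links do not always end in a recognizable suffix.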

Planning Notes

  • Use fetch-text when you have a specific URL and want its complete content
  • Use search-web instead when searching for information (it returns filtered excerpts)
  • For structured search results, extract the URLs first with a project step, then pass them to fetch-text

Examples

json
{"type":"fetch-text","target":"https://arxiv.org/pdf/1706.03762.pdf","out":"$paper_text"}
{"type":"project","target":"$papers","fields":["metadata.uri"],"out":"$urls"}
{"type":"fetch-text","target":"$urls","out":"$paper_text"}