AgentSkillsCN

doc-squeeze

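Knowledge extraction API for AI agents. Fetch documentation as Markdown, extract structured JSON data, batch-process URLs, and search efficiently within documents. Powered by Jina Reader + Groq LLM, running entirely on their free tiers.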

SKILL.md
--- frontmatter
name: doc-squeeze
description: "Knowledge extraction API for AI agents. Fetch docs as markdown, extract structured JSON, batch-process URLs, and search within documents. Powered by Jina Reader + Groq LLM — 100% free tier."

Doc-Squeeze — ClawHub Skill

Overview

Doc-Squeeze lets AI agents read external documentation and extract structured knowledge without a browser. It fetches any URL as clean markdown, extracts structured JSON according to a schema you define, searches within documents, and processes multiple URLs in parallel.

Cost: $0 — Both Jina Reader and Groq have generous free tiers.

Tools

squeeze_url

| Property | Value |
| --- | --- |
| Endpoint | POST /api/squeeze |
| Auth | Optional (API key for higher limits) |
| Latency | ~2-5s (fetch) + ~1-3s (filter) |

Input:

```json
{
  "url": "https://docs.stripe.com/api/authentication",
  "focus": "Python code for setting the API key"
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL of the documentation page. |
| focus | string | No | Topic filter; when set, triggers LLM filtering. |

extract_structured

| Property | Value |
| --- | --- |
| Endpoint | POST /api/extract |
| Auth | Optional |
| Latency | ~3-8s |

Input:

```json
{
  "url": "https://docs.stripe.com/api",
  "schema_definition": {
    "endpoints": [{"method": "str", "path": "str", "description": "str"}],
    "auth_methods": ["str"]
  },
  "instructions": "Focus on the payments API only"
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to extract from. |
| schema_definition | object | Yes | JSON schema defining what to extract (field names mapped to type hints such as "str"). |
| instructions | string | No | Additional extraction guidance. |
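
Because the extracted data comes from an LLM, an agent may want to check that a response actually has the shape it asked for before using it. The following is a minimal sketch of such a check, not part of the Doc-Squeeze API. The helper name `matches_template` is hypothetical, only the `"str"` hint appears in the examples above (the other hints in `TYPE_HINTS` are assumptions), and reading a one-element list as "a list of items shaped like this" is likewise an assumption.

```python
from typing import Any

# Assumed mapping from type-hint strings to Python types.
# Only "str" appears in the Doc-Squeeze examples; the rest are guesses.
TYPE_HINTS = {"str": str, "int": int, "float": (int, float), "bool": bool}


def matches_template(data: Any, template: Any) -> bool:
    """Return True if `data` has the shape described by `template`."""
    if isinstance(template, dict):
        # Every key in the template must be present with a matching value.
        return isinstance(data, dict) and all(
            key in data and matches_template(data[key], sub)
            for key, sub in template.items()
        )
    if isinstance(template, list):
        # A one-element list is read as "a list of items shaped like this".
        if not isinstance(data, list):
            return False
        if not template:
            return True
        return all(matches_template(item, template[0]) for item in data)
    if isinstance(template, str):
        expected = TYPE_HINTS.get(template)
        # Unknown hints are accepted rather than rejected.
        return expected is None or isinstance(data, expected)
    return False
```

An agent could call `matches_template(response_data, schema_definition)` on the `data` field of the response and retry or fall back to a raw squeeze when it returns `False`.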

search_docs

| Property | Value |
| --- | --- |
| Endpoint | POST /api/search |
| Auth | Optional |
| Latency | ~3-8s |

Input:

```json
{
  "url": "https://docs.python.org/3/library/asyncio.html",
  "query": "How do I run multiple coroutines concurrently?",
  "max_results": 3
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to search within. |
| query | string | Yes | What to search for. |
| max_results | integer | No | Number of results (1-10, default 3). |
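
Since `max_results` is limited to the range 1-10, a client can catch bad values before spending a rate-limited request. The sketch below builds the request body locally. The function name `build_search_payload` is hypothetical, and rejecting an empty query is an assumption rather than documented server behavior.

```python
from typing import Any, Dict


def build_search_payload(url: str, query: str, max_results: int = 3) -> Dict[str, Any]:
    """Build a /api/search request body, validating max_results locally."""
    # Assumption: an empty query is never useful, so reject it client-side.
    if not query.strip():
        raise ValueError("query must not be empty")
    # Documented range for max_results is 1-10 (default 3).
    if not 1 <= max_results <= 10:
        raise ValueError("max_results must be between 1 and 10")
    return {"url": url, "query": query, "max_results": max_results}
```

The returned dictionary can be passed directly as the `json=` argument of a `requests.post` call to `/api/search`.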

batch_squeeze

| Property | Value |
| --- | --- |
| Endpoint | POST /api/batch |
| Auth | Optional |
| Latency | ~3-15s (parallel) |

Input:

```json
{
  "urls": [
    "https://docs.stripe.com/api/authentication",
    "https://docs.stripe.com/api/errors"
  ],
  "focus": "error handling"
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array | Yes | URLs to fetch (max 10). |
| focus | string | No | Optional topic filter for all URLs. |
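
With a cap of 10 URLs per request, longer URL lists have to be split across several batch calls. One way to do that is sketched below. The helper name `chunk_urls` is hypothetical, and dropping duplicate URLs before chunking is a design choice of this sketch, not something the API requires.

```python
from typing import Iterator, List

MAX_BATCH_URLS = 10  # documented limit for /api/batch


def chunk_urls(urls: List[str], size: int = MAX_BATCH_URLS) -> Iterator[List[str]]:
    """Yield de-duplicated URLs in their original order, at most `size` per chunk."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys keeps the first occurrence of each URL and preserves order.
    unique = list(dict.fromkeys(urls))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```

Each yielded chunk can be sent as the `urls` field of one `/api/batch` request.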

Self-Discovery

Agents can introspect the full tool schema at runtime:

```
GET /api/skill                    → openclaw.json manifest
GET /.well-known/mcp.json         → MCP server discovery
GET /.well-known/ai-plugin.json   → OpenAI plugin manifest
```

Authentication

| Tier | Rate Limit | How to Get |
| --- | --- | --- |
| Free | 5/minute | No key needed |
| Dev | 60/minute | POST /api/keys/create |
| Pro | 300/minute | Contact us |

Pass the API key in the X-API-Key request header.
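
To stay under a per-minute limit, a client can throttle itself before sending requests. The sketch below pairs a header builder with a simple sliding-window limiter. The names `auth_headers` and `MinuteRateLimiter` are hypothetical, and the assumption that the server counts requests over a rolling 60-second window is not confirmed by the documentation.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers; the key is omitted on the free tier."""
    return {"X-API-Key": api_key} if api_key else {}


class MinuteRateLimiter:
    """Sliding-window limiter: at most `limit` calls per 60 seconds."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return 60.0 - (now - self.calls[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.calls.append(self.clock())
```

A caller would create `MinuteRateLimiter(5)` for the free tier (or 60 for Dev), sleep for `wait_time()` seconds before each request, and call `record()` after sending it.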

Permissions

| Permission | Host | Required | Reason |
| --- | --- | --- | --- |
| Network Access | r.jina.ai | Yes | Fetches docs as markdown. |
| Network Access | api.groq.com | No | LLM filtering (only with focus). |

Environment

| Variable | Required | How to get it |
| --- | --- | --- |
| GROQ_API_KEY | No* | Free at console.groq.com/keys |

*Without the key, only raw (unfocused) squeeze and batch requests work. Extraction and search require Groq.
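
For a self-hosted deployment, the rule above could be turned into a small capability check. This is only a sketch: the function name `available_tools` is hypothetical, and it assumes the server decides which tools are usable purely from whether `GROQ_API_KEY` is set.

```python
import os
from typing import Mapping, Set


def available_tools(env: Mapping[str, str] = os.environ) -> Set[str]:
    """Return the tool names usable with the given environment."""
    # Raw squeeze and batch (no focus) only need Jina Reader.
    tools = {"squeeze_url", "batch_squeeze"}
    # Extraction and search go through the Groq LLM.
    if env.get("GROQ_API_KEY"):
        tools |= {"extract_structured", "search_docs"}
    return tools
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.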

Agent Usage Example

```python
import requests

BASE = "https://doc-squeeze.onrender.com"

# 1. Raw fetch: no focus, so the page comes back as unfiltered markdown
resp = requests.post(f"{BASE}/api/squeeze", json={
    "url": "https://docs.python.org/3/library/json.html"
})
resp.raise_for_status()  # stop early on HTTP error responses
docs = resp.json()["markdown"]

# 2. Structured extraction: the schema describes the fields to pull out
resp = requests.post(f"{BASE}/api/extract", json={
    "url": "https://docs.stripe.com/api",
    "schema_definition": {
        "endpoints": [{"method": "str", "path": "str"}],
        "auth_type": "str"
    }
})
resp.raise_for_status()
data = resp.json()["data"]

# 3. Deep search: max_results is omitted, so the default of 3 applies
resp = requests.post(f"{BASE}/api/search", json={
    "url": "https://docs.python.org/3/library/asyncio.html",
    "query": "How to cancel a task?"
})
resp.raise_for_status()
answers = resp.json()["results"]

# 4. Batch fetch: the same focus is applied to every URL
resp = requests.post(f"{BASE}/api/batch", json={
    "urls": ["https://example.com", "https://httpbin.org/html"],
    "focus": "main content"
})
resp.raise_for_status()
results = resp.json()["results"]
```