Web Search Researcher

Activation

When this skill is triggered, ALWAYS display this banner first:

code

╭─────────────────────────────────────────────────────────────╮
│  🌐 SKILL ACTIVATED: web-search-researcher                  │
├─────────────────────────────────────────────────────────────┤
│  Topic: [research question/topic]                           │
│  Action: Searching web for authoritative sources...         │
│  Output: Synthesized findings with source links             │
╰─────────────────────────────────────────────────────────────╯

When to Use

This skill activates when a request contains a phrase such as:

•"search for information about"
•"find documentation on"
•"what's the best practice for"
•"look up how to"

It also activates when the task needs:

•Current information that is not in the training data
•Official documentation or tutorials

Method 1: Exa.ai API (Primary - Recommended)

Exa provides semantic/neural search with content retrieval. Use this as the primary method.

Basic Search (get URLs and titles)

bash

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "your search query here",
    "numResults": 5,
    "type": "auto"
  }' | jq '.results[] | {title, url}'
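The inline JSON above breaks if the query itself contains a double quote or a backslash. A minimal sketch of building the same request body with `jq -n`, which escapes the values for you. The helper name `exa_payload` is hypothetical, not part of any Exa tooling; the fields simply mirror the example above.

```bash
# Build the request body for a basic Exa search.
# exa_payload is a hypothetical helper name; the fields mirror the example above.
exa_payload() {
  local query="$1" num_results="${2:-5}"
  # --arg passes the query as a JSON string; --argjson passes the count as a number.
  jq -cn --arg query "$query" --argjson numResults "$num_results" \
    '{query: $query, numResults: $numResults, type: "auto"}'
}

# Usage (needs network access and EXA_API_KEY):
#   curl -s "https://api.exa.ai/search" \
#     -H "x-api-key: ${EXA_API_KEY}" \
#     -H "Content-Type: application/json" \
#     -d "$(exa_payload 'what is "structured concurrency"?')" | jq '.results[] | {title, url}'
```

Because the body comes from a function, the same helper can be reused for every search instead of hand-editing JSON each time.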

Search with Content (get text from pages)

bash

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "your search query here",
    "numResults": 5,
    "type": "auto",
    "contents": {
      "text": {
        "maxCharacters": 1000
      }
    }
  }' | jq '.results[] | {title, url, text}'

Search with Highlights (best for extracting key info)

bash

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "your search query here",
    "numResults": 5,
    "type": "auto",
    "contents": {
      "highlights": {
        "numSentences": 3,
        "query": "specific aspect to highlight"
      }
    }
  }' | jq '.results[] | {title, url, highlights}'

Filter by Domain or Date

bash

curl -s "https://api.exa.ai/search" \
  -H "x-api-key: ${EXA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "kubernetes security best practices",
    "numResults": 5,
    "type": "auto",
    "includeDomains": ["kubernetes.io", "github.com"],
    "startPublishedDate": "2024-01-01T00:00:00.000Z",
    "contents": {
      "text": {"maxCharacters": 800}
    }
  }' | jq '.results[] | {title, url, publishedDate, text}'
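A hard-coded `startPublishedDate` goes stale. One possible way to compute a rolling cutoff in the same timestamp format is sketched below. The helper name `days_ago_iso` is hypothetical, and the sketch assumes either GNU `date` (`-d`) or BSD/macOS `date` (`-v`) is installed.

```bash
# Print a UTC timestamp for midnight N days ago, in the format used above.
# days_ago_iso is a hypothetical helper; it tries GNU date first, then BSD/macOS date.
days_ago_iso() {
  local days="$1"
  date -u -d "${days} days ago" +%Y-%m-%dT00:00:00.000Z 2>/dev/null ||
    date -u -v-"${days}"d +%Y-%m-%dT00:00:00.000Z
}

# Usage: place the value in the request body, for example
#   "startPublishedDate": "'"$(days_ago_iso 90)"'"
```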

Exa API Parameters Reference

Parameter	Type	Description
`query`	string	Search query (required)
`numResults`	int	Number of results (default: 10, max: 100)
`type`	string	`"auto"`, `"neural"`, or `"keyword"`
`includeDomains`	array	Limit to specific domains
`excludeDomains`	array	Exclude specific domains
`startPublishedDate`	string	ISO 8601 timestamp; only return results published after it
`endPublishedDate`	string	ISO 8601 timestamp; only return results published before it
`contents.text.maxCharacters`	int	Max chars of text to return
`contents.highlights.numSentences`	int	Number of highlight sentences
`contents.highlights.query`	string	Query for highlights

Method 2: Curl Fallback (When Exa fails or for direct fetching)

Use these methods if the Exa API is unavailable or when you need to fetch specific URLs directly.

Fetch a webpage directly

bash

# Basic fetch
curl -sL "https://docs.python.org/3/library/asyncio.html" | head -500

# Follow redirects and get clean text (strip HTML)
curl -sL "https://example.com" | sed 's/<[^>]*>//g' | tr -s ' \n' | head -200

# With user agent (some sites require it)
curl -sL -A "Mozilla/5.0" "https://example.com"

Search via DuckDuckGo (no API key needed)

bash

# Extract result links from the HTML results page
# (-E is portable; -P would require GNU grep)
curl -sL "https://html.duckduckgo.com/html/?q=python+asyncio+best+practices" | \
  grep -oE 'href="https?://[^"]+' | \
  grep -v duckduckgo | \
  head -10
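The query in the URL above is encoded by hand, which only works for simple words. A small sketch of encoding an arbitrary query with jq's `@uri` filter follows. The helper name `ddg_url` is hypothetical.

```bash
# Build a DuckDuckGo HTML search URL from an arbitrary query string.
# ddg_url is a hypothetical helper; jq's @uri filter does the percent-encoding.
ddg_url() {
  local encoded
  encoded=$(jq -rn --arg q "$1" '$q | @uri')
  printf 'https://html.duckduckgo.com/html/?q=%s\n' "$encoded"
}

# Usage (needs network access):
#   curl -sL "$(ddg_url 'c++ & rust interop')" | grep -oE 'href="https?://[^"]+' | head -10
```

curl can also do the encoding itself: `curl -sL -G "https://html.duckduckgo.com/html/" --data-urlencode "q=c++ & rust interop"`.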

Fetch GitHub content

bash

# Raw file from GitHub
curl -sL "https://raw.githubusercontent.com/owner/repo/main/README.md"

# GitHub API (for repo info, issues, etc.)
curl -sL "https://api.github.com/repos/astral-sh/uv" | head -50

Fetch PyPI package info

bash

curl -sL "https://pypi.org/pypi/requests/json" | jq '.info.version, .info.summary'

Fetch npm package info

bash

curl -sL "https://registry.npmjs.org/typescript" | jq '.["dist-tags"].latest, .description'

Search Strategies

For API/Library Documentation:

•Use Exa with domain filter: "includeDomains": ["docs.python.org", "developer.mozilla.org"]
•Fallback: Fetch official docs directly: curl -sL "https://docs.python.org/3/..."
•Check GitHub READMEs: curl -sL "https://raw.githubusercontent.com/..."

For Best Practices:

•Use Exa neural search for semantic matching
•Search for style guides and include domain filters for authoritative sources
•Check awesome-* lists on GitHub

For Technical Solutions:

•Use Exa with content retrieval to get actual answers
•Filter to Stack Overflow: "includeDomains": ["stackoverflow.com"]
•Check GitHub issues via API

For Comparisons:

•Search "X vs Y" with Exa and get highlights
•Fetch benchmark repositories on GitHub
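The domain filters named in these strategies can be kept in one place. The sketch below is only an illustration: the helper name `domains_for` and the goal labels `docs` and `solutions` are hypothetical, and goals with no filter listed above return a non-zero status so the caller can omit `includeDomains` entirely.

```bash
# Map a research goal to the includeDomains value suggested in the strategies above.
# domains_for and its goal labels are hypothetical names chosen for this sketch.
domains_for() {
  case "$1" in
    docs)      printf '%s\n' '["docs.python.org", "developer.mozilla.org"]' ;;
    solutions) printf '%s\n' '["stackoverflow.com"]' ;;
    *)         return 1 ;;   # no filter listed for this goal: search all domains
  esac
}

# Usage: add the filter only when one is defined for the goal.
#   if domains=$(domains_for solutions); then
#     jq -cn --arg q "my query" --argjson d "$domains" \
#       '{query: $q, numResults: 5, type: "auto", includeDomains: $d}'
#   fi
```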

Output Format

Structure your findings as:

code

## Summary
[Brief overview of key findings]

## Detailed Findings

### [Topic/Source 1]
**Source**: [URL]
**Key Information**:
- Direct quote or finding
- Another relevant point

### [Topic/Source 2]
[Continue pattern...]

## Additional Resources
- [URL 1] - Brief description
- [URL 2] - Brief description

## Gaps or Limitations
[Note any information that couldn't be found]
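As a starting point for the Detailed Findings section, the raw search response can be turned into entries of this shape with a jq filter. This is a sketch, not a finished report generator: the helper name `format_findings` is hypothetical, and it assumes each result carries `title`, `url`, and optionally `text`, as in the content search shown earlier.

```bash
# Render Exa results as "Detailed Findings" entries in the format above.
# format_findings is a hypothetical helper; it reads a search response on stdin.
format_findings() {
  jq -r '.results[] |
    "### \(.title)\n**Source**: \(.url)\n**Key Information**:\n- \(.text // "(no text returned)")\n"'
}

# Usage (needs network access and EXA_API_KEY):
#   curl -s "https://api.exa.ai/search" ... | format_findings
```

The output still needs editing by hand: the raw text should be trimmed to the relevant quote, and the Summary and Gaps sections written around it.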

Quality Guidelines

•Accuracy: Always quote sources accurately and provide direct links
•Relevance: Focus on information that directly addresses the query
•Currency: Note publication dates from Exa results when available
•Authority: Prioritize official sources (docs, GitHub, official blogs)
•Transparency: Clearly indicate when information might be outdated

Useful URLs for Direct Research

Topic	URL Pattern
Python docs	`https://docs.python.org/3/library/{module}.html`
PyPI	`https://pypi.org/pypi/{package}/json`
npm	`https://registry.npmjs.org/{package}`
GitHub API	`https://api.github.com/repos/{owner}/{repo}`
MDN Web Docs	`https://developer.mozilla.org/en-US/docs/Web/{topic}`
Can I Use	`https://caniuse.com/?search={feature}`
Rust docs	`https://docs.rs/{crate}/latest/`
Go docs	`https://pkg.go.dev/{module}`

⚠️ Budget Limits (IMPORTANT)

Daily budget: $1.00 maximum

Cost Reference (approximate)

Operation	Cost
Basic search (5 results, no content)	~$0.005
Search with text content	~$0.007
Search with highlights	~$0.008

Budget Guidelines

•At the estimated costs above, $1.00 covers roughly 125-140 Exa searches per day with content (about 200 without content)
•Prefer fewer, targeted searches over many broad ones
•Use the curl fallback (free) for simple lookups, e.g., fetching a known URL
•Try a direct URL fetch before using an Exa search
•Batch related questions into single searches when possible
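One way to keep within the daily limit is to record an estimated cost for each search and refuse once the budget is used up. The sketch below works in tenths of a cent so that shell integer arithmetic stays exact. The ledger path, the `EXA_LEDGER` variable, and the function names are all hypothetical, and the costs are the approximate figures from the table above, not billed amounts.

```bash
# Track estimated Exa spend in tenths of a cent ("mills") using integer math.
# The ledger path and function names are hypothetical; costs are estimates only.
BUDGET_MILLS=1000   # $1.00 daily budget
LEDGER="${EXA_LEDGER:-${TMPDIR:-/tmp}/exa-spend-$(date -u +%Y-%m-%d)}"

spent_mills() {
  # Sum the recorded costs; a missing ledger means nothing has been spent today.
  [ -f "$LEDGER" ] || { echo 0; return; }
  awk '{ total += $1 } END { print total + 0 }' "$LEDGER"
}

record_search() {
  # $1 = estimated cost in mills: 5 basic, 7 with text, 8 with highlights.
  local cost="$1" spent
  spent=$(spent_mills)
  if [ $((spent + cost)) -gt "$BUDGET_MILLS" ]; then
    echo "daily budget exceeded: use the curl fallback" >&2
    return 1
  fi
  echo "$cost" >> "$LEDGER"
}

# Usage: record_search 7 && curl -s "https://api.exa.ai/search" ...
```

In these units, 1000 / 7 = 142 searches with text and 1000 / 8 = 125 searches with highlights fit in one day.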

When to Use Exa vs Curl

Scenario	Use
Need semantic/intelligent search	Exa
Know the exact URL already	Curl (free)
Fetching GitHub/PyPI/npm info	Curl (free)
Simple keyword search	DuckDuckGo via curl (free)
Need page content from unknown sources	Exa with contents
