Researcher (Iterative Web Research)
Outcome
- •Turn a question into an evidence-backed answer by running a repeatable, multi-round web research loop.
- •Prefer up-to-date sources; surface trade-offs, uncertainties, and next research directions.
0) Online Tools
Confirm which online-capable tools are available in this session, and state explicitly how you plan to use them.
Non-negotiable
- •Do not rely on search result snippets/abstracts alone; fetch/open the original page/paper and extract the relevant parts. THIS IS CRITICAL.
If no browsing is available
- •Ask the user to enable browsing or to provide links/files; if they supply material instead, proceed with offline synthesis + reasoning over what they provide.
Organize tools by phase
- •Discover: search_query
- •Drill-down: open/click
- •Fetch & Extract: open/find/screenshot
- •Synthesize: summarize + compare
- •Iterate: refine queries based on gaps + user feedback
For query patterns and source-triage heuristics, see references/query-playbook.md.
Local downloads: use a /tmp work directory
If you need to download files for local analysis (PDFs, datasets, repos, etc.), create a dedicated work directory under /tmp first and download there.
- •Preferred: workdir="$(scripts/mk_workdir.sh)"
- •Then download into: $workdir
- •Keep the repo clean; treat the /tmp directory as disposable.
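The contents of scripts/mk_workdir.sh are not shown here, so the following is only a minimal Python sketch of the same idea: create a uniquely named, disposable directory under /tmp and download into it. The function name make_workdir and the "research-" prefix are illustrative assumptions, not part of the skill's scripts.

```python
import tempfile
from pathlib import Path


def make_workdir(base: str = "/tmp", prefix: str = "research-") -> Path:
    """Create a fresh, uniquely named work directory under `base`.

    Hypothetical helper: an assumed stand-in for scripts/mk_workdir.sh,
    whose real behavior is not documented in this file.
    """
    # mkdtemp creates the directory securely and guarantees a unique name,
    # so parallel research sessions do not overwrite each other's downloads.
    return Path(tempfile.mkdtemp(prefix=prefix, dir=base))
```

A caller would then save every download under the returned path, for example `make_workdir() / "paper.pdf"`, and delete the directory when the session ends.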
PDFs: use pdftotext if you cannot read them directly
If a key source is a PDF, prefer converting it to text locally so you can search/quote accurately.
- •If you can download the PDF: use scripts/pdf_to_text.sh (wrapper around pdftotext/pdftotxt).
- •If you cannot download: fall back to web.run.screenshot + manual extraction, but note the limitations.
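As a rough illustration of what a wrapper such as scripts/pdf_to_text.sh might do (its actual contents are not shown here), the sketch below calls the pdftotext command-line tool from Python. The function names are hypothetical; the only external behavior assumed is the standard `pdftotext [options] PDF-file [text-file]` invocation and its `-layout` option.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path: str, txt_path: str, keep_layout: bool = True) -> list:
    """Build the pdftotext argument list (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout preserves the physical page layout, which helps with tables.
        cmd.append("-layout")
    return cmd + [pdf_path, txt_path]


def pdf_to_text(pdf_path: str) -> Path:
    """Convert a downloaded PDF to a sibling .txt file and return its path."""
    if shutil.which("pdftotext") is None:
        # Assumption: the caller falls back to screenshots + manual extraction.
        raise RuntimeError("pdftotext not found on PATH")
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(build_pdftotext_cmd(pdf_path, str(txt_path)), check=True)
    return txt_path
```

Once the text file exists, ordinary search tools can locate the passage to quote, which is more reliable than reading from a screenshot.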
arXiv papers: prefer fetching HTML over PDF when possible
- •Many arXiv papers have HTML versions that are easier to read and search than PDFs. For example, the abstract page https://arxiv.org/abs/XXXX.XXXXX often links to an HTML version at https://arxiv.org/html/XXXX.XXXXX.
- •If you can open the HTML version, prefer that over downloading the PDF.
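The URL mapping described above can be sketched as a small helper. This is an illustrative assumption rather than an official arXiv API: the function name arxiv_html_url is hypothetical, it only handles new-style identifiers (such as 2301.12345, optionally with a version suffix), and it does not check whether the HTML version actually exists, so the result still has to be opened and verified.

```python
import re
from typing import Optional

# New-style arXiv IDs inside /abs/ or /pdf/ URLs, e.g. 2301.12345 or 2301.12345v2.
# Old-style IDs (e.g. hep-th/9901001) are deliberately not handled in this sketch.
_ARXIV_URL = re.compile(
    r"^https?://arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?/?$"
)


def arxiv_html_url(url: str) -> Optional[str]:
    """Return the candidate HTML URL for an arXiv abs/pdf URL, or None."""
    match = _ARXIV_URL.match(url.strip())
    if match is None:
        return None
    return f"https://arxiv.org/html/{match.group(1)}"
```

If opening the returned URL fails, fall back to the PDF route described in the previous section.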
If you encounter access restrictions (such as CAPTCHAs or paywalls)
- •Ask the user to access the source manually and provide the content or a screenshot.
1) Workflow (n-round research loop)
Step 1: Detect vagueness → request clarification
If the prompt is too vague to search effectively, ask 2–5 clarification questions before browsing, covering:
- •Goal: what decision/action will this inform?
- •Scope: which sub-area(s) matter and which don’t?
- •Time window: “latest” as of when? (date range)
- •Region/context constraints: geography, industry, stack, budget, risk tolerance
- •Output preference: quick overview vs deep dive; recommendations vs neutral map
If the user can’t answer, state explicit assumptions and proceed.
Step 2: Round 1 broad scan
Generate 6–12 query variants, mixing:
- •Chinese + English keywords (and common acronyms)
- •Synonyms and alternative names
- •Query modifiers such as “comparison / vs / benchmark / survey / tutorial / docs / RFC / issue / postmortem”
- •Community filters (as needed): site:reddit.com, site:news.ycombinator.com, site:stackoverflow.com, site:github.com
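One way to produce the mixed query variants listed above is to combine topic terms with modifiers and then add a few site-filtered variants. The sketch below is an assumption about how this could be automated; build_queries and its parameters are hypothetical names, and the 12-query cap simply mirrors the upper end of the 6–12 range.

```python
from itertools import product


def build_queries(terms, modifiers, site_filters=(), limit=12):
    """Combine topic terms with modifiers, then add site-filtered variants.

    terms:        synonyms, acronyms, and translated keywords for the topic
    modifiers:    words such as "comparison", "benchmark", "survey"
    site_filters: operators such as "site:github.com"
    """
    queries = [f"{term} {mod}" for term, mod in product(terms, modifiers)]
    # Site filters are applied to the first (primary) term only, to keep
    # the total number of queries small.
    queries += [f"{term} {site}" for term in terms[:1] for site in site_filters]
    # Deduplicate while preserving order, then cap at `limit`.
    return list(dict.fromkeys(queries))[:limit]
```

For instance, `build_queries(["retrieval augmented generation", "RAG"], ["survey", "benchmark"], ["site:github.com"])` yields five queries: four term-modifier combinations plus one site-filtered query for the primary term.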
Run web.run.search_query and quickly open the top results to extract:
- •Canonical definitions / terminology
- •Mainstream approaches and current “best practices”
- •Key trade-offs / controversies
- •High-signal sources to read next (official docs, top repos, surveys, FAQs)
Keep lightweight notes as: claim → source → date.
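The claim → source → date format can be kept consistent with a tiny record type. This is a minimal sketch; the Note class and its field names are illustrative assumptions, and any note-taking format that preserves the three fields would serve equally well.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Note:
    """One research note (hypothetical structure)."""
    claim: str       # the statement extracted from the source
    source: str      # URL of the page actually opened, not a search snippet
    published: date  # publication or last-updated date of the source

    def line(self) -> str:
        """Render the note in the claim → source → date format."""
        return f"{self.claim} → {self.source} → {self.published.isoformat()}"
```

Keeping the date as a real date value, rather than free text, makes it easy to sort notes by recency when a claim is time-sensitive.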
Step 3: Report after Round 1 (directions + plan)
Return a short landscape map:
- •3–7 plausible directions (each: what it is + why it matters)
- •What seems stable consensus vs what’s disputed
- •A proposed deep-dive plan (2–4 subtopics, sources to prioritize, questions to resolve)
- •2–3 targeted questions for the user to choose direction and constraints
Then explicitly ask the user to comment/choose: “Which direction should we deep dive first?”
Step 4: Round 2..N deep dive loop
After the user’s feedback, pick 1–3 focused subtopics and search deeply:
- •Prioritize primary sources when possible: official docs/specs, standards, research papers, repos/design docs.
- •Include community discussion for pitfalls and edge cases: issues/PRs, postmortems, forums.
- •Search “enough” before concluding: multiple independent sources, and at least one primary source when available.
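The "enough" criterion above can be approximated with a simple check. This sketch rests on two loud assumptions that are not in the original text: that distinct hostnames are a reasonable proxy for independent sources, and that two such sources are the minimum. The function name enough_evidence and its parameters are hypothetical.

```python
from urllib.parse import urlparse


def enough_evidence(sources, min_independent=2, primary_available=True):
    """Decide whether a subtopic has been searched 'enough'.

    sources: list of (url, is_primary) pairs for pages that were actually read.
    Assumption: distinct hostnames approximate independent sources.
    """
    hosts = {urlparse(url).netloc.lower() for url, _ in sources}
    has_primary = any(is_primary for _, is_primary in sources)
    # A primary source is required only when one is known to be available.
    return len(hosts) >= min_independent and (has_primary or not primary_available)
```

Hostname counting is a crude proxy: two sites that republish the same press release would still count as independent here, so the result should inform the judgment call rather than replace it.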
For each round, deliver:
- •Key findings with supporting links (and dates for time-sensitive claims)
- •Comparison table / pros-cons / decision criteria
- •Open questions + next search angles (what to look up next and why)
Then ask for feedback and repeat Step 4 as needed.
Quality Bar
- •Treat “latest” as time-sensitive: always include dates and call out what may have changed recently.
- •Separate facts, informed interpretation, and speculation.
- •If sources disagree, present both sides and explain plausible reasons (methodology, context, recency).