
Paper Discovery

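Find and curate research papers from academic sources. Use when the user wants to find papers, run research searches, discover articles, or explore a new topic.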

---
name: Paper Discovery
description: Find and curate research papers from academic sources. Use when user asks to find papers, search for research, discover articles, or explore a new topic.
tools:
  - list_available_sources
  - create_research_question
  - run_discovery_for_question
  - list_articles
  - search_articles
  - collection_stats
---

Paper Discovery

Find and curate research papers across academic sources (arXiv, PubMed, Semantic Scholar, OpenAlex, etc.).

Tools to Use

For discovery tasks, use ONLY these tools:

| Tool | Purpose |
|---|---|
| list_available_sources | See available search sources |
| create_research_question | Create a new search query |
| run_discovery_for_question | Execute the search |
| list_articles | Browse results |
| search_articles | Filter/search within results |
| collection_stats | Check collection size |

Quick Discovery (5 min)

For a quick search on a topic:

```
# Step 1: Create the query
create_research_question(
  title="User's topic in 1-2 sentences",
  keywords=["keyword1", "keyword2", "keyword3"],
  sources=["semantic_scholar", "openalex"],
  max_papers=25,
  relevance_threshold=0.7
)

# Step 2: Run discovery
run_discovery_for_question(question_id="[from step 1]")

# Step 3: Review results
list_articles(limit=20, sort_by="relevance")
```
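The Quick Discovery calls take a small set of parameters with sensible defaults. Below is a minimal Python sketch of how those arguments could be assembled. `build_quick_query` is a hypothetical helper and is not one of the listed tools. Its validation rules (at least one keyword, threshold between 0 and 1) are assumptions. It only builds the keyword arguments for `create_research_question` and does not call the tool.

```python
from typing import Any, Optional

# Defaults taken from the Quick Discovery example above.
DEFAULT_SOURCES = ["semantic_scholar", "openalex"]
DEFAULT_MAX_PAPERS = 25
DEFAULT_THRESHOLD = 0.7


def build_quick_query(
    title: str,
    keywords: list[str],
    sources: Optional[list[str]] = None,
    max_papers: int = DEFAULT_MAX_PAPERS,
    relevance_threshold: float = DEFAULT_THRESHOLD,
) -> dict[str, Any]:
    """Hypothetical helper: assemble arguments for create_research_question."""
    if not keywords:
        raise ValueError("at least one keyword is required")  # assumed rule
    if not 0.0 <= relevance_threshold <= 1.0:
        raise ValueError("relevance_threshold must be between 0 and 1")  # assumed rule
    return {
        "title": title,
        "keywords": list(keywords),
        # Copy so callers never mutate the shared default list.
        "sources": list(sources) if sources else list(DEFAULT_SOURCES),
        "max_papers": max_papers,
        "relevance_threshold": relevance_threshold,
    }


params = build_quick_query(
    "Efficient attention mechanisms in vision transformers",
    ["vision transformer", "efficient attention"],
)
print(params["sources"], params["max_papers"], params["relevance_threshold"])
```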

Source Selection Guide

| Research Area | Recommended Sources |
|---|---|
| CS/ML/AI | arxiv, semantic_scholar |
| Medical/Bio | pubmed, biorxiv |
| General Science | openalex, crossref |
| Cross-disciplinary | semantic_scholar, openalex |

Default: Use semantic_scholar + openalex for broad coverage.
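The guide can be encoded as a lookup table that falls back to the documented default. This is a minimal sketch. It assumes the source IDs match the ones used in this skill's examples, which you can confirm with list_available_sources. `pick_sources` and the lowercase area labels are hypothetical.

```python
# Mapping copied from the Source Selection Guide table above.
SOURCES_BY_AREA = {
    "cs/ml/ai": ["arxiv", "semantic_scholar"],
    "medical/bio": ["pubmed", "biorxiv"],
    "general science": ["openalex", "crossref"],
    "cross-disciplinary": ["semantic_scholar", "openalex"],
}

# Fallback for unknown areas: the broad-coverage default.
DEFAULT_SOURCES = ["semantic_scholar", "openalex"]


def pick_sources(area: str) -> list[str]:
    """Hypothetical helper: recommended source IDs for an area, else the default pair."""
    key = area.strip().lower()
    return list(SOURCES_BY_AREA.get(key, DEFAULT_SOURCES))


print(pick_sources("Medical/Bio"))  # ['pubmed', 'biorxiv']
print(pick_sources("Economics"))    # ['semantic_scholar', 'openalex']
```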

Need a Source Not Listed?

If the user wants papers from a website/journal not in the built-in sources list:

Load the custom-source-setup skill to set up auto-detected scrapers for any website. This allows adding sources like:

  • Specific journal websites (Nature, Science, PLOS ONE, etc.)
  • Conference proceedings pages (ACL Anthology, NeurIPS, etc.)
  • Institutional repositories (NBER, SSRN, arXiv mirrors, etc.)
  • Any website with article listings

Example trigger phrases:

  • "Can you get papers from NBER?"
  • "Add Nature Neuroscience as a source"
  • "Scrape articles from this URL: https://..."

Keyword Extraction

Extract keywords from the user's request:

  1. Core nouns: Main concepts (e.g., "transformers", "attention")
  2. Technical terms: Field-specific language (e.g., "multi-head", "self-attention")
  3. Modifiers: Scope limiters (e.g., "efficient", "sparse", "2024")

Example:

  • User: "Find papers on efficient attention mechanisms in vision transformers"
  • Keywords: ["vision transformer", "efficient attention", "ViT", "sparse attention"]
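Once the terms are picked out, the three categories can be merged into the single `keywords` list. This is a minimal sketch that assumes the agent has already sorted the terms into categories. `merge_keywords` is a hypothetical helper. It keeps the order core, technical, modifiers and drops duplicates that differ only in case. The cap of 8 keywords is an assumption.

```python
def merge_keywords(
    core: list[str],
    technical: list[str],
    modifiers: list[str],
    limit: int = 8,  # assumed cap, not a documented tool limit
) -> list[str]:
    """Hypothetical helper: merge categories in priority order, case-insensitive dedupe."""
    seen: set[str] = set()
    merged: list[str] = []
    for term in [*core, *technical, *modifiers]:
        cleaned = " ".join(term.split())  # collapse stray whitespace
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged[:limit]


keywords = merge_keywords(
    core=["vision transformer", "efficient attention"],
    technical=["ViT", "sparse attention", "Vision Transformer"],
    modifiers=[],
)
print(keywords)  # ['vision transformer', 'efficient attention', 'ViT', 'sparse attention']
```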

Relevance Threshold Guide

| Threshold | Use When |
|---|---|
| 0.8+ | User wants only highly relevant papers |
| 0.7 | Default (good balance) |
| 0.6 | Comprehensive search, broader coverage |
| 0.5 | Exploratory, casting a wide net |
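To make the table concrete, the sketch below maps each row to a named preset and shows how a threshold prunes candidates by score. The preset names, the `(title, score)` pairs and their scores are hypothetical. The sketch also assumes scores fall between 0 and 1. The discovery tool applies its own relevance scoring.

```python
# Presets mirroring the Relevance Threshold Guide table (names are hypothetical).
THRESHOLDS = {
    "strict": 0.8,         # only highly relevant papers
    "default": 0.7,        # good balance
    "comprehensive": 0.6,  # broader coverage
    "exploratory": 0.5,    # wide net
}


def filter_by_relevance(scored: list[tuple[str, float]], preset: str = "default") -> list[str]:
    """Keep titles whose score meets the preset's threshold, highest score first."""
    cutoff = THRESHOLDS[preset]
    kept = [(title, score) for title, score in scored if score >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in kept]


# Hypothetical candidates with illustrative scores.
candidates = [("Paper A", 0.91), ("Paper B", 0.72), ("Paper C", 0.64), ("Paper D", 0.55)]
print(filter_by_relevance(candidates, "strict"))  # ['Paper A']
print(filter_by_relevance(candidates))            # ['Paper A', 'Paper B']
```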

When to Delegate to Research Analyst

Delegate to the Research Analyst with send_message_to_agent when the user needs:

  • Deep analysis of discovered papers
  • Quality assessment of results
  • Literature synthesis across papers
  • Citation network exploration

Example delegation:

```
send_message_to_agent(
  agent_name="Research Analyst",
  message="Analyze these 10 papers on sparse attention and summarize key approaches: [paper IDs]"
)
```

Workflow Examples

Example 1: Specific Topic Search

User: "Find recent papers on mixture of experts in LLMs"

```
1. create_research_question(
     title="Mixture of Experts in Large Language Models",
     keywords=["mixture of experts", "MoE", "sparse MoE", "LLM"],
     sources=["arxiv", "semantic_scholar"],
     max_papers=30,
     relevance_threshold=0.75
   )

2. run_discovery_for_question(question_id="...")

3. list_articles(limit=15, sort_by="date")

4. Report: "Found X papers on MoE in LLMs. Top 5: [list].
   Would you like me to analyze any of these in depth?"
```

Example 2: Broad Exploration

User: "I want to explore what's happening in protein folding research"

```
1. list_available_sources()  # Show user options

2. create_research_question(
     title="Recent advances in protein structure prediction",
     keywords=["protein folding", "AlphaFold", "protein structure prediction"],
     sources=["biorxiv", "pubmed", "semantic_scholar"],
     max_papers=50,
     relevance_threshold=0.65
   )

3. run_discovery_for_question(question_id="...")

4. collection_stats()  # Show what was found

5. Report summary of results by sub-topic
```
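A simple way to handle step 5 is to file each article under the first search keyword that appears in its title. This is a minimal sketch. It assumes each article from list_articles can be reduced to a dict with a hypothetical `title` key. The title-matching rule is also an assumption. Real sub-topic grouping may need abstracts or the Research Analyst.

```python
def group_by_subtopic(articles: list[dict], keywords: list[str]) -> dict[str, list[str]]:
    """File each title under the first keyword found in it (case-insensitive), else 'other'."""
    groups: dict[str, list[str]] = {}
    for article in articles:
        title = article.get("title", "")
        lowered = title.lower()
        match = next((kw for kw in keywords if kw.lower() in lowered), "other")
        groups.setdefault(match, []).append(title)
    return groups


# Hypothetical article records for illustration only.
articles = [
    {"title": "Example: AlphaFold applied to protein complexes"},
    {"title": "Example: benchmarking protein structure prediction methods"},
    {"title": "Example: molecular dynamics of membrane proteins"},
]
summary = group_by_subtopic(
    articles, ["AlphaFold", "protein structure prediction", "protein folding"]
)
for topic, titles in summary.items():
    print(f"{topic}: {len(titles)} paper(s)")
```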

Error Handling

| Error | Solution |
|---|---|
| No results | Lower threshold, broaden keywords, add sources |
| Too many results | Raise threshold, add specific keywords |
| Wrong domain papers | Add negative keywords, change sources |
| Timeout | Reduce sources, lower max_papers |
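The parameter-level remedies in this table can be read as a small retry policy. The sketch below applies one remedy to a parameter dict shaped like the create_research_question arguments. Several details are assumptions: the error labels, the 0.1 threshold step, the 0.5 to 0.9 bounds, halving max_papers with a floor of 10, and dropping the last source. Keyword changes and "wrong domain" fixes need judgment, so they are left to the agent.

```python
# Sources appended when a search returns nothing (the skill's broad-coverage default).
FALLBACK_SOURCES = ["semantic_scholar", "openalex"]


def adjust_params(params: dict, error: str) -> dict:
    """Return an adjusted copy of the search parameters for one error type.

    Step sizes and bounds are illustrative assumptions, not tool behavior.
    """
    updated = dict(params)
    updated["sources"] = list(params["sources"])  # copy so the input stays untouched
    threshold = params["relevance_threshold"]

    if error == "no_results":
        # Lower the threshold (floor 0.5, the exploratory level) and add sources.
        updated["relevance_threshold"] = round(max(0.5, threshold - 0.1), 2)
        for source in FALLBACK_SOURCES:
            if source not in updated["sources"]:
                updated["sources"].append(source)
    elif error == "too_many_results":
        # Raise the threshold, capped at 0.9.
        updated["relevance_threshold"] = round(min(0.9, threshold + 0.1), 2)
    elif error == "timeout":
        # Drop the last source (keep at least one) and halve max_papers (floor 10).
        if len(updated["sources"]) > 1:
            updated["sources"].pop()
        updated["max_papers"] = max(10, params["max_papers"] // 2)
    else:
        raise ValueError(f"no automatic remedy for {error!r}")
    return updated


params = {"sources": ["arxiv", "semantic_scholar"], "max_papers": 30, "relevance_threshold": 0.75}
print(adjust_params(params, "no_results"))
```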

Response Template

After discovery, report:

```
## Discovery Results: [Topic]

**Sources searched**: [list]
**Papers found**: [count]
**Relevance threshold**: [value]

### Top Papers:
1. [Title] - [Authors] - [Year]
   Brief: [1 sentence description]

2. ...

### Next Steps:
- Would you like me to analyze any of these papers in depth?
- Should I set this up as a recurring search?
- Want me to adjust the search parameters?
```
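The template can also be filled in programmatically. This is a minimal sketch. It assumes each paper is a dict with hypothetical `title`, `authors`, `year`, and `brief` keys, and the actual records returned by list_articles may be shaped differently. `render_report` is a hypothetical helper. `top_n=5` mirrors the "Top 5" in Example 1.

```python
def render_report(
    topic: str,
    sources: list[str],
    threshold: float,
    papers: list[dict],
    top_n: int = 5,
) -> str:
    """Hypothetical helper: fill the Discovery Results template from paper records."""
    lines = [
        f"## Discovery Results: {topic}",
        "",
        f"**Sources searched**: {', '.join(sources)}",
        f"**Papers found**: {len(papers)}",
        f"**Relevance threshold**: {threshold}",
        "",
        "### Top Papers:",
    ]
    for i, paper in enumerate(papers[:top_n], start=1):
        authors = ", ".join(paper.get("authors", [])) or "Unknown authors"
        lines.append(f"{i}. {paper['title']} - {authors} - {paper.get('year', 'n.d.')}")
        lines.append(f"   Brief: {paper.get('brief', '')}")
        lines.append("")  # blank line between entries, as in the template
    lines += [
        "### Next Steps:",
        "- Would you like me to analyze any of these papers in depth?",
        "- Should I set this up as a recurring search?",
        "- Want me to adjust the search parameters?",
    ]
    return "\n".join(lines)


# Placeholder record for illustration only.
example = [{"title": "Example paper", "authors": ["A. Author"], "year": 2024, "brief": "One-sentence summary."}]
print(render_report("Sparse attention", ["arxiv", "semantic_scholar"], 0.7, example))
```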