Finding Open Access Papers
Overview
Use Unpaywall to find legally available open access versions of papers that appear to be behind paywalls.
Core principle: Many paywalled papers have free versions (preprints, author manuscripts, institutional repositories). Unpaywall finds them.
When to Use
Use this skill when:
- •DOI resolution hits a paywall
- •Paper not available in PubMed Central
- •Publisher site requires subscription
- •Need full text for highly relevant paper (score ≥7)
Use BEFORE giving up on full text access
Unpaywall API
Simple REST API - no authentication required for reasonable usage
Basic Request
curl "https://api.unpaywall.org/v2/DOI?email=YOUR_EMAIL"
Parameters:
- •DOI - The paper's DOI (URL-encoded if needed)
- •email - User's email (required, for courtesy/contact)
IMPORTANT: Ask the user for their email at the start of the research session. Do NOT use placeholder emails like claude@anthropic.com or researcher@example.com.
Example:
curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=YOUR_EMAIL"
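The "URL-encoded if needed" note matters for older DOIs that contain characters such as `<`, `>` or `;`. A minimal sketch of building the request URL, assuming Python's standard library; the helper name `build_unpaywall_url` is hypothetical and not part of any Unpaywall client:

```python
from urllib.parse import quote, urlencode

UNPAYWALL_BASE = "https://api.unpaywall.org/v2"


def build_unpaywall_url(doi: str, email: str) -> str:
    """Return the Unpaywall lookup URL for a DOI (hypothetical helper)."""
    # Keep "/" unescaped so the DOI's prefix/suffix separator survives;
    # other reserved characters (e.g. "<", ">", ";") are percent-encoded.
    encoded_doi = quote(doi.strip(), safe="/")
    # urlencode also escapes the "@" in the email address.
    return f"{UNPAYWALL_BASE}/{encoded_doi}?{urlencode({'email': email})}"
```

Pass the address the user supplied at the start of the session as `email`; a simple DOI such as 10.1038/nature12373 passes through unchanged.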
Response Format
{
"doi": "10.1038/nature12373",
"title": "Paper Title",
"is_oa": true,
"best_oa_location": {
"url": "https://europepmc.org/articles/pmc3858213",
"url_for_pdf": "https://europepmc.org/articles/pmc3858213?pdf=render",
"version": "publishedVersion",
"license": "cc-by",
"host_type": "repository"
},
"oa_locations": [
{
"url": "https://europepmc.org/articles/pmc3858213",
"version": "publishedVersion"
},
{
"url": "https://arxiv.org/abs/1234.5678",
"version": "submittedVersion"
}
]
}
Key Response Fields
is_oa (boolean)
- •true - Open access version available
- •false - No free version found
best_oa_location (object or null)
- •Unpaywall's recommended best open access source
- •Prioritizes published versions over preprints
- •Includes PDF URL when available
oa_locations (array)
- •All known open access locations
- •Includes repositories, preprint servers, institutional sites
- •Ordered by quality/version
version types:
- •publishedVersion - Final published version (best)
- •acceptedVersion - Author's accepted manuscript (good)
- •submittedVersion - Preprint before peer review (useful)
Implementation Pattern
1. Check Unpaywall After Paywall Hit
# Try DOI first
curl -L "https://doi.org/10.1234/example.2023"
# If paywall detected (403, subscription required, etc):
curl "https://api.unpaywall.org/v2/10.1234/example.2023?email=your@email.com"
2. Extract Best URL
# Parse JSON response
response=$(curl -s "https://api.unpaywall.org/v2/DOI?email=EMAIL")
# Check if OA available (quote "$response" so the JSON is passed through intact)
is_oa=$(echo "$response" | jq -r '.is_oa')
if [ "$is_oa" = "true" ]; then
    # Get best PDF URL, falling back to the landing page URL
    pdf_url=$(echo "$response" | jq -r '.best_oa_location.url_for_pdf // .best_oa_location.url')
    # Download (create the target directory first)
    mkdir -p papers
    curl -L -o "papers/paper.pdf" "$pdf_url"
fi
3. Report to User
When OA found:
⚠️ Paper behind paywall at publisher
✓ Found open access version via Unpaywall!
Source: Europe PMC (published version)
PDF: https://europepmc.org/articles/pmc3858213?pdf=render
→ Downloading...
When no OA found:
⚠️ Paper behind paywall at publisher
✗ No open access version found via Unpaywall
Options:
- Request via institutional access
- Contact authors for preprint
- Continue with abstract only
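The two report templates above can be produced from a parsed response. This is a rough sketch, assuming the response has the shape shown under Response Format; `format_report` is a hypothetical name, and the source line uses the raw `host_type` and `version` values rather than a friendly repository name:

```python
def format_report(data: dict) -> str:
    """Build the user-facing message for one Unpaywall response (hypothetical helper)."""
    lines = ["⚠️ Paper behind paywall at publisher"]
    best = data.get("best_oa_location") or {}
    # Prefer the direct PDF link; fall back to the landing page URL.
    url = best.get("url_for_pdf") or best.get("url")
    if data.get("is_oa") and url:
        lines += [
            "✓ Found open access version via Unpaywall!",
            f"Source: {best.get('host_type', 'unknown')} ({best.get('version', 'unknown')})",
            f"PDF: {url}",
        ]
    else:
        lines += [
            "✗ No open access version found via Unpaywall",
            "Options:",
            "- Request via institutional access",
            "- Contact authors for preprint",
            "- Continue with abstract only",
        ]
    return "\n".join(lines)
```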
4. Prioritize by Version
If multiple locations available:
Priority order:
- •publishedVersion from publisher or PMC
- •acceptedVersion from institutional repository
- •submittedVersion from preprint server (arXiv, bioRxiv)
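One way to apply this priority order to the `oa_locations` array is a stable sort on the `version` field. The sketch below assumes each location is a dict as in the sample response; `VERSION_RANK` and `sort_locations` are illustrative names:

```python
# Lower rank means higher priority; versions not listed here sort last.
VERSION_RANK = {"publishedVersion": 0, "acceptedVersion": 1, "submittedVersion": 2}


def sort_locations(oa_locations):
    """Order locations published > accepted > submitted (hypothetical helper)."""
    # sorted() is stable, so locations with the same version keep
    # the relative order in which Unpaywall returned them.
    return sorted(
        oa_locations,
        key=lambda loc: VERSION_RANK.get(loc.get("version"), len(VERSION_RANK)),
    )
```

The first element of the result is then the preferred candidate, and the rest serve as alternatives if its download fails.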
Integration with evaluating-paper-relevance
Add to full text fetching workflow:
Stage 2: Fetch Full Text
Try in order:
A. PubMed Central (free full text)
B. DOI resolution → If paywall, try Unpaywall
C. Unpaywall direct lookup
D. Preprints (bioRxiv, arXiv)
Updated workflow:
# Pseudocode: has_pmc_fulltext, is_paywall, has_oa and the fetch_*/report_*
# commands are placeholders for checks implemented elsewhere.
# 1. Try PMC
pmc_result=$(curl "https://eutils.ncbi.nlm.nih.gov/...")
if has_pmc_fulltext; then
    fetch_pmc
    exit 0
fi
# 2. Try DOI
doi_result=$(curl -L "https://doi.org/$doi")
if is_paywall; then
    # 3. Try Unpaywall
    unpaywall_result=$(curl "https://api.unpaywall.org/v2/$doi?email=$EMAIL")
    if has_oa; then
        fetch_unpaywall_pdf
        exit 0
    fi
fi
# 4. No full text available
report_no_fulltext
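The same try-in-order logic can be expressed as a small fallback chain. This is only a sketch: the fetchers passed in are assumed to be callables supplied by the surrounding workflow (for example a PMC lookup and an Unpaywall lookup), each returning `None` when it finds nothing, and `fetch_full_text` is a hypothetical name:

```python
def fetch_full_text(doi, sources):
    """Try each (name, fetcher) pair in order and return the first hit.

    `sources` is a list of (name, callable) pairs; each callable takes a DOI
    and returns a result, or None when that source has no full text.
    """
    for name, fetcher in sources:
        result = fetcher(doi)
        if result is not None:
            return name, result
    # Nothing found anywhere: the caller should report and continue
    # with the abstract only.
    return None, None
```

Keeping the sources in a list makes the order explicit and lets a preprint lookup be appended without touching the loop.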
Rate Limiting
Free tier (with email):
- •100,000 requests per day
- •No hard rate limit, but be respectful
- •Include email in requests (required)
Best practices:
- •Add 100ms delay between requests
- •Cache responses (don't re-check same DOI)
- •Only check for papers you actually need
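The first two practices can be combined in a thin wrapper around whatever function performs the lookup. A minimal sketch, assuming an in-memory cache is enough for one session; the class name `CachedLookup` and its parameters are illustrative, and `fetch` stands for any callable that takes a DOI and returns the parsed response:

```python
import time


class CachedLookup:
    """Cache results per DOI and space out real requests (hypothetical helper)."""

    def __init__(self, fetch, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval  # seconds between real requests
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = float("-inf")  # no request has been made yet

    def get(self, doi):
        # DOIs are case-insensitive, so normalize the cache key.
        key = doi.strip().lower()
        if key in self._cache:
            return self._cache[key]  # cache hit: no request, no delay
        wait = self._min_interval - (self._clock() - self._last_call)
        if wait > 0:
            self._sleep(wait)
        result = self._fetch(doi)
        self._last_call = self._clock()
        self._cache[key] = result
        return result
```

The `clock` and `sleep` parameters exist only so the wrapper can be exercised without real waiting; the defaults are what a session would normally use.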
Python Helper Example
import os
import time

import requests


def find_open_access(doi, email):
    """
    Find open access version via Unpaywall.

    Returns: (pdf_url, version, source) or (None, None, None)
    """
    url = f"https://api.unpaywall.org/v2/{doi}"
    params = {"email": email}
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
        if not data.get('is_oa'):
            return None, None, None
        best_loc = data.get('best_oa_location')
        if not best_loc:
            return None, None, None
        # Prefer the direct PDF link; fall back to the landing page
        pdf_url = best_loc.get('url_for_pdf') or best_loc.get('url')
        version = best_loc.get('version', 'unknown')
        source = best_loc.get('host_type', 'unknown')
        return pdf_url, version, source
    except (requests.RequestException, ValueError) as e:
        # Network/HTTP errors, or a body that is not valid JSON
        print(f"Error checking Unpaywall for {doi}: {e}")
        return None, None, None


# Usage
doi = "10.1038/nature12373"
user_email = input("Email for Unpaywall requests: ")  # ask the user; no placeholders
pdf_url, version, source = find_open_access(doi, user_email)
if pdf_url:
    print(f"Found {version} at {source}")
    print(f"PDF: {pdf_url}")
    # Download PDF (create the target directory if it is missing)
    os.makedirs("papers", exist_ok=True)
    response = requests.get(pdf_url, timeout=30)
    response.raise_for_status()
    with open(f'papers/{doi.replace("/", "_")}.pdf', 'wb') as f:
        f.write(response.content)
else:
    print("No open access version found")
time.sleep(0.1)  # Rate limiting
Common Sources Found
Repositories:
- •Europe PMC / PubMed Central
- •Institutional repositories (university sites)
- •PubMed Central international mirrors
Preprint servers:
- •bioRxiv (biology)
- •medRxiv (medicine)
- •arXiv (physics, CS, math)
- •ChemRxiv (chemistry)
Publisher sites:
- •Open access journals
- •Hybrid journals (OA articles in subscription journals)
- •Delayed open access (embargo expired)
Error Handling
DOI not found:
{
"error": "true",
"message": "DOI not found"
}
→ Check DOI format, try alternative identifiers
Network errors:
- •Retry with exponential backoff
- •Maximum 3 attempts
- •Report to user if all fail
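The retry policy above might look like the following sketch. The name `with_retries`, the 1-second base delay and the default exception types are assumptions made for illustration; when the wrapped call uses `requests`, pass `retry_on=(requests.ConnectionError, requests.Timeout)`, since those do not derive from the builtin exceptions used as defaults here:

```python
import time


def with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep,
                 retry_on=(ConnectionError, TimeoutError)):
    """Run call(), retrying on network-type errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                # All attempts failed: re-raise so the caller can report to the user.
                raise
            # Wait base_delay, then 2x, then 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults this makes at most three attempts, waiting 1 second and then 2 seconds between them.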
Malformed response:
- •Check for the is_oa field
- •Fall back to the oa_locations array if best_oa_location is missing
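A defensive extraction along these lines could be sketched as follows, assuming the response has already been parsed into a dict; `pick_pdf_url` is a hypothetical helper name:

```python
def pick_pdf_url(data):
    """Return a usable URL from an Unpaywall response, or None if there is none."""
    # Treat a response without is_oa as malformed rather than as "no OA".
    if not isinstance(data, dict) or "is_oa" not in data:
        raise ValueError("Malformed Unpaywall response: missing is_oa")
    if not data["is_oa"]:
        return None
    candidates = []
    best = data.get("best_oa_location")
    if best:
        candidates.append(best)
    # Fall back to the full list when best_oa_location is missing or has no URL.
    candidates.extend(data.get("oa_locations") or [])
    for loc in candidates:
        url = loc.get("url_for_pdf") or loc.get("url")
        if url:
            return url
    return None
```

Raising on a missing `is_oa` keeps a broken response from being silently reported as "no open access version found".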
Quick Reference
| Task | Command |
|---|---|
| Check if OA available | curl "https://api.unpaywall.org/v2/DOI?email=EMAIL" |
| Get best PDF URL | Parse .best_oa_location.url_for_pdf |
| List all OA sources | Parse .oa_locations[] |
| Check version type | Look at .version field |
| Download PDF | curl -L -o paper.pdf "$pdf_url" |
Integration Points
Called by:
- •evaluating-paper-relevance - When full text not in PMC
- •answering-research-questions - For highly relevant papers
Updates:
- •papers-reviewed.json - Note if OA found
- •SUMMARY.md - Include OA source info
Common Mistakes
Using placeholder email: Using claude@anthropic.com or researcher@example.com → Ask user for their real email
Not including email: Required parameter, requests will fail
Checking every paper: Only check when needed (score ≥7, no PMC)
Ignoring version type: Published version better than preprint
Single source only: Check oa_locations array for alternatives
No rate limiting: Add delays even though no hard limit
Success Criteria
Successful when:
- •Paywalled paper's OA version found and downloaded
- •Version type recorded (published/accepted/submitted)
- •User informed about source and version
- •Fallback options provided if no OA available
Next Steps
After finding OA version:
- •Download PDF to papers/ folder
- •Note source and version in SUMMARY.md
- •Continue with deep dive analysis
- •If no OA: note in summary, continue with abstract only