Entrez Search
Search NCBI databases using Biopython's Entrez module (ESearch, EInfo, EGQuery utilities).
Required Setup
```python
from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'  # Optional; raises the rate limit from 3 to 10 requests/sec
```
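Hard-coding credentials works for a quick script. Shared code often reads them from the environment instead. The sketch below assumes two hypothetical variable names, `NCBI_EMAIL` and `NCBI_API_KEY`. It also reports which request ceiling applies: 3 per second without a key, 10 with one.

```python
import os


def load_ncbi_credentials(env=os.environ):
    """Read NCBI credentials from environment variables.

    NCBI_EMAIL and NCBI_API_KEY are hypothetical names chosen for this
    sketch; substitute whatever your deployment defines.
    """
    email = env.get("NCBI_EMAIL")
    if not email:
        raise ValueError("Set NCBI_EMAIL; NCBI requires a contact email")
    api_key = env.get("NCBI_API_KEY")  # None when no key is configured
    # Documented ceilings: 3 requests/sec without a key, 10 with one
    max_rate = 10 if api_key else 3
    return email, api_key, max_rate


# Usage (requires Biopython):
# from Bio import Entrez
# Entrez.email, key, _ = load_ncbi_credentials()
# if key:
#     Entrez.api_key = key
```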
Core Functions
Entrez.esearch() - Search a Database
Search any NCBI database and get matching record IDs.
```python
handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND BRCA1[gene]')
record = Entrez.read(handle)
handle.close()
print(f"Found {record['Count']} records")
print(f"IDs: {record['IdList']}")  # First 20 IDs by default
```
Key Parameters:
| Parameter | Description | Default |
|---|---|---|
| db | Database to search | Required |
| term | Search query | Required |
| retmax | Max IDs to return (NCBI caps this at 10,000) | 20 |
| retstart | Starting index (pagination) | 0 |
| usehistory | Store results on the history server ('y' to enable) | 'n' |
| sort | Sort order | database-specific |
| datetype | Date field used by reldate, mindate, and maxdate | 'pdat' |
| reldate | Records from the last N days | None |
| mindate | Start date (YYYY/MM/DD); must be paired with maxdate | None |
| maxdate | End date (YYYY/MM/DD); must be paired with mindate | None |
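A few parameters in this table interact. A date range needs both `mindate` and `maxdate`, and `retmax` is capped at 10,000 per NCBI's ESearch documentation. A small, hypothetical helper like `build_esearch_params` can catch these mistakes before a request is sent. It is a sketch under those assumptions, not part of Biopython.

```python
def build_esearch_params(db, term, retmax=20, retstart=0, mindate=None,
                         maxdate=None, datetype=None, usehistory=False):
    """Assemble keyword arguments for Entrez.esearch (hypothetical helper)."""
    # NCBI documents 10,000 as the largest retmax ESearch accepts
    if not 0 <= retmax <= 10000:
        raise ValueError("retmax must be between 0 and 10,000")
    # A date range is only applied when both ends are supplied
    if (mindate is None) != (maxdate is None):
        raise ValueError("mindate and maxdate must be given together")
    params = {"db": db, "term": term, "retmax": retmax, "retstart": retstart}
    if mindate is not None:
        params.update(mindate=mindate, maxdate=maxdate)
    if datetype is not None:
        params["datetype"] = datetype
    if usehistory:
        params["usehistory"] = "y"
    return params
```

The result is passed straight through: `Entrez.esearch(**build_esearch_params('pubmed', 'CRISPR', retmax=100))`.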
ESearch Result Fields:
```python
record['Count']             # Total matching records (string)
record['IdList']            # List of record IDs
record['RetMax']            # Number of IDs returned
record['RetStart']          # Starting index
record['QueryKey']          # For history server (if usehistory='y')
record['WebEnv']            # For history server (if usehistory='y')
record['TranslationSet']    # Query translations applied
record['QueryTranslation']  # Final translated query
```
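`Count`, `RetMax`, and `RetStart` come back from `Entrez.read()` as strings, so any arithmetic needs an `int()` conversion first. The sketch below (a hypothetical helper) normalizes them. It also works out whether more pages remain, and picks up the history-server keys only when they are present.

```python
def summarize_esearch(record):
    """Convert the string-valued ESearch counters into ints.

    Works on the dict-like object returned by Entrez.read(); a plain dict
    with the same keys behaves the same way.
    """
    count = int(record["Count"])
    retmax = int(record["RetMax"])
    retstart = int(record["RetStart"])
    summary = {
        "count": count,
        "returned": len(record["IdList"]),
        "retstart": retstart,
        # True when more IDs exist beyond the current page
        "has_more": retstart + retmax < count,
    }
    if "WebEnv" in record:  # present only when usehistory='y'
        summary["webenv"] = record["WebEnv"]
        summary["query_key"] = record["QueryKey"]
    return summary
```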
Entrez.einfo() - Database Information
Get information about available databases or specific database fields.
```python
# List all available databases
handle = Entrez.einfo()
record = Entrez.read(handle)
handle.close()
print(record['DbList'])  # ['pubmed', 'protein', 'nucleotide', ...]

# Get info about a specific database
handle = Entrez.einfo(db='nucleotide')
record = Entrez.read(handle)
handle.close()
print(f"Description: {record['DbInfo']['Description']}")
print(f"Record count: {record['DbInfo']['Count']}")

# List searchable fields
for field in record['DbInfo']['FieldList']:
    print(f"{field['Name']}: {field['Description']}")
```
Database Info Fields:
```python
record['DbInfo']['DbName']       # Database name
record['DbInfo']['Description']  # Database description
record['DbInfo']['Count']        # Total records in database
record['DbInfo']['LastUpdate']   # Last update date
record['DbInfo']['FieldList']    # Searchable fields
record['DbInfo']['LinkList']     # Available links to other databases
```
Entrez.egquery() - Global Query
Run one query against all NCBI databases at once and get the number of matching records in each.
```python
handle = Entrez.egquery(term='CRISPR')
record = Entrez.read(handle)
handle.close()
for result in record['eGQueryResult']:
    if int(result['Count']) > 0:
        print(f"{result['DbName']}: {result['Count']} records")
```
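To rank databases by hits, the per-database results can be filtered and sorted. The sketch below assumes each `eGQueryResult` entry has `DbName` and a string `Count`. It also assumes a database that fails to answer may report something non-numeric there, and it skips such entries rather than crashing on `int()`.

```python
def nonzero_counts(egquery_record):
    """Return (DbName, count) pairs with at least one hit, largest first."""
    hits = []
    for result in egquery_record["eGQueryResult"]:
        count = result.get("Count", "")
        # Skip entries whose Count is not a plain number (assumed error case)
        if count.isdigit() and int(count) > 0:
            hits.append((result["DbName"], int(count)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```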
Search Query Syntax
Entrez queries combine search terms, field tags, Boolean operators, and date ranges:
Field Tags
```python
# Search specific fields using [field_name]
term = 'BRCA1[gene]'          # Gene name field
term = 'human[orgn]'          # Organism field
term = 'Homo sapiens[ORGN]'   # Full organism name
term = 'NM_007294[accn]'      # Accession number
term = 'Smith J[auth]'        # Author (PubMed)
term = 'Nature[jour]'         # Journal (PubMed)
term = '1000:5000[slen]'      # Sequence length range
term = 'mRNA[fkey]'           # Feature key
```
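When queries are assembled from variables, a tiny helper keeps the `value[field]` pattern consistent. This hypothetical `tagged` function also wraps multi-word values in double quotes so they are searched as a phrase. That is a design choice of this sketch, not a requirement of Entrez, since unquoted forms such as `Homo sapiens[ORGN]` are also shown above.

```python
def tagged(value, field):
    """Attach an Entrez field tag, quoting multi-word values as a phrase."""
    value = value.strip()
    # Quote only if the value has a space and is not already quoted
    if " " in value and not value.startswith('"'):
        value = f'"{value}"'
    return f"{value}[{field}]"
```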
Boolean Operators
```python
term = 'BRCA1 AND human'              # Both terms
term = 'cancer OR tumor'              # Either term
term = 'human NOT mouse'              # Exclude term
term = '(BRCA1 OR BRCA2) AND human'   # Grouping
```
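NCBI documents PubMed as processing Boolean operators left to right, so parentheses are the reliable way to group an `OR` clause. The hypothetical combinators below always parenthesize `OR` groups. The grouping example above can then be built without hand-placing brackets.

```python
def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)


def any_of(*terms):
    """Join terms with OR, parenthesized so the group binds as one unit."""
    return "(" + " OR ".join(terms) + ")"


def excluding(term, *unwanted):
    """Append a NOT clause for each unwanted term (applied left to right)."""
    return " NOT ".join((term,) + unwanted)


# Rebuilds the grouping example: '(BRCA1 OR BRCA2) AND human'
query = all_of(any_of("BRCA1", "BRCA2"), "human")
```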
Date Ranges
```python
# Using date parameters
handle = Entrez.esearch(
    db='pubmed',
    term='CRISPR',
    datetype='pdat',  # Publication date
    mindate='2023/01/01',
    maxdate='2024/12/31'
)
record = Entrez.read(handle)
handle.close()

# Or in the query string
term = 'CRISPR AND 2024[pdat]'
term = 'CRISPR AND 2023:2024[pdat]'
```
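Dates in either form need the `YYYY/MM/DD` layout. The sketch below builds both styles from `datetime.date` objects: an inline range for the query string, and a `mindate`/`maxdate` pair. The names `entrez_date`, `date_range_term`, and `last_n_days` are hypothetical. `last_n_days` only approximates what `reldate=n` does on the server side.

```python
from datetime import date, timedelta


def entrez_date(d):
    """Format a date as YYYY/MM/DD, the form mindate and maxdate expect."""
    return d.strftime("%Y/%m/%d")


def date_range_term(term, start, end, field="pdat"):
    """Append an inline date range such as 2023/01/01:2024/12/31[pdat]."""
    return f"{term} AND {entrez_date(start)}:{entrez_date(end)}[{field}]"


def last_n_days(n, today=None):
    """Return a (mindate, maxdate) pair covering the last n days."""
    if today is None:
        today = date.today()
    return entrez_date(today - timedelta(days=n)), entrez_date(today)
```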
Wildcards and Phrases
```python
term = 'immun*'                   # Wildcard
term = '"breast cancer"[title]'   # Exact phrase
```
Common Databases
| Database | db value | Common Fields |
|---|---|---|
| PubMed | pubmed | [auth], [title], [jour], [pdat] |
| Nucleotide | nucleotide | [orgn], [gene], [accn], [slen] |
| Protein | protein | [orgn], [gene], [accn], [molwt] |
| Gene | gene | [orgn], [sym], [chr] |
| SRA | sra | [orgn], [platform], [strategy] |
| Taxonomy | taxonomy | [scin], [comn], [rank] |
| Assembly | assembly | [orgn], [level], [refseq] |
Code Patterns
Basic Search with a Result Limit
```python
from Bio import Entrez

Entrez.email = 'your.email@example.com'

def search_ncbi(db, term, max_results=100):
    handle = Entrez.esearch(db=db, term=term, retmax=max_results)
    record = Entrez.read(handle)
    handle.close()
    return record['IdList'], int(record['Count'])

ids, total = search_ncbi('nucleotide', 'human[orgn] AND insulin[gene]')
print(f'Retrieved {len(ids)} of {total} total records')
```
Paginated Search for Large Results
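```python
def search_all_ids(db, term, batch_size=10000):
    """Collect every matching ID by paging with retstart/retmax."""
    all_ids = []
    # First request with retmax=0 only to learn the total count
    handle = Entrez.esearch(db=db, term=term, retmax=0)
    record = Entrez.read(handle)
    handle.close()
    total = int(record['Count'])
    # Page through the results; retmax cannot exceed 10,000.
    # For PubMed, ESearch only exposes the first 10,000 IDs.
    for start in range(0, total, batch_size):
        handle = Entrez.esearch(db=db, term=term, retstart=start, retmax=batch_size)
        record = Entrez.read(handle)
        handle.close()
        all_ids.extend(record['IdList'])
    return all_ids
```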
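The paging arithmetic in `search_all_ids` can be separated out and tested on its own. This hypothetical `plan_batches` returns the `(retstart, retmax)` pair for each request. It assumes NCBI's documented limits: `retmax` tops out at 10,000, and ESearch returns only the first 10,000 PubMed IDs. Under those limits, PubMed pages past that point are not planned at all.

```python
def plan_batches(total, db, batch_size=10000):
    """Return (retstart, retmax) pairs for paging through ESearch results.

    Assumes retmax is capped at 10,000 and that ESearch exposes only the
    first 10,000 PubMed IDs.
    """
    batch_size = min(batch_size, 10000)
    if db == "pubmed":
        total = min(total, 10000)
    batches = []
    for start in range(0, total, batch_size):
        # The last batch may be shorter than batch_size
        batches.append((start, min(batch_size, total - start)))
    return batches
```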
Search with History Server (for Large Results)
```python
# Store results on the NCBI server for subsequent fetching
handle = Entrez.esearch(db='nucleotide', term='human[orgn] AND mRNA[fkey]', usehistory='y')
record = Entrez.read(handle)
handle.close()
webenv = record['WebEnv']
query_key = record['QueryKey']
total = int(record['Count'])
# Use webenv and query_key with efetch for batch downloads
# See batch-downloads skill for details
```
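The full download workflow belongs to the batch-downloads skill. The planning step is short, though: each `efetch` call reuses the same `webenv` and `query_key` and advances `retstart`. The hypothetical helper below only builds those keyword dictionaries. The default batch size of 500 is an arbitrary assumption.

```python
def history_fetch_params(webenv, query_key, total, batch_size=500):
    """Build per-batch efetch keyword arguments for the history server.

    batch_size=500 is an arbitrary default chosen for this sketch.
    """
    params = []
    for start in range(0, total, batch_size):
        params.append({
            "webenv": webenv,
            "query_key": query_key,
            "retstart": start,
            "retmax": batch_size,
        })
    return params


# Usage (requires Biopython plus webenv, query_key, total from the search):
# for p in history_fetch_params(webenv, query_key, total):
#     handle = Entrez.efetch(db='nucleotide', rettype='fasta', retmode='text', **p)
#     data = handle.read()
#     handle.close()
```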
Recent Records Only
```python
# Records from the last 30 days
handle = Entrez.esearch(db='pubmed', term='CRISPR', reldate=30, datetype='pdat')
record = Entrez.read(handle)
handle.close()
```
Get Available Fields for a Database
```python
def get_search_fields(db):
    handle = Entrez.einfo(db=db)
    record = Entrez.read(handle)
    handle.close()
    return [(f['Name'], f['Description']) for f in record['DbInfo']['FieldList']]

fields = get_search_fields('nucleotide')
for name, desc in fields[:10]:
    print(f'{name}: {desc}')
```
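Rather than scanning the whole field list by eye, it can be filtered. The sketch below assumes each `FieldList` entry carries `Name` and `Description`, which the code above already relies on. It also assumes an `IsDate` flag of `'Y'` or `'N'` as in EInfo's XML output, which the `datetype` parameter would draw on. Both helper names are hypothetical.

```python
def date_fields(field_list):
    """Return names of fields flagged as dates (assumes an 'IsDate' key)."""
    return [f["Name"] for f in field_list if f.get("IsDate") == "Y"]


def find_fields(field_list, keyword):
    """Case-insensitive search over field names and descriptions."""
    keyword = keyword.lower()
    return [
        f["Name"]
        for f in field_list
        if keyword in f["Name"].lower()
        or keyword in f.get("Description", "").lower()
    ]
```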
Check Query Translation
```python
handle = Entrez.esearch(db='nucleotide', term='human BRCA1')
record = Entrez.read(handle)
handle.close()
# See how NCBI interpreted your query
print(f"Your query was translated to: {record['QueryTranslation']}")
# e.g., '"homo sapiens"[Organism] AND BRCA1[All Fields]'
```
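A translation like the one above flags untagged terms that NCBI searched across every field. Such terms are a common source of noisy results. This hypothetical check pulls them out of `QueryTranslation`. It assumes they appear with the literal suffix `[All Fields]`, as in the example translation.

```python
import re

# A quoted phrase, or a bare token without spaces, parentheses, or quotes,
# immediately followed by the literal tag [All Fields]
_ALL_FIELDS = re.compile(r'("[^"]+"|[^\s()"]+)\[All Fields\]')


def untagged_terms(query_translation):
    """Return the terms NCBI fell back to searching in [All Fields]."""
    return _ALL_FIELDS.findall(query_translation)
```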
Common Errors
| Error | Cause | Solution |
|---|---|---|
| HTTPError 429 | Rate limit exceeded | Add delays or use an API key |
| HTTPError 400 | Invalid query syntax | Check field names and operators |
| Empty IdList | No matches or a typo | Check the QueryTranslation field |
| UserWarning about email | Entrez.email not set | Set Entrez.email |
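For the 429 case, "add delays" can be made systematic with exponential backoff. The sketch below assumes the failure reaches your code as `urllib.error.HTTPError`, which Biopython's Entrez functions surface from `urlopen`. It retries only on 429 and re-raises everything else, such as a 400 from a bad query. `with_backoff` is a hypothetical helper, and its delays are illustrative.

```python
import time
from urllib.error import HTTPError


def with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry call() on HTTP 429, doubling the wait after each failure."""
    for attempt in range(retries + 1):
        try:
            return call()
        except HTTPError as err:
            # Give up on non-429 errors or once retries are exhausted
            if err.code != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage (requires Biopython):
# def run_search():
#     handle = Entrez.esearch(db='pubmed', term='CRISPR')
#     try:
#         return Entrez.read(handle)
#     finally:
#         handle.close()
# record = with_backoff(run_search)
```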
Decision Tree
```
Need to search NCBI?
├── Finding records in one database?
│   └── Use Entrez.esearch()
├── Search across all databases?
│   └── Use Entrez.egquery()
├── Need database field names?
│   └── Use Entrez.einfo(db='database')
├── List all available databases?
│   └── Use Entrez.einfo() (no db argument)
├── Results > 10,000 records?
│   └── Use usehistory='y', then batch fetch
└── Need to fetch actual records?
    └── See entrez-fetch skill
```
Related Skills
- entrez-fetch - Retrieve full records after searching
- entrez-link - Find related records in other databases
- batch-downloads - Download large result sets efficiently
- geo-data - Search GEO expression datasets (specialized search)
- blast-searches - Search by sequence similarity instead of keywords