bio-entrez-link

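Find cross-references between NCBI databases using Biopython Bio.Entrez. Use it when navigating from genes to proteins or from sequences to publications, finding related records, or discovering relationships between databases.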

SKILL.md
--- frontmatter
name: bio-entrez-link
description: Find cross-references between NCBI databases using Biopython Bio.Entrez. Use when navigating from genes to proteins, sequences to publications, finding related records, or discovering database relationships.
tool_type: python
primary_tool: Bio.Entrez

Entrez Link

Navigate between NCBI databases using Biopython's Entrez module (ELink utility).

Required Setup

python
from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'          # Optional; raises the limit from 3 to 10 requests per second

Core Function

Entrez.elink() - Cross-Database Links

Find related records in the same or different databases.

python
# Find proteins linked to a gene
handle = Entrez.elink(dbfrom='gene', db='protein', id='672')
record = Entrez.read(handle)
handle.close()

# Extract linked IDs
linkset = record[0]
if linkset['LinkSetDb']:
    links = linkset['LinkSetDb'][0]['Link']
    protein_ids = [link['Id'] for link in links]
    print(f"Found {len(protein_ids)} linked proteins")

Key Parameters:

| Parameter | Description | Example |
|-----------|-------------|---------|
| dbfrom | Source database | 'gene' |
| db | Target database | 'protein' |
| id | Source record ID(s) | '672' or '672,675' |
| linkname | Specific link type | 'gene_protein_refseq' |
| cmd | Link command | 'neighbor', 'neighbor_score' |
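
The `cmd` value changes what each linked entry contains. As a minimal sketch, assuming that `cmd='neighbor_score'` adds a `Score` field (parsed as a string) next to each `Id`, the helper below ranks the links of one `LinkSetDb` entry. The name `rank_links_by_score` and the sample data are illustrative only and are not part of Biopython.

```python
def rank_links_by_score(link_db, limit=5):
    """Return (id, score) pairs from one LinkSetDb entry, highest score first."""
    scored = [
        (link["Id"], int(link["Score"]))  # scores arrive as strings
        for link in link_db["Link"]
        if "Score" in link                # skip entries without a score
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hand-written stand-in for one parsed LinkSetDb entry (made-up IDs and scores).
example_link_db = {
    "DbTo": "pubmed",
    "LinkName": "pubmed_pubmed",
    "Link": [
        {"Id": "111", "Score": "4200"},
        {"Id": "222", "Score": "9800"},
        {"Id": "333", "Score": "150"},
    ],
}

print(rank_links_by_score(example_link_db, limit=2))
```

With a live query, the input would be `record[0]['LinkSetDb'][0]` from a call such as `Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid, cmd='neighbor_score')`.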

ELink Result Structure

python
record[0]                          # First linkset
record[0]['DbFrom']                # Source database
record[0]['IdList']                # Input IDs
record[0]['LinkSetDb']             # List of link results
record[0]['LinkSetDb'][0]['DbTo']  # Target database
record[0]['LinkSetDb'][0]['LinkName']  # Link name
record[0]['LinkSetDb'][0]['Link']  # List of linked records
record[0]['LinkSetDb'][0]['Link'][0]['Id']  # Linked ID
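
The structure above can be walked generically rather than indexing `[0]` everywhere. The following is a sketch under the assumption that the parsed record behaves like nested lists and dicts, as shown above; `collect_links` is a hypothetical helper name and the sample record uses made-up IDs.

```python
def collect_links(record):
    """Map each link name to the linked IDs found across all linksets."""
    links = {}
    for linkset in record:
        # LinkSetDb is empty (or absent) when a record has no links
        for link_db in linkset.get("LinkSetDb", []):
            ids = links.setdefault(link_db["LinkName"], [])
            ids.extend(link["Id"] for link in link_db["Link"])
    return links


# Hand-written stand-in for a parsed ELink result (made-up protein IDs).
example_record = [
    {
        "DbFrom": "gene",
        "IdList": ["672"],
        "LinkSetDb": [
            {"DbTo": "protein", "LinkName": "gene_protein",
             "Link": [{"Id": "1001"}, {"Id": "1002"}]},
            {"DbTo": "protein", "LinkName": "gene_protein_refseq",
             "Link": [{"Id": "1001"}]},
        ],
    }
]

for name, ids in collect_links(example_record).items():
    print(name, ids)
```

Grouping by link name avoids relying on the order in which the link results are returned when no `linkname` is given.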

Common Link Paths

Gene to Other Databases

| From | To | Link Name | Description |
|------|----|-----------|-------------|
| gene | protein | gene_protein | All proteins |
| gene | protein | gene_protein_refseq | RefSeq proteins only |
| gene | nucleotide | gene_nuccore | Nucleotide sequences |
| gene | nucleotide | gene_nuccore_refseqrna | RefSeq mRNA |
| gene | pubmed | gene_pubmed | Related publications |
| gene | homologene | gene_homologene | Homologs |
| gene | snp | gene_snp | SNPs in gene |
| gene | clinvar | gene_clinvar | Clinical variants |

Nucleotide to Other Databases

| From | To | Link Name | Description |
|------|----|-----------|-------------|
| nucleotide | protein | nuccore_protein | Encoded proteins |
| nucleotide | gene | nuccore_gene | Gene records |
| nucleotide | pubmed | nuccore_pubmed | Publications |
| nucleotide | taxonomy | nuccore_taxonomy | Organism taxonomy |
| nucleotide | biosample | nuccore_biosample | Sample info |
| nucleotide | sra | nuccore_sra | Related SRA data |

Protein to Other Databases

| From | To | Link Name | Description |
|------|----|-----------|-------------|
| protein | nucleotide | protein_nuccore | Coding sequences |
| protein | gene | protein_gene | Gene records |
| protein | pubmed | protein_pubmed | Publications |
| protein | structure | protein_structure | 3D structures |
| protein | cdd | protein_cdd | Conserved domains |

PubMed Links

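| From | To | Link Name | Description |
|------|----|-----------|-------------|
| pubmed | pubmed | pubmed_pubmed | Related articles |
| pubmed | gene | pubmed_gene | Mentioned genes |
| pubmed | protein | pubmed_protein | Mentioned proteins |
| pubmed | nucleotide | pubmed_nuccore | Mentioned sequences |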

Code Patterns

Gene to Protein

python
from Bio import Entrez

Entrez.email = 'your.email@example.com'

def get_proteins_for_gene(gene_id):
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

protein_ids = get_proteins_for_gene('672')  # BRCA1
print(f"RefSeq proteins: {protein_ids[:5]}")

Nucleotide to Gene

python
def get_gene_for_nucleotide(nuc_id):
    handle = Entrez.elink(dbfrom='nucleotide', db='gene', id=nuc_id)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return None
    return record[0]['LinkSetDb'][0]['Link'][0]['Id']

gene_id = get_gene_for_nucleotide('NM_007294')
print(f"Gene ID: {gene_id}")

Find Related PubMed Articles

python
def get_related_articles(pmid, max_results=10):
    handle = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid, linkname='pubmed_pubmed')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    links = record[0]['LinkSetDb'][0]['Link']
    return [link['Id'] for link in links[:max_results]]

related = get_related_articles('35412348')
print(f"Related articles: {related}")

Get All Available Links

python
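def discover_links(db, record_id):
    handle = Entrez.elink(dbfrom=db, id=record_id, cmd='acheck')
    record = Entrez.read(handle)
    handle.close()

    # acheck results are reported under IdCheckList -> IdLinkSet -> LinkInfo,
    # not under LinkSetDb
    links = {}
    id_linksets = record[0].get('IdCheckList', {}).get('IdLinkSet', [])
    for id_linkset in id_linksets:
        for info in id_linkset.get('LinkInfo', []):
            links[info['LinkName']] = info['DbTo']
    return links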

available = discover_links('gene', '672')
for name, target in available.items():
    print(f"{name} -> {target}")

Navigate Gene -> Protein -> Structure

python
def gene_to_structures(gene_id):
    # Gene to protein
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    protein_ids = [link['Id'] for link in record[0]['LinkSetDb'][0]['Link'][:5]]

    # Protein to structure
    handle = Entrez.elink(dbfrom='protein', db='structure', id=','.join(protein_ids))
    record = Entrez.read(handle)
    handle.close()

    structure_ids = []
    for linkset in record:
        if linkset['LinkSetDb']:
            structure_ids.extend([link['Id'] for link in linkset['LinkSetDb'][0]['Link']])
    return structure_ids

structures = gene_to_structures('672')
print(f"Structure IDs: {structures[:5]}")

Link Multiple IDs at Once

python
def batch_link(dbfrom, db, ids):
    # Pass the IDs as a list: Bio.Entrez then sends one id= parameter per UID
    # and ELink returns one linkset per input ID. A single comma-delimited
    # string would merge all linked IDs into one linkset.
    if isinstance(ids, str):
        ids = ids.split(',')

    handle = Entrez.elink(dbfrom=dbfrom, db=db, id=ids)
    record = Entrez.read(handle)
    handle.close()

    results = {}
    for linkset in record:
        source_id = linkset['IdList'][0]
        linked_ids = []
        if linkset['LinkSetDb']:
            linked_ids = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']]
        results[source_id] = linked_ids
    return results

results = batch_link('gene', 'protein', ['672', '675', '7157'])
for gene, proteins in results.items():
    print(f"Gene {gene}: {len(proteins)} proteins")

Get Publications for a Sequence

python
def get_sequence_publications(accession):
    # First get the GI/UID
    handle = Entrez.esearch(db='nucleotide', term=f'{accession}[accn]')
    search = Entrez.read(handle)
    handle.close()

    if not search['IdList']:
        return []
    uid = search['IdList'][0]

    # Link to PubMed
    handle = Entrez.elink(dbfrom='nucleotide', db='pubmed', id=uid)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

pmids = get_sequence_publications('NM_007294')
print(f"PubMed IDs: {pmids[:5]}")

Link Commands

| Command | Description |
|---------|-------------|
| neighbor | Default - get linked records |
| neighbor_score | Include relevance scores |
| neighbor_history | Store results in history |
| acheck | List all available links |
| ncheck | Check if any links exist |
| lcheck | Check specific link exists |
| llinks | Get URLs to Entrez links |
| prlinks | Get provider links (external) |
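
Some commands return a different result shape than the default `neighbor`. As a hedged sketch for `neighbor_history`, assuming the parsed linkset carries a `WebEnv` string plus a `LinkSetDbHistory` list whose entries hold `LinkName` and `QueryKey`, the helper below pulls out the values needed for a later history-based request. `history_references` is a hypothetical name and the sample record is hand-written.

```python
def history_references(record):
    """Return (webenv, {link_name: query_key}) from a neighbor_history result."""
    linkset = record[0]
    query_keys = {
        entry["LinkName"]: entry["QueryKey"]
        for entry in linkset.get("LinkSetDbHistory", [])
    }
    return linkset.get("WebEnv"), query_keys


# Hand-written stand-in for a parsed neighbor_history result.
example_record = [
    {
        "DbFrom": "gene",
        "IdList": ["672"],
        "LinkSetDbHistory": [
            {"DbTo": "protein", "LinkName": "gene_protein_refseq", "QueryKey": "1"},
        ],
        "WebEnv": "EXAMPLE_WEBENV",
    }
]

webenv, query_keys = history_references(example_record)
print(webenv, query_keys)
```

The returned pair can then be handed to a follow-up call such as `Entrez.efetch(db='protein', webenv=webenv, query_key=query_keys['gene_protein_refseq'], rettype='fasta', retmode='text')`, which avoids sending a long ID list back to the server.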

Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| Empty LinkSetDb | No links exist | Check if record has linked data |
| HTTPError 400 | Invalid ID or database | Verify ID exists in source database |
| KeyError | Missing expected field | Check if LinkSetDb is empty first |
| Single linkset expected, got list | Multiple input IDs | Iterate through record list |

Decision Tree

code
Need to find related records?
├── Know what link you want?
│   └── Use elink with specific linkname
├── Discover what links exist?
│   └── Use elink with cmd='acheck'
├── Navigate to target database?
│   └── Use elink(dbfrom=X, db=Y, id=Z)
├── Find related records in same database?
│   └── Use elink(dbfrom=X, db=X) with neighbor
├── Chain multiple databases?
│   └── Call elink multiple times
└── Need the actual records?
    └── Use elink first, then efetch with IDs
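
The last branch of the decision tree (elink first, then efetch) can be sketched as follows. This is a minimal sketch rather than a definitive implementation: `chunked` and `fetch_linked_records` are hypothetical helper names, the default batch size of 200 is an arbitrary assumption, and the Entrez module is passed in as a parameter only so that the function can also be exercised with a stub.

```python
def chunked(ids, size):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_linked_records(entrez, dbfrom, db, source_id, rettype, batch_size=200):
    """Follow dbfrom -> db links for one record and download the targets as text.

    `entrez` is the Bio.Entrez module, or any object offering the same
    elink/read/efetch functions (assumption made for offline testing).
    """
    handle = entrez.elink(dbfrom=dbfrom, db=db, id=source_id)
    record = entrez.read(handle)
    handle.close()

    # Several link names can point at the same target record, so de-duplicate
    # while keeping the order in which the IDs were returned.
    linked_ids = list(dict.fromkeys(
        link["Id"]
        for linkset in record
        for link_db in linkset["LinkSetDb"]
        for link in link_db["Link"]
    ))

    parts = []
    for batch in chunked(linked_ids, batch_size):
        # efetch accepts a comma-delimited ID list
        handle = entrez.efetch(db=db, id=",".join(batch), rettype=rettype, retmode="text")
        parts.append(handle.read())
        handle.close()
    return "".join(parts)
```

A typical call, after `from Bio import Entrez` and setting `Entrez.email`, would be `fetch_linked_records(Entrez, 'gene', 'protein', '672', rettype='fasta')`. Fetching in batches keeps each request URL short when a record has many links.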

Related Skills

  • entrez-search - Search databases before linking
  • entrez-fetch - Retrieve records after finding linked IDs
  • batch-downloads - Download many linked records efficiently