Protein Structure Data Retrieval
Retrieve protein structures with proper disambiguation, quality assessment, and comprehensive metadata.
Workflow Overview
Phase 0: Clarify (if needed)
↓
Phase 1: Disambiguate Protein Identity
↓
Phase 2: Retrieve Structures (Internal)
↓
Phase 3: Report Structure Profile
Phase 0: Clarification (When Needed)
Ask the user ONLY if:
- •Protein name matches multiple genes/families (e.g., "kinase" → which kinase?)
- •Organism not specified for conserved proteins
- •Intent unclear: need experimental structure vs AlphaFold prediction?
Skip clarification for:
- •Specific PDB IDs (4-character codes)
- •UniProt accessions
- •Unambiguous protein names with organism
Phase 1: Protein Disambiguation
1.1 Resolve Protein Identity
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# Strategy depends on input type
if user_provided_pdb_id:
# Direct structure retrieval
pdb_id = user_provided_pdb_id.upper()
elif user_provided_uniprot:
# Get UniProt info, then search structures
uniprot_id = user_provided_uniprot
# Can also get AlphaFold structure
af_structure = tu.tools.alphafold_get_structure_by_uniprot(
uniprot_id=uniprot_id
)
elif user_provided_protein_name:
# Search by name
result = tu.tools.search_structures_by_protein_name(
protein_name=protein_name
)
1.2 Identity Resolution Checklist
- • Protein name/gene identified
- • Organism confirmed
- • UniProt accession (if available)
- • Isoform/variant specified (if relevant)
1.3 Handle Naming Collisions
Common ambiguous terms:
| Term | Ambiguity | Resolution |
|---|---|---|
| "kinase" | Hundreds of kinases | Ask which kinase (EGFR, CDK2, etc.) |
| "receptor" | Many receptor types | Specify receptor family |
| "protease" | Multiple families | Ask serine/cysteine/metallo/etc. |
| "hemoglobin" | Clear | Proceed (α/β chain specified if needed) |
| "insulin" | Clear | Proceed |
Phase 2: Data Retrieval (Internal)
Retrieve all data silently. Do NOT narrate the search process.
2.1 Search Structures
# Search by protein name
result = tu.tools.search_structures_by_protein_name(
protein_name=protein_name
)
# Filter results by quality
high_res = [
entry for entry in result["data"]
if entry.get("resolution") and entry["resolution"] < 2.5
]
2.2 Get Structure Details
For each relevant structure:
pdb_id = "4INS"
# Basic metadata
metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
# Experimental details
exp_details = tu.tools.get_protein_experimental_details_by_pdb_id(
pdb_id=pdb_id
)
# Resolution (if X-ray)
resolution = tu.tools.get_protein_resolution_by_pdb_id(pdb_id=pdb_id)
# Bound ligands
ligands = tu.tools.get_protein_ligands_by_pdb_id(pdb_id=pdb_id)
# Similar structures
similar = tu.tools.get_similar_structures_by_pdb_id(
pdb_id=pdb_id,
cutoff=2.0
)
2.3 PDBe Additional Data
# Entry summary summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id) # Molecular entities molecules = tu.tools.pdbe_get_molecules(pdb_id=pdb_id) # Binding sites binding_sites = tu.tools.pdbe_get_binding_sites(pdb_id=pdb_id)
2.4 AlphaFold Predictions
# When no experimental structure exists, or for comparison
if uniprot_id:
af_structure = tu.tools.alphafold_get_structure_by_uniprot(
uniprot_id=uniprot_id
)
Fallback Chains
| Primary | Fallback | Notes |
|---|---|---|
| RCSB search | PDBe search | Regional availability |
| get_protein_metadata | pdbe_get_entry_summary | Alternative source |
| Experimental structure | AlphaFold prediction | No experimental structure |
| get_protein_ligands | pdbe_get_binding_sites | Ligand info unavailable |
Phase 3: Report Structure Profile
Output Structure
Present as a Structure Profile Report. Hide search process.
# Protein Structure Profile: [Protein Name] **Search Summary** - Query: [protein name/PDB ID] - Organism: [species] - Structures Found: [N] experimental, [M] AlphaFold --- ## Best Available Structure ### [PDB ID]: [Title] | Attribute | Value | |-----------|-------| | **PDB ID** | [pdb_id] | | **UniProt** | [uniprot_id] | | **Organism** | [species] | | **Method** | X-ray / Cryo-EM / NMR | | **Resolution** | [X.XX] Å | | **Release Date** | [date] | **Quality Assessment**: ●●● High / ●●○ Medium / ●○○ Low ### Experimental Details | Parameter | Value | |-----------|-------| | **Method** | [X-ray crystallography] | | **Resolution** | [1.9 Å] | | **R-factor** | [0.18] | | **R-free** | [0.21] | | **Space Group** | [P 21 21 21] | ### Structure Composition | Component | Count | Details | |-----------|-------|---------| | **Chains** | [N] | [A (enzyme), B (inhibitor)] | | **Residues** | [N] | [coverage %] | | **Ligands** | [N] | [list ligand names] | | **Waters** | [N] | | | **Metals** | [N] | [Zn, Mg, etc.] | ### Bound Ligands | Ligand ID | Name | Type | Binding Site | |-----------|------|------|--------------| | [ATP] | Adenosine triphosphate | Substrate | Active site | | [MG] | Magnesium ion | Cofactor | Catalytic | ### Binding Site Details For drug discovery applications: **Site 1: Active Site** - Location: Chain A, residues 45-89 - Key residues: Asp45, Glu67, His89 - Pocket volume: [X] ų - Druggability: High/Medium/Low --- ## Alternative Structures Ranked by quality and relevance: | Rank | PDB ID | Resolution | Method | Ligands | Notes | |------|--------|------------|--------|---------|-------| | 1 | [4INS] | 1.9 Å | X-ray | Zn | Best resolution | | 2 | [3I40] | 2.1 Å | X-ray | Zn, phenol | With inhibitor | | 3 | [1TRZ] | 2.3 Å | X-ray | None | Porcine | --- ## AlphaFold Prediction ### AF-[UniProt]-F1 | Attribute | Value | |-----------|-------| | **UniProt** | [uniprot_id] | | **Model Version** | [v4] | | **Confidence (pLDDT)** | [average score] | **Confidence Distribution**: - Very High (>90): [X]% of residues - High (70-90): [X]% of residues - Low (50-70): [X]% of residues - Very Low (<50): [X]% of residues **Use Cases**: - ✓ Overall fold reliable - ✓ Core domain structure - ⚠ Loop regions uncertain - ✗ Not suitable for binding site analysis --- ## Structure Comparison | Property | [PDB_1] | [PDB_2] | AlphaFold | |----------|---------|---------|-----------| | Resolution | 1.9 Å | 2.5 Å | N/A (predicted) | | Completeness | 98% | 85% | 100% | | Ligands | Yes | No | No | | Confidence | Experimental | Experimental | High (85 avg) | --- ## Download Links ### Coordinate Files | Format | PDB ID | Link | |--------|--------|------| | PDB | [4INS] | [link] | | mmCIF | [4INS] | [link] | | AlphaFold | [UniProt] | [link] | ### Database Links - RCSB PDB: https://www.rcsb.org/structure/[pdb_id] - PDBe: https://www.ebi.ac.uk/pdbe/entry/pdb/[pdb_id] - AlphaFold: https://alphafold.ebi.ac.uk/entry/[uniprot_id] Retrieved: [date]
Quality Assessment Tiers (Aligned with Evidence Grading)
Experimental Structures
| Tier | Symbol | Criteria | Evidence Equivalent |
|---|---|---|---|
| Excellent | ●●●● | X-ray <1.5Å, complete, R-free <0.22 | ★★★ |
| High | ●●●○ | X-ray <2.0Å OR Cryo-EM <3.0Å | ★★★ |
| Good | ●●○○ | X-ray 2.0-3.0Å OR Cryo-EM 3.0-4.0Å | ★★☆ |
| Moderate | ●○○○ | X-ray >3.0Å OR NMR ensemble | ★★☆ |
| Low | ○○○○ | >4.0Å, incomplete, or problematic | ★☆☆ |
Resolution Guide & Evidence Implications
| Resolution | Use Case | Reliability |
|---|---|---|
| <1.5 Å | Atomic detail, H-bond analysis | ★★★ atomic positions precise |
| 1.5-2.0 Å | Drug design, mechanism studies | ★★★ side chains reliable |
| 2.0-2.5 Å | Structure-based design | ★★☆ backbone reliable, some side chain ambiguity |
| 2.5-3.5 Å | Overall architecture, fold | ★★☆ for fold, ★☆☆ for details |
| >3.5 Å | Domain arrangement only | ★☆☆ interpret with caution |
AlphaFold Confidence & Evidence Tiers
| pLDDT Score | Interpretation | Evidence Equivalent |
|---|---|---|
| >90 | Very high confidence, experimental-like | ★★☆ (validated models) |
| 70-90 | Good backbone confidence | ★★☆ for fold, ★☆☆ for details |
| 50-70 | Uncertain, flexible regions | ★☆☆ use for domain boundaries only |
| <50 | Low confidence, likely disordered | ☆☆☆ not suitable for structural analysis |
When to Use Which Source
**For Drug Discovery Applications**: - Binding site analysis: Require ★★★ (X-ray <2.5Å with ligand) - Virtual screening: Accept ★★☆ (X-ray <3.0Å or good AlphaFold) - Homology modeling: AlphaFold ★★☆ acceptable as template **For Mechanism Studies**: - Active site geometry: ★★★ required (high-resolution X-ray) - Domain architecture: ★★☆ sufficient (AlphaFold acceptable) - Disordered regions: AlphaFold pLDDT <50 = likely real disorder **Citing Structures in Research Reports**: EGFR kinase domain structure [★★★: PDB 1M17, 2.6Å X-ray with erlotinib] shows the DFG-out conformation. AlphaFold prediction [★★☆: AF-P00533-F1, pLDDT 92] covers disordered regions not captured crystallographically.
Completeness Checklist
Every structure report MUST include:
For Specific PDB ID (Required)
- • PDB ID and title
- • Experimental method
- • Resolution (or N/A for NMR)
- • Organism
- • Quality assessment
- • Download links
For Protein Name Search (Required)
- • Search summary with result count
- • Top structures with quality ranking
- • Best structure recommendation
- • AlphaFold alternative (if no experimental structure)
Always Include
- • Ligand information (or "No ligands bound")
- • Data sources with links
- • Retrieval date
Common Use Cases
Drug Discovery Target
User: "Get structure for EGFR kinase with inhibitor" → Filter for ligand-bound structures, emphasize binding site
Model Building
User: "Find best template for homology modeling of protein X" → High-resolution structures, note sequence coverage
Structure Comparison
User: "Compare available SARS-CoV-2 main protease structures" → All structures with systematic comparison table
AlphaFold When No Experimental
User: "Structure of protein with UniProt P12345" → Check PDB first, then AlphaFold, note confidence
Error Handling
| Error | Response |
|---|---|
| "PDB ID not found" | Verify 4-character format, check if obsoleted |
| "No structures for protein" | Offer AlphaFold prediction, suggest similar proteins |
| "Download failed" | Retry once, provide alternative link |
| "Resolution unavailable" | Likely NMR/model, note in assessment |
Tool Reference
RCSB PDB (Experimental Structures)
| Tool | Purpose |
|---|---|
search_structures_by_protein_name | Name-based search |
get_protein_metadata_by_pdb_id | Basic info |
get_protein_experimental_details_by_pdb_id | Method details |
get_protein_resolution_by_pdb_id | Quality metric |
get_protein_ligands_by_pdb_id | Bound molecules |
download_pdb_structure_file | Coordinate files |
get_similar_structures_by_pdb_id | Homologs |
PDBe (European PDB)
| Tool | Purpose |
|---|---|
pdbe_get_entry_summary | Overview |
pdbe_get_molecules | Molecular entities |
pdbe_get_experiment_info | Experimental data |
pdbe_get_binding_sites | Ligand pockets |
AlphaFold (Predictions)
| Tool | Purpose |
|---|---|
alphafold_get_structure_by_uniprot | Get prediction |
alphafold_search_structures | Search predictions |