ToolUniverse Disease Research
Generate a comprehensive, detailed disease research report with full source citations and evidence grading. The report is created as a markdown file and progressively updated during research.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Evidence grading - Grade all claims by evidence strength (T1-T4)
- Citation requirements - Every fact must have inline source attribution
- Mandatory completeness - All sections must exist, even if "limited data"
- Disease disambiguation - Resolve EFO/ICD/UMLS IDs before research
Evidence Grading System (MANDATORY)
CRITICAL: Grade every claim by evidence strength for disease research.
Evidence Tiers for Disease Research
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | ★★★ | Causal evidence, clinical trials, FDA approval | Mendelian gene mutations, Phase 3 trials |
| T2 | ★★☆ | Functional validation, large cohort studies | GWAS + functional follow-up, N>1000 cohorts |
| T3 | ★☆☆ | Association only, small studies, computational | GWAS without replication, case reports |
| T4 | ☆☆☆ | Review mention, database annotation, prediction | Review articles, text-mined associations |
Apply in Report
### 3.1 Causal Genes (Mendelian)
Mutations in *PARK2* cause autosomal recessive juvenile Parkinson's [★★★: OMIM:602544, >500 families]. *LRRK2* G2019S is the most common genetic cause [★★★: PMID:15541309].

### 3.2 GWAS Associations
rs356219 at *SNCA* is associated with PD risk (OR=1.3) [★★☆: PMID:19915575, GWAS + replication]. rs6430538 shows association in European populations [★☆☆: PMID:xxxxx, single GWAS].
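The inline tags in this example follow a fixed pattern (stars, source, optional note), so they could be generated by a small helper. A minimal sketch, assuming the hypothetical names `TIER_SYMBOLS` and `evidence_tag`, which are illustrative and not part of ToolUniverse:

```python
# Hypothetical helper (not a ToolUniverse API): maps tiers T1-T4 to star symbols.
TIER_SYMBOLS = {"T1": "★★★", "T2": "★★☆", "T3": "★☆☆", "T4": "☆☆☆"}


def evidence_tag(tier, source, note=""):
    """Return an inline tag such as '[★★★: OMIM:602544, >500 families]'."""
    if tier not in TIER_SYMBOLS:
        raise ValueError(f"Unknown evidence tier: {tier!r}")
    detail = f"{source}, {note}" if note else source
    return f"[{TIER_SYMBOLS[tier]}: {detail}]"


print(evidence_tag("T1", "OMIM:602544", ">500 families"))
print(evidence_tag("T2", "PMID:19915575", "GWAS + replication"))
```

Rejecting unknown tiers keeps an ungraded claim from slipping into the report with a malformed tag.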
When to Use
Apply when the user:
- Asks about any disease, syndrome, or medical condition
- Needs comprehensive disease intelligence
- Wants a detailed research report with citations
- Asks "what do we know about [disease]?"
Core Workflow: Report-First Approach
DO NOT show the search process to the user. Instead:
- Create report file first - Initialize {disease_name}_research_report.md
- Research each dimension - Use all relevant tools
- Update report progressively - Write findings to file after each dimension
- Include citations - Every fact must reference its source tool
User: "Research Parkinson's disease"

Agent Actions (internal, not shown to user):
1. Create "parkinsons_disease_research_report.md" with template
2. Research DIM 1 → Update Identity section
3. Research DIM 2 → Update Clinical section
4. ... continue for all 10 dimensions
5. Present final report to user
Report Template
Create this file structure at the start:
# Disease Research Report: {Disease Name}
**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
---
## Executive Summary
(Brief 3-5 sentence overview - fill after all research complete)
---
## 1. Disease Identity & Classification
### Ontology Identifiers
| System | ID | Source |
|--------|-----|--------|
| EFO | | |
| ICD-10 | | |
| UMLS CUI | | |
| SNOMED CT | | |
### Synonyms & Alternative Names
- (list with source)
### Disease Hierarchy
- Parent:
- Subtypes:
**Sources**: (list tools used)
---
## 2. Clinical Presentation
### Phenotypes (HPO)
| HPO ID | Phenotype | Description | Source |
|--------|-----------|-------------|--------|
### Symptoms & Signs
- (list with source)
### Diagnostic Criteria
- (from literature/MedlinePlus)
**Sources**: (list tools used)
---
## 3. Genetic & Molecular Basis
### Associated Genes
| Gene | Score | Ensembl ID | Evidence | Source |
|------|-------|------------|----------|--------|
### GWAS Associations
| SNP | P-value | Odds Ratio | Study | Source |
|-----|---------|------------|-------|--------|
### Pathogenic Variants (ClinVar)
| Variant | Clinical Significance | Condition | Source |
|---------|----------------------|-----------|--------|
**Sources**: (list tools used)
---
## 4. Treatment Landscape
### Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Target | Source |
|------|-----------|-----------|-------|--------|--------|
### Clinical Trials
| NCT ID | Title | Phase | Status | Intervention | Source |
|--------|-------|-------|--------|--------------|--------|
### Treatment Guidelines
- (from literature)
**Sources**: (list tools used)
---
## 5. Biological Pathways & Mechanisms
### Key Pathways
| Pathway | Reactome ID | Genes Involved | Source |
|---------|-------------|----------------|--------|
### Protein-Protein Interactions
- (tissue-specific networks)
### Expression Patterns
| Tissue | Expression Level | Source |
|--------|------------------|--------|
**Sources**: (list tools used)
---
## 6. Epidemiology & Risk Factors
### Prevalence & Incidence
- (from literature)
### Risk Factors
| Factor | Evidence | Source |
|--------|----------|--------|
### GWAS Studies
| Study | Sample Size | Findings | Source |
|-------|-------------|----------|--------|
**Sources**: (list tools used)
---
## 7. Literature & Research Activity
### Publication Trends
- Total publications (5 years):
- Current year:
- Trend:
### Key Publications
| PMID | Title | Year | Citations | Source |
|------|-------|------|-----------|--------|
### Research Institutions
- (from OpenAlex)
**Sources**: (list tools used)
---
## 8. Similar Diseases & Comorbidities
### Similar Diseases
| Disease | Similarity Score | Shared Genes | Source |
|---------|-----------------|--------------|--------|
### Comorbidities
- (from literature/clinical data)
**Sources**: (list tools used)
---
## 9. Cancer-Specific Information (if applicable)
### CIViC Variants
| Gene | Variant | Evidence Level | Clinical Significance | Source |
|------|---------|----------------|----------------------|--------|
### Molecular Profiles
- (biomarkers)
### Targeted Therapies
| Therapy | Target | Evidence | Source |
|---------|--------|----------|--------|
**Sources**: (list tools used)
---
## 10. Drug Safety & Adverse Events
### Drug Warnings
| Drug | Warning Type | Description | Source |
|------|--------------|-------------|--------|
### Clinical Trial Adverse Events
| Trial | Drug | Adverse Event | Frequency | Source |
|-------|------|---------------|-----------|--------|
### FAERS Reports
- (FDA adverse event data)
**Sources**: (list tools used)
---
## References
### Data Sources Used
| Tool | Query | Section |
|------|-------|---------|
### Database Versions
- OpenTargets: (version/date)
- ClinVar: (version/date)
- GWAS Catalog: (version/date)
Research Protocol
Step 1: Initialize Report
import re
from datetime import datetime

def create_report_file(disease_name):
    """Create initial report file with template"""
    # Build a filesystem-safe slug: "Parkinson's disease" -> "parkinsons_disease"
    name = re.sub(r"['’]", "", disease_name.lower())
    slug = re.sub(r"[^a-z0-9]+", "_", name).strip("_")
    filename = f"{slug}_research_report.md"
    template = f"""# Disease Research Report: {disease_name}
**Report Generated**: {datetime.now().strftime('%Y-%m-%d %H:%M')}
**Disease Identifiers**: Pending research...
---
## Executive Summary
*Research in progress...*
---
## 1. Disease Identity & Classification
*Researching...*
## 2. Clinical Presentation
*Pending...*
[... rest of template ...]
"""
    # UTF-8 is required because the report contains star symbols (★)
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(template)
    return filename
Step 2: Research Each Dimension with Citations
For EACH piece of information, track:
- •Tool name that provided the data
- •Parameters used in the query
- •Timestamp of the query
def research_with_citations(tu, disease_name, report_file):
    """Research and update report with full citations"""
    references = []  # Track all sources

    # === DIMENSION 1: Identity ===
    section_start = len(references)  # index of this section's first reference

    # Get EFO ID
    efo_result = tu.tools.OSL_get_efo_id_by_disease_name(disease=disease_name)
    efo_id = efo_result.get('efo_id')
    references.append({
        'tool': 'OSL_get_efo_id_by_disease_name',
        'params': {'disease': disease_name},
        'section': 'Identity'
    })

    # Get ICD codes
    icd_result = tu.tools.icd_search_codes(query=disease_name, version="ICD10CM")
    references.append({
        'tool': 'icd_search_codes',
        'params': {'query': disease_name, 'version': 'ICD10CM'},
        'section': 'Identity'
    })

    # Get UMLS
    umls_result = tu.tools.umls_search_concepts(query=disease_name)
    references.append({
        'tool': 'umls_search_concepts',
        'params': {'query': disease_name},
        'section': 'Identity'
    })

    # Defaults so the report update below still works when no EFO ID was resolved
    efo_term = None
    children = []

    # Get synonyms from EFO
    if efo_id:
        obo_id = efo_id.replace('_', ':')  # e.g. "EFO_0000249" -> "EFO:0000249"
        efo_term = tu.tools.ols_get_efo_term(obo_id=obo_id)
        references.append({
            'tool': 'ols_get_efo_term',
            'params': {'obo_id': obo_id},  # record the value actually sent
            'section': 'Identity'
        })

        # Get subtypes
        children = tu.tools.ols_get_efo_term_children(obo_id=obo_id, size=20)
        references.append({
            'tool': 'ols_get_efo_term_children',
            'params': {'obo_id': obo_id, 'size': 20},
            'section': 'Identity'
        })

    # UPDATE REPORT FILE with Identity section
    update_report_section(report_file, 'Identity', {
        'efo_id': efo_id,
        'icd_codes': icd_result,
        'umls': umls_result,
        'synonyms': efo_term.get('synonyms', []) if efo_term else [],
        'subtypes': children
    }, references[section_start:])  # Only the references added for this section

    # === DIMENSION 2: Clinical ===
    # ... continue for all dimensions
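Step 2 asks for the tool name, parameters, and timestamp of every query, but the `references.append(...)` records do not include a timestamp. A minimal sketch of a wrapper that captures all three in one place; `call_with_citation` is a hypothetical helper, and `tool_fn` stands for any callable (for example a `tu.tools.*` function):

```python
from datetime import datetime, timezone


def call_with_citation(tool_name, tool_fn, references, section, **params):
    """Call tool_fn(**params) and append a citation record to references.

    Hypothetical helper, not a ToolUniverse API.
    """
    result = tool_fn(**params)
    references.append({
        "tool": tool_name,
        "params": params,
        "section": section,
        # UTC timestamp of the query, e.g. "2024-01-01T12:00:00+00:00"
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    })
    return result


# Usage with a stand-in function instead of a real ToolUniverse call
def fake_lookup(disease):
    return {"efo_id": "EFO_0000249"}  # illustrative return value only


refs = []
out = call_with_citation("OSL_get_efo_id_by_disease_name", fake_lookup, refs,
                         "Identity", disease="Alzheimer disease")
print(out, refs[0]["tool"], refs[0]["timestamp"])
```

Routing every call through one wrapper means a citation cannot be forgotten when a new tool call is added.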
Step 3: Update Report File After Each Dimension
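# Maps the short section names used in code to the headings written by the template
SECTION_HEADINGS = {
    'Identity': '## 1. Disease Identity & Classification',
    'Clinical': '## 2. Clinical Presentation',
    # ... etc
}

def update_report_section(filename, section_name, data, sources):
    """Update a specific section in the report file"""
    # Read current file
    with open(filename, 'r', encoding='utf-8') as f:
        content = f.read()

    # Format section content with citations
    if section_name == 'Identity':
        section_content = format_identity_section(data, sources)
    elif section_name == 'Clinical':
        section_content = format_clinical_section(data, sources)
    # ... etc
    else:
        raise ValueError(f"No formatter for section: {section_name}")

    # Replace placeholder with actual content; the template marks unfilled
    # sections with either "*Researching...*" or "*Pending...*"
    heading = SECTION_HEADINGS[section_name]
    for marker in ('*Researching...*', '*Pending...*'):
        content = content.replace(f"{heading}\n{marker}", section_content)

    # Write back
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(content)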
def format_identity_section(data, sources):
    """Format Identity section with proper citations"""
    source_list = ', '.join([s['tool'] for s in sources])
    # Note: format_list_with_source is a helper that this document does not define
    return f"""## 1. Disease Identity & Classification
### Ontology Identifiers
| System | ID | Source |
|--------|-----|--------|
| EFO | {data['efo_id']} | OSL_get_efo_id_by_disease_name |
| ICD-10 | {data['icd_codes']} | icd_search_codes |
| UMLS CUI | {data['umls']} | umls_search_concepts |
### Synonyms & Alternative Names
{format_list_with_source(data['synonyms'], 'ols_get_efo_term')}
### Disease Subtypes
{format_list_with_source(data['subtypes'], 'ols_get_efo_term_children')}
**Sources**: {source_list}
"""
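`format_list_with_source` is called in the formatter but never defined in this document. One possible shape is sketched below; the behavior (one bullet per item, a "Limited data" bullet for empty input) is an assumption, not the original helper:

```python
def format_list_with_source(items, source_tool):
    """Render items as markdown bullets, each tagged with its source tool.

    Assumed behavior: an empty or missing list yields a 'Limited data' bullet,
    so the subsection still exists in the report.
    """
    if not items:
        return f"- Limited data [Source: {source_tool}]"
    return "\n".join(f"- {item} [Source: {source_tool}]" for item in items)


# Illustrative input values only
print(format_list_with_source(["synonym A", "synonym B"], "ols_get_efo_term"))
print(format_list_with_source([], "ols_get_efo_term_children"))
```

The empty-input branch follows the "mandatory completeness" principle: a section with no results is marked rather than left blank.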
Complete Tool Usage by Section
Section 1: Identity (use ALL of these)
# Required tools - use all
tu.tools.OSL_get_efo_id_by_disease_name(disease=disease_name)
tu.tools.OpenTargets_get_disease_id_description_by_name(diseaseName=disease_name)
tu.tools.ols_search_efo_terms(query=disease_name)
tu.tools.ols_get_efo_term(obo_id=efo_id)
tu.tools.ols_get_efo_term_children(obo_id=efo_id, size=30)
tu.tools.umls_search_concepts(query=disease_name)
tu.tools.umls_get_concept_details(cui=cui)
tu.tools.icd_search_codes(query=disease_name, version="ICD10CM")
tu.tools.snomed_search_concepts(query=disease_name)
Section 2: Clinical Presentation (use ALL of these)
tu.tools.OpenTargets_get_associated_phenotypes_by_disease_efoId(efoId=efo_id)
tu.tools.get_HPO_ID_by_phenotype(query=symptom)  # for each key symptom
tu.tools.get_phenotype_by_HPO_ID(id=hpo_id)  # for top phenotypes
tu.tools.MedlinePlus_search_topics_by_keyword(term=disease_name, db="healthTopics")
tu.tools.MedlinePlus_get_genetics_condition_by_name(condition=disease_slug)
tu.tools.MedlinePlus_connect_lookup_by_code(cs=icd_oid, c=icd_code)
Section 3: Genetics (use ALL of these)
Evidence tier guide: Mendelian genes = ★★★, replicated GWAS = ★★☆, single GWAS = ★☆☆
# Disease-gene associations (★★☆ to ★★★)
tu.tools.OpenTargets_get_associated_targets_by_disease_efoId(efoId=efo_id)
tu.tools.OpenTargets_target_disease_evidence(efoId=efo_id, ensemblId=gene_id)  # for top genes

# Clinical variants (★★★)
tu.tools.clinvar_search_variants(condition=disease_name, max_results=50)
tu.tools.clinvar_get_variant_details(variant_id=vid)  # for top variants
tu.tools.clinvar_get_clinical_significance(variant_id=vid)

# GWAS associations (★★☆ if replicated, ★☆☆ if single study)
tu.tools.gwas_search_associations(disease_trait=disease_name, size=50)
tu.tools.gwas_get_variants_for_trait(disease_trait=disease_name, size=50)
tu.tools.gwas_get_associations_for_trait(disease_trait=disease_name, size=50)
tu.tools.gwas_get_studies_for_trait(disease_trait=disease_name, size=30)
tu.tools.GWAS_search_associations_by_gene(gene_name=gene)  # for top genes

# Variant details (★★★ for population data)
tu.tools.gnomad_get_variant_frequency(variant=variant)  # for key variants
tu.tools.gnomad_get_gene_constraints(gene_symbol=gene)  # constraint scores
tu.tools.dbsnp_get_variant_by_rsid(rsid=rs_id)  # dbSNP details

# NEW: Deep GWAS analysis
tu.tools.gwas_get_snp_by_id(snp_id=rs_id)  # individual SNP details
tu.tools.gwas_get_snps_for_gene(gene_symbol=gene)  # GWAS SNPs at gene locus
tu.tools.gwas_search_snps(query=disease_name)  # SNP-level search
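The tier guide for this section (Mendelian genes, replicated GWAS, single GWAS) can be written as a small rule. A sketch, assuming a hypothetical `genetics_tier` function; its inputs would have to be derived from the tool results, and treating "two or more studies" as replication is an assumption:

```python
def genetics_tier(mendelian=False, gwas_study_count=0):
    """Map genetic evidence to a star rating following the tier guide.

    Mendelian gene -> ★★★, replicated GWAS (assumed: 2+ studies) -> ★★☆,
    single GWAS -> ★☆☆, anything else -> ☆☆☆ (annotation level).
    """
    if mendelian:
        return "★★★"
    if gwas_study_count >= 2:
        return "★★☆"
    if gwas_study_count == 1:
        return "★☆☆"
    return "☆☆☆"


print(genetics_tier(mendelian=True))
print(genetics_tier(gwas_study_count=3))
print(genetics_tier(gwas_study_count=1))
```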
Section 4: Treatment (use ALL of these)
tu.tools.OpenTargets_get_associated_drugs_by_disease_efoId(efoId=efo_id, size=100)
tu.tools.OpenTargets_get_drug_chembId_by_generic_name(drugName=drug)  # for each drug
tu.tools.OpenTargets_get_drug_mechanisms_of_action_by_chemblId(chemblId=chembl_id)
tu.tools.search_clinical_trials(condition=disease_name, pageSize=50)
tu.tools.get_clinical_trial_descriptions(nct_ids=nct_list)
tu.tools.get_clinical_trial_conditions_and_interventions(nct_ids=nct_list)
tu.tools.get_clinical_trial_eligibility_criteria(nct_ids=nct_list)
tu.tools.get_clinical_trial_outcome_measures(nct_ids=nct_list)
tu.tools.extract_clinical_trial_outcomes(nct_ids=nct_list)
tu.tools.GtoPdb_list_diseases(name=disease_name)
tu.tools.GtoPdb_get_disease(disease_id=gtopdb_id)
Section 5: Pathways (use ALL of these)
tu.tools.Reactome_get_diseases()
tu.tools.Reactome_map_uniprot_to_pathways(id=uniprot_id)  # for top genes
tu.tools.Reactome_get_pathway(stId=pathway_id)  # for key pathways
tu.tools.Reactome_get_pathway_reactions(stId=pathway_id)
tu.tools.humanbase_ppi_analysis(gene_list=top_genes, tissue=relevant_tissue)
tu.tools.gtex_get_expression_by_gene(gene=gene)  # for top genes
tu.tools.HPA_get_protein_expression(gene=gene)
tu.tools.geo_search_datasets(query=disease_name)
Section 6: Literature (use ALL of these)
tu.tools.PubMed_search_articles(query=f'"{disease_name}"', limit=100)
tu.tools.PubMed_search_articles(query=f'"{disease_name}" AND epidemiology', limit=50)
tu.tools.PubMed_search_articles(query=f'"{disease_name}" AND mechanism', limit=50)
tu.tools.PubMed_search_articles(query=f'"{disease_name}" AND treatment', limit=50)
tu.tools.PubMed_get_article(pmid=pmid) # for top 10 articles
tu.tools.PubMed_get_related(pmid=key_pmid)
tu.tools.PubMed_get_cited_by(pmid=key_pmid)
tu.tools.OpenTargets_get_publications_by_disease_efoId(efoId=efo_id)
tu.tools.openalex_search_works(query=disease_name, limit=50)
tu.tools.europe_pmc_search_abstracts(query=disease_name, limit=50)
tu.tools.semantic_scholar_search_papers(query=disease_name, limit=50)
Section 7: Similar Diseases
tu.tools.OpenTargets_get_similar_entities_by_disease_efoId(efoId=efo_id, threshold=0.3, size=30)
Section 8: Cancer-Specific (if cancer)
tu.tools.civic_search_diseases(limit=100)
tu.tools.civic_search_genes(query=gene, limit=20)  # for cancer genes
tu.tools.civic_get_variants_by_gene(gene_id=civic_gene_id, limit=50)
tu.tools.civic_get_variant(variant_id=vid)
tu.tools.civic_get_evidence_item(evidence_id=eid)
tu.tools.civic_search_therapies(limit=100)
tu.tools.civic_search_molecular_profiles(limit=50)
Section 9: Pharmacology
tu.tools.GtoPdb_get_targets(target_type=type, limit=50)  # GPCR, ion channel, etc
tu.tools.GtoPdb_get_target(target_id=tid)  # for disease-relevant targets
tu.tools.GtoPdb_get_target_interactions(target_id=tid)
tu.tools.GtoPdb_search_interactions(approved_only=True)
tu.tools.GtoPdb_list_ligands(ligand_type="Approved")
Section 10: Safety (use ALL of these)
tu.tools.OpenTargets_get_drug_warnings_by_chemblId(chemblId=cid)  # for each drug
tu.tools.OpenTargets_get_drug_blackbox_status_by_chembl_ID(chemblId=cid)
tu.tools.extract_clinical_trial_adverse_events(nct_ids=nct_list)
tu.tools.FAERS_count_reactions_by_drug_event(drug=drug_name, event=event)
tu.tools.AdverseEventPredictionQuestionGenerator(disease_name=disease, drug_name=drug)
Citation Format with Evidence Grading
Every piece of data MUST include its source AND evidence tier. Use this format:
In Tables
| Gene | Score | Evidence | Source |
|------|-------|----------|--------|
| APOE | 0.92 | ★★★ (causal) | OpenTargets_get_associated_targets_by_disease_efoId |
| APP | 0.88 | ★★★ (Mendelian) | OpenTargets_get_associated_targets_by_disease_efoId |
| CLU | 0.45 | ★★☆ (GWAS) | GWAS Catalog |
In Lists
- Memory loss [★★★: OpenTargets_get_associated_phenotypes_by_disease_efoId, core feature]
- Cognitive decline [★★★: MedlinePlus_get_genetics_condition_by_name, diagnostic criterion]
- Sleep disturbance [★☆☆: association studies, not diagnostic]
In Prose with Evidence Grades
The disease affects approximately 6.5 million Americans [★★★: CDC epidemiology data]. APOE ε4 increases risk 3-15 fold [★★★: PMID:8346443, replicated in 100+ cohorts]. A recent single-center study suggests microbiome involvement [★☆☆: PMID:xxxxx, N=50].
Per-Section Evidence Summary
Include at section end:
---
**Evidence Quality for Section 3 (Genetics)**:
- Causal/Mendelian (T1): 5 genes
- Replicated GWAS (T2): 23 loci
- Single GWAS (T3): 45 associations
- Mention/Predicted (T4): 12
---
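If each finding carries its tier, a summary block of this kind can be produced by counting. A minimal sketch, assuming hypothetical names `TIER_LABELS` and `evidence_summary`; the labels mirror the genetics example and may need different wording in other sections:

```python
from collections import Counter

# Labels follow the genetics summary; other sections may need different wording.
TIER_LABELS = {
    "T1": "Causal/Mendelian",
    "T2": "Replicated GWAS",
    "T3": "Single GWAS",
    "T4": "Mention/Predicted",
}


def evidence_summary(section_title, tiers):
    """Build the per-section evidence block from a list of tier codes."""
    counts = Counter(tiers)  # tiers that never occur count as 0
    lines = ["---", f"**Evidence Quality for {section_title}**:"]
    for tier, label in TIER_LABELS.items():
        lines.append(f"- {label} ({tier}): {counts[tier]}")
    lines.append("---")
    return "\n".join(lines)


print(evidence_summary("Section 3 (Genetics)", ["T1", "T2", "T2", "T3"]))
```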
References Section
At the end of the report, include complete tool usage log:
## References

### Tools Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|
| 1 | OSL_get_efo_id_by_disease_name | disease="Alzheimer disease" | Identity | 1 |
| 2 | ols_get_efo_term | obo_id="EFO:0000249" | Identity | 1 |
| 3 | OpenTargets_get_associated_targets_by_disease_efoId | efoId="EFO_0000249" | Genetics | 245 |
| ... | ... | ... | ... | ... |

### Data Retrieved Summary
- Total tools used: 45
- Total API calls: 78
- Sections completed: 10/10
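The tool usage log can be rendered directly from the `references` list collected during research. A sketch, assuming a hypothetical `render_tool_log` function and an `items` key on each record; the research code in Step 2 does not record item counts, so that key is an addition you would have to populate:

```python
def render_tool_log(references):
    """Render citation records as the 'Tools Used' markdown table."""
    lines = [
        "### Tools Used",
        "| # | Tool | Parameters | Section | Items Retrieved |",
        "|---|------|------------|---------|-----------------|",
    ]
    for i, ref in enumerate(references, start=1):
        params = ", ".join(f'{k}="{v}"' for k, v in ref["params"].items())
        items = ref.get("items", "n/a")  # assumed key; "n/a" when not recorded
        lines.append(f"| {i} | {ref['tool']} | {params} | {ref['section']} | {items} |")
    return "\n".join(lines)


refs = [
    {"tool": "ols_get_efo_term", "params": {"obo_id": "EFO:0000249"},
     "section": "Identity", "items": 1},
    {"tool": "umls_search_concepts", "params": {"query": "Alzheimer disease"},
     "section": "Identity"},
]
print(render_tool_log(refs))
```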
Progressive Update Pattern
After researching EACH dimension, immediately update the report file:
# After each dimension's research completes:

# 1. Read current report
with open(report_file, 'r') as f:
    report = f.read()

# 2. Replace placeholder with formatted content
report = report.replace(
    "## 3. Genetic & Molecular Basis\n*Pending...*",
    formatted_genetics_section
)

# 3. Write back immediately
with open(report_file, 'w') as f:
    f.write(report)

# 4. Continue to next dimension
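One caveat with this pattern: `str.replace` silently does nothing when the placeholder text is not found, so a section can stay unfilled without any error. A sketch of a stricter variant, using the hypothetical name `fill_placeholder`:

```python
def fill_placeholder(report_text, placeholder, new_content):
    """Replace exactly one placeholder occurrence, failing loudly if it is absent."""
    if placeholder not in report_text:
        raise ValueError(f"Placeholder not found: {placeholder!r}")
    return report_text.replace(placeholder, new_content, 1)


report = "## 3. Genetic & Molecular Basis\n*Pending...*\n"
report = fill_placeholder(
    report,
    "## 3. Genetic & Molecular Basis\n*Pending...*",
    "## 3. Genetic & Molecular Basis\n(formatted content)",
)
print(report)
```

Raising on a missing placeholder surfaces heading or marker mismatches during research rather than at the final quality check.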
Final Report Quality Checklist
Before presenting to user, verify:
- All 10 sections have content (or marked as "No data available")
- Every data point has a source citation
- Executive summary reflects key findings
- References section lists all tools used
- Tables are properly formatted
- No placeholder text remains
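Some of these checks can be automated. The sketch below covers only three of them (numbered sections present, no placeholder text, a References heading); `check_report` and `PLACEHOLDER_MARKERS` are hypothetical names, and citation coverage or table formatting would still need separate review:

```python
import re

# Placeholder strings used by the report template in this document
PLACEHOLDER_MARKERS = ("*Researching...*", "*Pending...*",
                       "*Research in progress...*", "(to be filled)")


def check_report(report_text, expected_sections=10):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    # Numbered top-level headings look like "## 3. Genetic & Molecular Basis"
    found = {int(n) for n in re.findall(r"^## (\d+)\. ", report_text, flags=re.MULTILINE)}
    missing = [n for n in range(1, expected_sections + 1) if n not in found]
    if missing:
        problems.append(f"Missing sections: {missing}")
    for marker in PLACEHOLDER_MARKERS:
        if marker in report_text:
            problems.append(f"Placeholder text remains: {marker}")
    if "## References" not in report_text:
        problems.append("References section missing")
    return problems


draft = "## 1. Disease Identity & Classification\n*Researching...*\n"
for problem in check_report(draft):
    print(problem)
```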
Example Output Structure
For "Alzheimer's Disease" research, the final report should be 2000+ lines with:
- Section 1: 5+ ontology IDs, 10+ synonyms, disease hierarchy
- Section 2: 20+ phenotypes with HPO IDs, symptoms list
- Section 3: 50+ genes with scores, 30+ GWAS associations, 100+ ClinVar variants
- Section 4: 20+ drugs, 50+ clinical trials with details
- Section 5: 10+ pathways, PPI network, expression data
- Section 6: 100+ publications, citation analysis, institution list
- Section 7: 15+ similar diseases with similarity scores
- Section 8: (if cancer) variants, evidence items
- Section 9: Pharmacological targets and interactions
- Section 10: Drug warnings, adverse events
Total: Detailed report with 500+ individual data points, each with source citation.
Tool Reference
See TOOLS_REFERENCE.md for complete tool documentation. See EXAMPLES.md for sample reports.