Optimizing ToolUniverse Skills
Best practices for creating high-quality ToolUniverse research skills that produce detailed, evidence-graded reports with proper source attribution.
When to Use This Skill
Apply when:
- •Creating new ToolUniverse research skills
- •Reviewing/improving existing skills
- •User complains about missing details, noisy results, or unclear reports
- •Skill produces process-heavy instead of content-heavy output
- •Tools are failing silently or returning empty results
Core Optimization Principles
1. Tool Interface Verification (Pre-flight Check)
Problem: Tool APIs change parameter names over time, or skills are written with incorrect parameter assumptions. This causes silent failures - tools return empty results without errors.
Solution: Verify tool parameters before calling unfamiliar tools:
# Always check tool params to prevent silent failures tool_info = tu.tools.get_tool_info(tool_name="Reactome_map_uniprot_to_pathways") # Reveals: takes `id` not `uniprot_id`
Maintain a known corrections table in skills that use many tools:
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
Reactome_map_uniprot_to_pathways | uniprot_id | id |
ensembl_get_xrefs | gene_id | id |
GTEx_get_median_gene_expression | gencode_id only | gencode_id + operation="median" |
OpenTargets_* | ensemblID | ensemblId (camelCase) |
Rule: Before calling any tool for the first time in a skill, confirm params via get_tool_info() once per tool family, or maintain a vetted param map in the skill.
Why this matters: Retry logic won't help if you're calling a tool with wrong parameter names - it will consistently return empty. This is different from API flakiness.
2. Foundation Data Layer (Path 0)
Problem: Skills query specialized tools for each section independently, missing data that a comprehensive aggregator already has. Results are inconsistent when specialized tools fail.
Solution: Identify if your domain has a comprehensive aggregator and query it FIRST before specialized tools.
Examples by domain:
| Domain | Foundation Source | What It Provides |
|---|---|---|
| Drug targets | Open Targets | Diseases, tractability, safety, drugs, GO, publications, mouse models |
| Chemicals | PubChem | Properties, bioactivity, patents, literature |
| Diseases | Open Targets / OMIM | Genes, drugs, phenotypes, literature |
| Genes | MyGene / Ensembl | Annotations, cross-refs, GO, pathways |
Pattern:
## Workflow Phase 0: Foundation Data (aggregator query) Phase 1: Disambiguation (ID resolution, collision detection) Phase 2: Specialized Queries (fill gaps from Phase 0) Phase 3: Report Synthesis
Why this works: The aggregator provides reliable baseline data across multiple sections. Specialized tools then add depth or fill gaps, rather than being the sole source.
3. Versioned Identifier Handling
Problem: Some APIs require versioned identifiers (e.g., GTEx needs ENSG00000123456.12), while others reject them. Skills fail silently when using the wrong format.
Solution: During ID resolution, capture BOTH versioned and unversioned forms:
ids = {
'ensembl': 'ENSG00000123456', # Unversioned (most APIs)
'ensembl_versioned': 'ENSG00000123456.12' # Versioned (GTEx, some others)
}
# Get version from Ensembl lookup
gene_info = tu.tools.ensembl_lookup_gene(id=ensembl_id, species="human")
if gene_info and gene_info.get('version'):
ids['ensembl_versioned'] = f"{ensembl_id}.{gene_info['version']}"
Fallback strategy:
- •Try unversioned first (more portable)
- •If empty, try versioned
- •Document which format worked
Common versioned ID APIs: GTEx, GENCODE, some Ensembl endpoints
4. Disambiguation Before Research
Problem: Skills that jump straight to literature search often miss target details or retrieve irrelevant papers due to naming collisions.
Solution: Add a disambiguation phase before any literature search:
## Phase 1: Target Disambiguation (Default ON) ### 1.1 Resolve Official Identifiers - UniProt accession (canonical protein) - Ensembl gene ID + version (for expression data) - NCBI Gene ID (for literature) - ChEMBL target ID (for drug data) ### 1.2 Gather Synonyms and Aliases - All known gene symbols - Protein name variants - Historical names ### 1.3 Detect Naming Collisions - Search "[SYMBOL]"[Title] - review top 20 results - If >20% off-topic → identify collision terms - Build negative filter: NOT [collision1] NOT [collision2] ### 1.4 Get Baseline Profile (from annotation DBs, not literature) - Protein domains (InterPro) - Subcellular location (HPA) - Tissue expression (GTEx) - GO terms and pathways
Why this works: Annotation databases provide reliable baseline data even when literature is sparse or noisy.
5. Report-Only Output (Hide Search Process)
Problem: Users don't want to see "searched 8 databases, found 1,247 papers, deduplicated to 892..."
Solution: Output structure:
| File | Content | When |
|---|---|---|
[topic]_report.md | Narrative findings only | Always (default) |
[topic]_bibliography.json | Full deduplicated papers | Always |
methods_appendix.md | Search methodology | Only if requested |
In the report:
- •✅ DO: "The literature reveals three main therapeutic approaches..."
- •❌ DON'T: "I searched PubMed, OpenAlex, and EuropePMC, finding 342 papers..."
6. Evidence Grading
Problem: A review article mention is treated the same as a mechanistic study with direct evidence.
Solution: Apply evidence tiers to every claim:
| Tier | Symbol | Criteria |
|---|---|---|
| T1 | ★★★ | Mechanistic study with direct evidence |
| T2 | ★★☆ | Functional study (knockdown, overexpression) |
| T3 | ★☆☆ | Association (screen hit, GWAS, correlation) |
| T4 | ☆☆☆ | Mention (review, text-mined, peripheral) |
In report:
ATP6V1A drives lysosomal acidification [★★★: PMID:12345678] and has been implicated in cancer progression [★☆☆: PMID:23456789, TCGA expression data].
Required locations for evidence grades:
- •Executive Summary - key disease claims
- •Disease Associations - every disease link
- •Key Papers table - evidence tier column
- •Recommendations - reference evidence quality
Per-section summary:
### Theme: Lysosomal Function (47 papers) **Evidence Quality**: Strong (32 mechanistic, 11 functional, 4 association)
7. Quantified Completeness (Not Just Categorical)
Problem: "Include PPIs" is aspirational; reports pass the checklist but are data-thin.
Solution: Define numeric minimums for each section:
| Section | Minimum Data | If Not Met |
|---|---|---|
| PPIs | ≥20 interactors | Explain why fewer + which tools failed |
| Expression | Top 10 tissues with values | Note "limited data" with specific gaps |
| Disease | Top 10 associations with scores | Note if fewer available |
| Variants | All 4 constraint scores (pLI, LOEUF, missense Z, pRec) | Note which unavailable |
| Druggability | All modalities assessed | "No drugs/probes" is valid data |
| Literature | Total + 5-year trend + 3-5 key papers | Note if sparse (<50 papers) |
Why this matters: Quantified minimums make completeness auditing objective and mechanical, not subjective.
8. Mandatory Completeness Checklist
Problem: Reports have inconsistent sections; some topics get skipped entirely.
Solution: Define mandatory sections that MUST exist, even if populated with "Limited evidence" or "Unknown":
## Completeness Checklist (ALL Required) ### Identity & Context - [ ] Official identifiers resolved (all 6 types) - [ ] Synonyms/aliases documented - [ ] Naming collisions handled (or "none detected") ### Biology - [ ] Protein architecture (or "N/A for non-protein") - [ ] Subcellular localization - [ ] Expression profile (≥10 tissues with values) - [ ] Pathway involvement (≥10 pathways) ### Mechanism - [ ] Core function with evidence grades - [ ] Model organism data (or "none found") - [ ] Key assays described ### Disease & Clinical - [ ] Genetic variants (SNVs and CNVs separated) - [ ] Constraint scores (all 4, with interpretations) - [ ] Disease links with evidence grades (≥10 or "limited") ### Druggability - [ ] Tractability for all modalities - [ ] Known drugs (or "none") - [ ] Chemical probes (or "none available") - [ ] Clinical pipeline (or "none") ### Synthesis (CRITICAL) - [ ] Research themes (≥3 papers each, or "limited") - [ ] Open questions/gaps - [ ] Biological model synthesized - [ ] Testable hypotheses (≥3)
9. Aggregated Data Gaps Section
Problem: "No data" notes scattered across 14 sections; users can't quickly see what's missing.
Solution: Add a dedicated Data Gaps & Limitations section that consolidates all gaps:
## 15. Data Gaps & Limitations | Section | Expected Data | Actual | Reason | Alternative Source | |---------|---------------|--------|--------|-------------------| | 6. PPIs | ≥20 interactors | 8 | Novel target, limited studies | Literature review needed | | 7. Expression | GTEx TPM | None | Versioned ID not recognized | See HPA data | | 9. Probes | Chemical probes | None | No validated probes exist | Consider tool compound dev | **Recommendations for Data Gaps**: 1. For PPIs: Query BioGRID with broader parameters; check yeast-2-hybrid studies 2. For Expression: Query GEO directly for tissue-specific datasets
Why this matters: Users can quickly assess data quality and know where to look for more information.
10. Query Strategy Optimization
Problem: Simple keyword searches retrieve too much noise or miss relevant papers.
Solution: Three-step collision-aware query strategy:
## Query Strategy ### Step 1: High-Precision Seeds Build a mechanistic core set (15-30 papers): - "[GENE_SYMBOL]"[Title] AND mechanism - "[FULL_PROTEIN_NAME]"[Title] - "UniProt:ACCESSION" ### Step 2: Citation Network Expansion From seeds, expand via citations: - Forward: PubMed_get_cited_by, EuropePMC_get_citations - Related: PubMed_get_related - Backward: EuropePMC_get_references ### Step 3: Collision-Filtered Broad Apply negative filters for known collisions: - "TRAG" AND immune NOT plasmid NOT conjugation - "JAK" AND kinase NOT "just another"
Citation-first for sparse targets: When keyword search returns <30 papers, prioritize citation expansion from the few good seeds.
11. Tool Failure Handling
Problem: NCBI elink and other APIs can be flaky; skills fail silently.
Solution: Automatic retry with fallback chains:
## Failure Handling ### Retry Protocol Attempt 1 → fails → wait 2s → Attempt 2 → fails → wait 5s → Fallback ### Fallback Chains | Primary | Fallback 1 | Fallback 2 | |---------|------------|------------| | PubMed_get_cited_by | EuropePMC_get_citations | OpenAlex citations | | PubMed_get_related | SemanticScholar | Keyword search | | GTEx_* | HPA_* | Note as unavailable | | Unpaywall | EuropePMC OA flag | OpenAlex is_oa | | ChEMBL_get_target_activities | GtoPdb_get_target_ligands | OpenTargets drugs | | intact_get_interactions | STRING_get_protein_interactions | OpenTargets interactions | ### Document Failures In report: "Expression data unavailable (GTEx API timeout after 3 attempts)"
Rule: NEVER silently skip failed tools. Always document in the Data Gaps section.
12. Scalable Output Structure
Problem: Reports with 500+ papers become unreadable; users can't find what they need.
Solution: Separate narrative from data:
Narrative report (~20-50 pages max):
- •Executive summary
- •Key findings by theme
- •Top 20-50 papers highlighted
- •Conclusions and hypotheses
Bibliography files (unlimited):
- •
[topic]_bibliography.json- Full structured data - •
[topic]_bibliography.csv- Tabular for filtering
JSON structure:
{
"pmid": "12345678",
"doi": "10.1038/xxx",
"title": "...",
"evidence_tier": "T1",
"themes": ["lysosomal_function", "autophagy"],
"is_core_seed": true,
"oa_status": "gold"
}
13. Synthesis Sections
Problem: Reports describe what was found but don't synthesize into actionable insights.
Solution: Require synthesis sections:
## Required Synthesis Sections ### Biological Model (3-5 paragraphs) Integrate all evidence into a coherent model: - What does the target do? - How does it connect to disease? - What's the key uncertainty? ### Testable Hypotheses (≥3) | # | Hypothesis | Perturbation | Readout | Expected | |---|------------|--------------|---------|----------| | 1 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | ### Suggested Experiments Brief description of how to test each hypothesis.
Skill Review Checklist
When reviewing a ToolUniverse skill, check:
Tool Contract
- • Tool parameters verified via
get_tool_info()or documented corrections - • Versioned vs unversioned ID handling specified
- • Foundation data source identified (if available for domain)
Report Quality
- • Report focuses on content, not search process
- • Methodology in separate appendix (optional)
- • Evidence grades applied to claims (T1-T4)
- • Source attribution on every fact
- • Sections exist even if "limited evidence"
Query Strategy
- • Disambiguation phase before search
- • Collision detection for ambiguous names
- • High-precision seeds before broad search
- • Citation expansion option for sparse topics
- • Negative filters documented
Tool Usage
- • Annotation tools used (not just literature)
- • Fallback chains defined
- • Failure handling with retry
- • OA handling (full or best-effort)
Completeness
- • Quantified minimums defined per section
- • Completeness checklist with checkboxes
- • Data Gaps section aggregates all missing data
- • "Negative results" explicitly documented ("no probes" not blank)
Output Structure
- • Main report is narrative-focused
- • Bibliography in separate JSON/CSV
- • Synthesis sections required
User Experience
- • Progress updates are brief
- • No raw tool outputs shown
- • Final report is the deliverable
Common Anti-Patterns to Fix
1. "Search Log" Reports
Bad: "Round 1: Searched PubMed (234 papers), OpenAlex (456 papers)..." Fix: Keep methodology internal; report findings only
2. Missing Disambiguation
Bad: Search "JAK" and get kinase + "just another kinase" papers mixed Fix: Add collision detection; build negative filters
3. No Evidence Grading
Bad: "Multiple studies show..." (which studies? what quality?) Fix: Apply T1-T4 grades; label each claim
4. Empty Sections Omitted
Bad: Skip "Pathogen Involvement" because nothing found Fix: Include section with "None identified in literature search"
5. No Synthesis
Bad: Long list of papers organized by theme Fix: Add biological model + testable hypotheses
6. Monolithic Bibliography
Bad: 200 papers embedded in report narrative Fix: Top 20-50 in report; full list in JSON/CSV
7. Silent Failures
Bad: "Expression data: [blank]" (tool failed, user doesn't know) Fix: "Expression data unavailable (API timeout); see HPA directly"
8. Wrong Tool Parameters (NEW)
Bad: Reactome_map_uniprot_to_pathways(uniprot_id=...) returns empty
Fix: Verify params via get_tool_info(); use correct param id
9. Missing Versioned IDs (NEW)
Bad: GTEx returns empty for ENSG00000123456
Fix: Try versioned ID ENSG00000123456.12; document which worked
10. No Foundation Layer (NEW)
Bad: Query 15 specialized tools independently, miss data when some fail Fix: Query comprehensive aggregator (e.g., Open Targets) first
11. Scattered "No Data" Notes (NEW)
Bad: "No data" in 5 different sections; user doesn't know overall gaps Fix: Aggregate all gaps in dedicated Data Gaps section with recommendations
12. Aspirational Completeness (NEW)
Bad: "Include PPIs" ✓ (but only 3 interactors listed) Fix: "≥20 PPIs OR explanation why fewer"
Template: Optimized Skill Structure
--- name: [domain]-research description: [What it does]. Creates detailed report with evidence grading and mandatory completeness. [When to use triggers]. --- # [Domain] Research Strategy ## When to Use [Trigger scenarios] ## Workflow Phase -1: Tool Verification → Phase 0: Foundation Data → Phase 1: Disambiguate → Phase 2: Search → Phase 3: Report ## Phase -1: Tool Verification [Parameter corrections table for tools used in this skill] ## Phase 0: Foundation Data [Comprehensive aggregator query - e.g., Open Targets for targets] ## Phase 1: Disambiguation (Default ON) [ID resolution (versioned + unversioned), collision detection, baseline profile] ## Phase 2: Specialized Queries (Internal) [Query strategy with collision filters, citation expansion, tool fallbacks] ## Phase 3: Report Synthesis [Progressive writing, evidence grading, mandatory sections] ## Output Files - `[topic]_report.md` (narrative, always) - `[topic]_bibliography.json` (data, always) - `methods_appendix.md` (only if requested) ## Quantified Minimums [Specific numbers per section - e.g., ≥20 PPIs, top 10 tissues] ## Completeness Checklist [ALL required sections with checkboxes] ## Data Gaps Section [Template for aggregating missing data with recommendations] ## Evidence Grading [T1-T4 definitions with required locations] ## Tool Reference [Tools by category with fallback chains and parameter notes]
Quick Fixes for Common Complaints
| User Complaint | Root Cause | Fix |
|---|---|---|
| "Report is too short" | Missing annotation data | Add Phase 1 disambiguation + Phase 0 foundation |
| "Too much noise" | No collision filtering | Add negative query filters |
| "Can't tell what's important" | No evidence grading | Add T1-T4 tiers |
| "Missing sections" | No completeness checklist | Add mandatory sections with minimums |
| "Too long/unreadable" | Monolithic output | Separate narrative from JSON |
| "Just a list of papers" | No synthesis | Add biological model + hypotheses |
| "Shows search process" | Wrong output focus | Report-only; methodology in appendix |
| "Tool failed, no data" | No fallback handling | Add retry + fallback chains |
| "Empty results, no error" | Wrong tool parameters | Add Phase -1 param verification |
| "GTEx returns nothing" | Versioned ID needed | Try ENSG*.version format |
| "Data seems incomplete" | No foundation layer | Add Phase 0 with aggregator |
| "Can't tell what's missing" | Scattered gaps | Add Data Gaps section |
Summary
Seven pillars of optimized ToolUniverse skills:
- •Verify tool contracts - Check params via
get_tool_info(); maintain corrections table - •Foundation first - Query comprehensive aggregators before specialized tools
- •Disambiguate carefully - Resolve IDs (versioned + unversioned), detect collisions, get baseline from annotation DBs
- •Grade evidence - T1-T4 tiers on all claims; summarize quality per section
- •Require quantified completeness - Numeric minimums, not just "include X"
- •Report content, not process - Methodology in appendix only if asked; aggregate gaps in one section
- •Synthesize - Biological models and testable hypotheses, not just paper lists
Apply these principles to any ToolUniverse research skill for better user experience and actionable output.