ToolUniverse Tool Description Optimization
Optimize tool descriptions in ToolUniverse JSON configuration files to ensure they are clear, complete, and user-friendly.
When to Apply This Skill
Use when:
- •Reviewing newly created tool descriptions
- •User asks "are these tools easy to understand?"
- •Improving existing tool documentation
- •Adding new tools to ToolUniverse
- •User mentions tool usability, clarity, or documentation
Quick Optimization Checklist
Tool Description Review: - [ ] Prerequisites stated (packages, API keys, accounts) - [ ] Critical abbreviations expanded on first use - [ ] Required vs optional parameters clear - [ ] Mutually exclusive options numbered/labeled - [ ] Parameter guidance includes trade-offs - [ ] Filter syntax shows available fields - [ ] File size warnings where relevant - [ ] Examples show realistic usage
Critical Improvements (Fix Immediately)
1. Clarify Required Input Requirements
Problem: Users don't know if they need ONE input or ALL inputs.
Fix: Use "Required: Provide ONE input type" for mutually exclusive options.
// Before "description": "Process BED regions, motifs, or gene lists..." // After "description": "Process genomic data. **Required: Provide ONE input type** - (1) BED regions, (2) DNA motif, or (3) gene list. Analyzes..."
Number the options and use bold for "Required".
2. Add Prerequisites to First Tool
Problem: Users don't know what to install/configure before use.
Fix: Add prerequisites note to first tool in each family.
"description": "Query single-cell data. Prerequisites: Requires 'package-name' (install: pip install tooluniverse[extra]). Returns..."
Include:
- •Package installation command
- •API key requirements
- •Account creation instructions
3. Expand Critical Abbreviations
Problem: New users don't understand technical terms.
Fix: Expand on first use with format: "Abbreviation (Full Name)".
Common abbreviations to expand:
- •H5AD → HDF5-based AnnData
- •RPM → Reads Per Million
- •TSS → Transcription Start Site
- •TAD → Topologically Associating Domain
- •DRS → Data Repository Service
- •API names (MACS2, IUPAC, etc.)
// Before "description": "Download H5AD files..." // After "description": "Download H5AD (HDF5-based AnnData) files..."
High-Priority Improvements
4. Enhance Filter Parameter Descriptions
Problem: Users don't know what fields are available or what syntax to use.
Fix: List operators, common fields, and provide multiple examples.
"parameter_name": {
"type": "string",
"description": "Filter using SQL-like syntax. Format: 'field == \"value\"'. Operators: ==, !=, in, <, >, <=, >=. Combine with 'and'/'or'. Common fields: tissue, cell_type, disease, assay, sex, ethnicity. Examples: 'tissue == \"lung\"', 'disease == \"COVID-19\" and tissue == \"lung\"', 'cell_type in [\"T cell\", \"B cell\"]'."
}
Include:
- •Syntax format
- •Available operators
- •List of 5-10 common fields
- •2-3 diverse examples
5. Improve Parameter Guidance
Problem: Users don't know which value to choose or what trade-offs exist.
Fix: Explain what each value means and provide recommendations.
// Before "threshold": "Q-value threshold (05=1e-5, 10=1e-10, 20=1e-20)" // After "threshold": "Peak calling stringency. '05'=1e-5 (permissive, more peaks, broad features), '10'=1e-10 (moderate, balanced), '20'=1e-20 (strict, high confidence, narrow peaks). Default '05' suitable for most analyses. Higher values = fewer but more confident peaks."
For each parameter option, explain:
- •What it means practically
- •When to use it
- •Trade-offs involved
- •Recommended default
6. Number Mutually Exclusive Options
Problem: Users provide multiple options when only one is allowed.
Fix: Label options as "Option 1", "Option 2", etc.
"bed_data": {
"description": "**Option 1**: BED format regions (tab-separated: chr, start, end). Example: 'chr1\\t1000\\t2000'."
},
"motif": {
"description": "**Option 2**: DNA sequence motif in IUPAC notation. Use: A/T/G/C, W=A|T, S=G|C. Example: 'CANNTG'."
},
"gene_list": {
"description": "**Option 3**: Gene symbols as array. Example: ['TP53', 'MDM2']."
}
Medium-Priority Improvements
7. Add File Size Warnings
For tools that download or return large files:
"description": "Download contact matrices. Note: Files can be large (GBs), check file_size in metadata before downloading. Returns..."
8. Clarify Web Form vs API Results
When tool returns submission URL instead of direct results:
"description": "Perform enrichment analysis. Note: Returns submission URL (web form-based analysis). Analyzes..."
9. Explain File Type Differences
For tools with multiple format options:
"file_type": "File format. Common types: 'cooler' (multi-resolution contact matrices), 'pairs' (aligned read pairs), 'hic' (Juicer format), 'mcool' (multi-resolution cooler)."
Description Structure Template
{
"name": "Tool_operation_name",
"type": "ToolClassName",
"description": "[Action verb] to [purpose]. [Prerequisites if first tool]. [Key data/features]. [Required inputs if mutually exclusive]. [Note about limitations/requirements]. Use for: [use case 1], [use case 2], [use case 3].",
"parameter": {
"properties": {
"param_name": {
"type": "string",
"description": "[What it does]. [Format/syntax if applicable]. [Options with trade-offs]. [Examples]. [Recommendation if applicable]."
}
}
}
}
Description Quality Checklist
Clarity Checks
- • Purpose clear in first sentence
- • Technical terms expanded
- • Prerequisites stated upfront
- • Examples show realistic usage
- • "Use for:" section lists 3-5 concrete use cases
Completeness Checks
- • Required inputs clearly marked
- • Parameter choices explained
- • Limitations noted (file size, web form, etc.)
- • Available fields listed for filters
- • Default values recommended
Usability Checks
- • New users can understand without external docs
- • Users know what to provide
- • Users can make informed parameter choices
- • Error prevention (mutually exclusive options labeled)
Testing Description Quality
To verify description quality, ask:
- •
Can a new user understand what the tool does?
- •Read only the description (no docs)
- •Should be clear within 30 seconds
- •
Can a user provide correct inputs on first try?
- •Required inputs obvious
- •Format/syntax clear
- •Mutually exclusive options labeled
- •
Can a user choose appropriate parameters?
- •Trade-offs explained
- •Recommendations provided
- •Defaults justified
- •
Are prerequisites obvious?
- •Installation instructions
- •API keys/accounts
- •File size warnings
Common Patterns by Tool Type
API Query Tools
"description": "Query [data type] from [source]. [Prerequisites]. Filter by [criteria]. Returns [output]. [Data scale]. Use for: [discovery], [analysis], [specific research tasks]."
Key elements:
- •What you're querying
- •How to filter
- •What you get back
- •Scale of data
- •Prerequisites
Data Download Tools
"description": "Download [file types] from [source]. [Format details]. [File size warning]. [Authentication requirement]. Use for: [offline analysis], [custom processing], [integration]."
Key elements:
- •File formats available
- •Size warning
- •Authentication needs
- •What's in the files
Enrichment/Analysis Tools
"description": "Analyze [input type] to find [results]. **Required: Provide ONE input type** - (1) [option], (2) [option], (3) [option]. Compares against [database/background]. [Result format]. Use for: [identifying], [discovering], [predicting]."
Key elements:
- •Input requirements clear
- •Options numbered
- •What gets compared
- •What you learn
Validation Commands
After updating descriptions, validate JSON syntax:
# Validate all tool JSONs python3 -m json.tool src/tooluniverse/data/your_tools.json > /dev/null && echo "✓ Valid" # Check all tools in category for f in src/tooluniverse/data/*_tools.json; do python3 -m json.tool "$f" > /dev/null && echo "✓ $f valid" || echo "✗ $f invalid" done
Example: Before and After
Before (Unclear):
{
"name": "Tool_enrichment",
"description": "Perform enrichment with tool to find factors.",
"parameter": {
"properties": {
"bed": {"description": "BED data"},
"motif": {"description": "Motif"},
"genes": {"description": "Genes"},
"threshold": {"description": "Threshold value"}
}
}
}
After (Clear):
{
"name": "Tool_enrichment_analysis",
"description": "Identify transcription factors enriched in your data. **Required: Provide ONE input type** - (1) BED genomic regions, (2) DNA sequence motif (IUPAC notation), or (3) gene symbol list. Compares against 400,000+ ChIP-seq experiments. Returns ranked proteins with enrichment scores. Note: Returns submission URL (web-based analysis). Use for: identifying regulators of regions, finding proteins bound to motifs, discovering transcription factors regulating genes.",
"parameter": {
"properties": {
"bed_data": {
"description": "**Option 1**: BED format regions (tab-separated: chr, start, end). For finding proteins bound to genomic regions. Example: 'chr1\\t1000\\t2000'."
},
"motif": {
"description": "**Option 2**: DNA motif in IUPAC notation (A/T/G/C, W=A|T, S=G|C, M=A|C, K=G|T, R=A|G, Y=C|T). Example: 'CANNTG' (E-box)."
},
"gene_list": {
"description": "**Option 3**: Gene symbols as array or single gene. Example: ['TP53', 'MDM2', 'CDKN1A']."
},
"threshold": {
"description": "Peak stringency. '05'=1e-5 (permissive, more peaks), '10'=1e-10 (moderate), '20'=1e-20 (strict, high confidence). Default '05' suitable for most analyses."
}
}
}
}
Summary
Priority order for optimization:
- •
Critical (fix immediately):
- •Clarify required inputs
- •Add prerequisites
- •Expand abbreviations
- •
High (fix soon):
- •Enhance filter descriptions
- •Improve parameter guidance
- •Number mutually exclusive options
- •
Medium (nice to have):
- •Add file size warnings
- •Clarify web form vs API
- •Explain file type differences
Expected impact: 50-75% reduction in user errors, 50-67% faster time to first successful use.