ToolUniverse Tool Creator

Create new scientific tools for the ToolUniverse framework following established best practices.

•Critical Knowledge
•Core Concepts
•Implementation Guide
•Testing Strategy
•Common Patterns
•Troubleshooting
•Reference

Critical Knowledge

Top 5 Mistakes (90% of Failures)

•Missing default_config.py Entry - Tools silently won't load
•Fake test_examples - Tests fail, agents get bad examples
•Single-level Testing - Misses registration bugs
•Tool Names > 55 chars - Breaks MCP compatibility
•Raising Exceptions - Should return error dicts instead

Tool Creator vs SDK User

SDK User (Using)	Tool Creator (Building)
`tu.tools.ToolName()`	`@register_tool()` + JSON
Handle responses	Design schemas
One-level usage	Three-step registration

Core Concepts

Two-Stage Architecture

code

Stage 1: Tool Class              Stage 2: Wrappers (Auto-Generated)
Python Implementation            From JSON Configs
       ↓                                  ↓
@register_tool("MyTool")         MyAPI_list_items()
class MyTool(BaseTool):          MyAPI_search()
    def run(self, arguments):    MyAPI_get_details()

Key Points:

•One class handles multiple operations
•JSON defines individual tool wrappers
•Users call wrappers, which route to class
•Need BOTH for tools to work
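The routing described above can be pictured with a small, self-contained sketch. It does not import ToolUniverse: `TOOL_REGISTRY`, `register_tool_sketch`, `build_wrappers`, and `EchoTool` are hypothetical stand-ins, and the framework's real internals may differ.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the framework's class registry.
TOOL_REGISTRY: Dict[str, type] = {}


def register_tool_sketch(name: str) -> Callable[[type], type]:
    """Stage 1: record a tool class under a name (illustrative only)."""
    def decorator(cls: type) -> type:
        TOOL_REGISTRY[name] = cls
        return cls
    return decorator


@register_tool_sketch("EchoTool")
class EchoTool:
    """One class that serves several operations."""

    def __init__(self, tool_config: Dict[str, Any]) -> None:
        self.tool_config = tool_config

    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        operation = arguments.get("operation")
        if operation == "list_items":
            return {"status": "success", "data": ["a", "b"]}
        if operation == "search":
            return {"status": "success", "data": [arguments.get("query")]}
        return {"status": "error", "error": f"Unknown: {operation}"}


def build_wrappers(configs: List[Dict[str, Any]]) -> Dict[str, Callable[..., Dict[str, Any]]]:
    """Stage 2: create one callable per JSON entry, each routing to its class."""
    wrappers: Dict[str, Callable[..., Dict[str, Any]]] = {}
    for config in configs:
        tool = TOOL_REGISTRY[config["class"]](config)
        operation = config["parameter"]["properties"]["operation"]["const"]

        # Bind tool and operation as defaults so each wrapper keeps its own values.
        def wrapper(_tool=tool, _operation=operation, **kwargs):
            return _tool.run({"operation": _operation, **kwargs})

        wrappers[config["name"]] = wrapper
    return wrappers


configs = [
    {"name": "Echo_list_items", "class": "EchoTool",
     "parameter": {"properties": {"operation": {"const": "list_items"}}}},
    {"name": "Echo_search", "class": "EchoTool",
     "parameter": {"properties": {"operation": {"const": "search"}}}},
]
wrappers = build_wrappers(configs)
print(wrappers["Echo_search"](query="protein"))
```

If either half is missing (no registered class, or no JSON entry), there is nothing to build the wrapper from, which is why both stages are required.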

Three-Step Registration

Step 1: Class Registration

python

@register_tool("MyAPITool")  # Decorator registers class
class MyAPITool(BaseTool):
    pass

Step 2: Config Registration ⚠️ MOST COMMONLY MISSED

python

# In src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    "my_category": os.path.join(current_dir, "data", "my_category_tools.json"),
}

Step 3: Wrapper Generation (Automatic)

python

tu = ToolUniverse()
tu.load_tools()  # Auto-generates wrappers in tools/

Verification Script:

python

import sys
sys.path.insert(0, 'src')

# Step 1: Check class registered
from tooluniverse.tool_registry import get_tool_registry
import tooluniverse.your_tool_module
registry = get_tool_registry()
assert "YourToolClass" in registry, "❌ Step 1 FAILED"
print("✅ Step 1: Class registered")

# Step 2: Check config registered
from tooluniverse.default_config import TOOLS_CONFIGS
assert "your_category" in TOOLS_CONFIGS, "❌ Step 2 FAILED"
print("✅ Step 2: Config registered")

# Step 3: Check wrappers generated
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
assert hasattr(tu.tools, 'YourCategory_operation1'), "❌ Step 3 FAILED"
print("✅ Step 3: Wrappers generated")
print("✅ All steps complete!")

Standard Response Format

All tools must return:

json

{
  "status": "success" | "error",
  "data": {...},        // On success
  "error": "message"    // On failure
}

Why: Consistent error handling, composability, user expectations
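One way to keep every return path in this shape is to build responses through small helpers. The sketch below is an assumption, not part of ToolUniverse: `success`, `failure`, and `is_standard_response` are hypothetical names.

```python
from typing import Any, Dict


def success(data: Any, **extra: Any) -> Dict[str, Any]:
    """Build a success response; extra keys (e.g. total) sit alongside data."""
    return {"status": "success", "data": data, **extra}


def failure(message: str, **extra: Any) -> Dict[str, Any]:
    """Build an error response with a human-readable message."""
    return {"status": "error", "error": message, **extra}


def is_standard_response(response: Any) -> bool:
    """Check that a value follows the status/data/error convention."""
    if not isinstance(response, dict):
        return False
    if response.get("status") == "success":
        return "data" in response
    if response.get("status") == "error":
        return isinstance(response.get("error"), str)
    return False


print(success(["item-1"], total=1))
print(failure("Missing: query"))
```

A check like `is_standard_response` can also be reused in tests to assert the format on every operation.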

Implementation Guide

File Structure

Required Files:

•src/tooluniverse/my_api_tool.py - Implementation
•src/tooluniverse/data/my_api_tools.json - Tool definitions
•tests/unit/test_my_api_tool.py - Tests
•examples/my_api_examples.py - Usage examples

Auto-Generated (don't create manually):

•src/tooluniverse/tools/MyAPI_*.py - Wrappers

Pattern 1: Multi-Operation Tool (Recommended)

Python Class:

python

from typing import Dict, Any
from tooluniverse.tool import BaseTool
from tooluniverse.tool_utils import register_tool
import requests

@register_tool("MyAPITool")
class MyAPITool(BaseTool):
    """Tool for MyAPI database."""
    
    BASE_URL = "https://api.example.com/v1"
    
    def __init__(self, tool_config):
        super().__init__(tool_config)
        self.parameter = tool_config.get("parameter", {})
        self.required = self.parameter.get("required", [])
    
    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Route to operation handler."""
        operation = arguments.get("operation")
        
        if not operation:
            return {"status": "error", "error": "Missing: operation"}
        
        if operation == "list_items":
            return self._list_items(arguments)
        elif operation == "search":
            return self._search(arguments)
        else:
            return {"status": "error", "error": f"Unknown: {operation}"}
    
    def _list_items(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """List items with pagination."""
        try:
            params = {}
            if "limit" in arguments:
                params["limit"] = arguments["limit"]
            
            response = requests.get(
                f"{self.BASE_URL}/items",
                params=params,
                timeout=30
            )
            response.raise_for_status()
            
            data = response.json()
            return {
                "status": "success",
                "data": data.get("items", []),
                "total": data.get("total", 0)
            }
        except requests.exceptions.Timeout:
            return {"status": "error", "error": "Timeout after 30s"}
        except requests.exceptions.HTTPError as e:
            return {"status": "error", "error": f"HTTP {e.response.status_code}"}
        except Exception as e:
            return {"status": "error", "error": str(e)}
    
    def _search(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Search items by query."""
        query = arguments.get("query")
        if not query:
            return {"status": "error", "error": "Missing: query"}
        
        try:
            response = requests.get(
                f"{self.BASE_URL}/search",
                params={"q": query},
                timeout=30
            )
            response.raise_for_status()
            
            data = response.json()
            # Use the standard "data" key so both operations share one format
            return {
                "status": "success",
                "data": data.get("results", []),
                "count": data.get("count", 0)
            }
        except requests.exceptions.RequestException as e:
            return {"status": "error", "error": f"API failed: {str(e)}"}

JSON Configuration:

json

[
  {
    "name": "MyAPI_list_items",
    "class": "MyAPITool",
    "description": "List items from database with pagination. Returns item IDs and names. Supports filtering by status and type. Example: limit=10 returns first 10 items.",
    "parameter": {
      "type": "object",
      "required": ["operation"],
      "properties": {
        "operation": {
          "const": "list_items",
          "description": "Operation type (fixed)"
        },
        "limit": {
          "type": "integer",
          "description": "Max results (1-100)",
          "minimum": 1,
          "maximum": 100
        }
      }
    },
    "return": {
      "type": "object",
      "properties": {
        "status": {"type": "string", "enum": ["success", "error"]},
        "data": {"type": "array"},
        "total": {"type": "integer"},
        "error": {"type": "string"}
      },
      "required": ["status"]
    },
    "test_examples": [
      {
        "operation": "list_items",
        "limit": 10
      }
    ]
  }
]

Pattern 2: Async Polling (Job-Based APIs)

python

import time

def _submit_job(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Submit job and poll for results."""
    try:
        # Submit
        submit_response = requests.post(
            f"{self.BASE_URL}/jobs/submit",
            json={"data": arguments.get("data")},
            timeout=30
        )
        submit_response.raise_for_status()
        job_id = submit_response.json().get("job_id")
        if not job_id:
            return {"status": "error", "error": "No job_id in submit response"}
        
        # Poll
        for attempt in range(60):  # 2 min max
            status_response = requests.get(
                f"{self.BASE_URL}/jobs/{job_id}/status",
                timeout=30
            )
            status_response.raise_for_status()
            
            result = status_response.json()
            if result.get("status") == "completed":
                return {
                    "status": "success",
                    "data": result.get("results"),
                    "job_id": job_id
                }
            elif result.get("status") == "failed":
                return {
                    "status": "error",
                    "error": result.get("error"),
                    "job_id": job_id
                }
            
            time.sleep(2)  # Poll every 2s
        
        return {"status": "error", "error": "Timeout after 2 min"}
    except requests.exceptions.RequestException as e:
        return {"status": "error", "error": str(e)}
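The polling loop can also be separated from the HTTP calls, which makes it testable without a network or real delays. This is a minimal sketch under assumptions: `poll_until_done` is a hypothetical helper, and the `completed`/`failed` status values simply mirror the example above.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    max_attempts: int = 60,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the job completes, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_status()
        state = result.get("status")
        if state == "completed":
            return {"status": "success", "data": result.get("results")}
        if state == "failed":
            return {"status": "error", "error": result.get("error") or "Job failed"}
        sleep(interval_s)
    waited = max_attempts * interval_s
    return {"status": "error", "error": f"Timeout after {waited:.0f}s of polling"}


# Usage with a fake job that finishes on the third check; sleep is stubbed out.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "results": [1, 2, 3]},
])
outcome = poll_until_done(lambda: next(responses), sleep=lambda _seconds: None)
print(outcome)
```

Passing `sleep` as a parameter is a design choice for testing; in production the default `time.sleep` applies.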

JSON Best Practices

Tool Naming (≤55 chars for MCP):

•Template: {API}_{action}_{target}
•✅ Good: FDA_get_drug_info (17 chars)
•❌ Bad: FDA_get_detailed_drug_information_with_history (46 chars; under the limit but needlessly verbose, and a longer API prefix would push it past 55)

Description (150-250 chars, high-context):

json

{
  "description": "Search database for items. Returns up to 100 results with scores. Supports wildcards (* ?) and Boolean operators (AND, OR, NOT). Example: 'protein AND membrane' finds membrane proteins."
}

Include: What it returns, data source, use case, input format, example

test_examples (MUST be real):

json

{
  "test_examples": [
    {
      "operation": "search",
      "query": "protein",     // ✅ Real, common term
      "limit": 10
    }
  ]
}

❌ Don't use: "id": "XXXXX", "placeholder": "example_123"

✅ Do use: Real IDs from actual API documentation
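The naming, description, and test_examples rules in this section can be checked mechanically before a config is committed. The sketch below is hypothetical: `lint_tool_config` and the placeholder pattern are assumptions, and the pattern is only a heuristic that will miss many fake values.

```python
import re
from typing import Any, Dict, List

MAX_NAME_LENGTH = 55  # Limit stated in this guide for MCP compatibility
# Heuristic only: values that look like filler rather than real inputs.
PLACEHOLDER_PATTERN = re.compile(r"^(x{3,}|placeholder.*|example_\d+)$", re.IGNORECASE)


def lint_tool_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one tool definition."""
    problems: List[str] = []

    name = config.get("name", "")
    if not name:
        problems.append("missing name")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is {len(name)} chars (max {MAX_NAME_LENGTH})")

    description = config.get("description", "")
    if not 150 <= len(description) <= 250:
        problems.append(f"description is {len(description)} chars (target 150-250)")

    examples = config.get("test_examples") or []
    if not examples:
        problems.append("no test_examples")
    for index, example in enumerate(examples):
        for key, value in example.items():
            if isinstance(value, str) and PLACEHOLDER_PATTERN.match(value):
                problems.append(
                    f"test_examples[{index}].{key} looks like a placeholder: {value!r}"
                )
    return problems


suspect = {
    "name": "MyAPI_search",
    "description": "Search items.",
    "test_examples": [{"operation": "search", "id": "XXXXX"}],
}
for problem in lint_tool_config(suspect):
    print(problem)
```

A lint pass cannot confirm that an ID is real; only a call against the live API does that.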

Testing Strategy

Two-Level Testing (MANDATORY)

Level 1: Direct Class Testing

python

import json
from tooluniverse.your_tool_module import YourToolClass

def test_direct_class():
    """Test implementation logic."""
    with open("src/tooluniverse/data/your_tools.json") as f:
        tools = json.load(f)
        config = next(t for t in tools if t["name"] == "YourTool_operation1")
    
    tool = YourToolClass(config)
    result = tool.run({"operation": "operation1", "param": "value"})
    
    assert result["status"] == "success"
    assert "data" in result

Level 2: ToolUniverse Interface Testing

python

import pytest
from tooluniverse import ToolUniverse

class TestYourTools:
    @pytest.fixture
    def tu(self):
        tu = ToolUniverse()
        tu.load_tools()  # CRITICAL
        return tu
    
    def test_tools_load(self, tu):
        """Verify registration."""
        assert hasattr(tu.tools, 'YourTool_operation1')
    
    def test_execution(self, tu):
        """Test via ToolUniverse (how users call it)."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1",
            "param": "value"
        })
        assert result["status"] == "success"
    
    def test_error_handling(self, tu):
        """Test missing params."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1"
            # Missing required param
        })
        assert result["status"] == "error"

Level 3: Real API Testing

python

from tooluniverse import ToolUniverse

def test_real_api():
    """Verify actual API integration."""
    tu = ToolUniverse()
    tu.load_tools()
    
    result = tu.tools.YourTool_operation1(**{
        "operation": "operation1",
        "param": "real_value_from_docs"
    })
    
    if result["status"] == "success":
        assert "data" in result
        print("✅ Real API works")
    else:
        print(f"⚠️  API error (may be down): {result['error']}")

What Each Level Catches:

•Level 1: Tests implementation, catches code bugs
•Level 2: Tests registration, catches config bugs
•Level 3: Tests integration, catches API issues

Common Patterns

Error Handling Checklist

✅ Always set timeout (30s recommended)
✅ Catch specific exceptions (Timeout, ConnectionError, HTTPError)
✅ Return error dicts, never raise in run()
✅ Include helpful context in error messages
✅ Handle JSON parsing errors
✅ Validate required parameters
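As a last line of defense for the "never raise in run()" rule, a decorator can convert any uncaught exception into an error dict. This is a sketch, not a ToolUniverse feature: `never_raise` is a hypothetical name, and it complements rather than replaces the specific `except` clauses that produce more helpful messages.

```python
import functools
from typing import Any, Callable, Dict


def never_raise(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a handler so unexpected exceptions become error responses."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Include the exception type so the message carries some context.
            return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
    return wrapper


@never_raise
def run(arguments: Dict[str, Any]) -> Dict[str, Any]:
    limit = int(arguments["limit"])  # May raise KeyError or ValueError
    return {"status": "success", "data": list(range(limit))}


print(run({"limit": 3}))
print(run({}))
```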

Dependency Management

Check package size FIRST:

bash

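curl -s https://pypi.org/pypi/PACKAGE/json | python3 -c "
import json, sys
data = json.load(sys.stdin)
print(f'Dependencies: {len(data[\"info\"][\"requires_dist\"] or [])}')
# Download size of the largest file in the latest release (installed size may differ)
print(f'Largest file: {max((u[\"size\"] for u in data[\"urls\"]), default=0) / 1e6:.1f} MB')
"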

Classification:

•Core (<100MB, universal use) → [project.dependencies]
•Optional (>100MB or niche) → [project.optional-dependencies]

In code:

python

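def _run_analysis(self, arguments):  # Illustrative handler name
    # Import lazily so the tool still loads when the package is absent
    try:
        import optional_package
    except ImportError:
        return {
            "status": "error",
            "error": "Install with: pip install optional_package"
        }
    # ... use optional_package here ...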
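The same check can be written without importing the package at all, using the standard library's `importlib.util.find_spec`. The helper below is a hypothetical sketch; it assumes a top-level package name, since dotted names can raise if the parent package is missing.

```python
import importlib.util
from typing import Any, Dict, Optional


def missing_package_error(package: str, pip_name: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Return an error dict if `package` cannot be found, else None."""
    if importlib.util.find_spec(package) is not None:
        return None
    return {
        "status": "error",
        "error": f"Install with: pip install {pip_name or package}",
    }


# A handler could return early when the optional dependency is absent.
error = missing_package_error("no_such_package_for_this_sketch")
if error is not None:
    print(error)
```

The `pip_name` parameter covers packages whose install name differs from their import name.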

Pagination Pattern

python

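def _list_items(self, arguments):
    # Forward only the pagination parameters the caller supplied
    params = {}
    if "page" in arguments:
        params["page"] = arguments["page"]
    if "limit" in arguments:
        params["limit"] = arguments["limit"]

    try:
        response = requests.get(f"{self.BASE_URL}/items", params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
    except (requests.exceptions.RequestException, ValueError) as e:
        # ValueError covers responses that are not valid JSON
        return {"status": "error", "error": f"API failed: {e}"}

    return {
        "status": "success",
        "data": data.get("items", []),
        "page": data.get("page", 0),
        "total_pages": data.get("total_pages", 1),
        "total_items": data.get("total", 0)
    }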
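A caller that needs every item can loop over pages using the `page` and `total_pages` fields returned above. The sketch below is hypothetical: `collect_all_pages` is not part of ToolUniverse, it assumes zero-based page numbers (matching the default of 0 above), and `fake_run` stands in for a real tool's `run` method.

```python
from typing import Any, Callable, Dict, List


def collect_all_pages(
    run: Callable[[Dict[str, Any]], Dict[str, Any]],
    limit: int = 50,
    max_pages: int = 100,
) -> Dict[str, Any]:
    """Fetch pages until total_pages is reached, an error occurs, or max_pages is hit."""
    items: List[Any] = []
    page = 0
    while page < max_pages:
        response = run({"operation": "list_items", "page": page, "limit": limit})
        if response.get("status") != "success":
            return response  # Propagate the error dict unchanged
        items.extend(response.get("data", []))
        page += 1
        if page >= response.get("total_pages", 1):
            break
    return {"status": "success", "data": items, "total_items": len(items)}


def fake_run(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a tool: serves seven items split across three pages."""
    dataset = list(range(7))
    start = arguments["page"] * arguments["limit"]
    chunk = dataset[start:start + arguments["limit"]]
    return {"status": "success", "data": chunk, "page": arguments["page"],
            "total_pages": 3, "total_items": len(dataset)}


print(collect_all_pages(fake_run, limit=3)["data"])
```

The `max_pages` cap guards against an API that reports an incorrect or ever-growing `total_pages`.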

Troubleshooting

Tool Doesn't Load (90% of Issues)

Symptoms: Tool count doesn't increase, no error, AttributeError when calling

Cause: Missing Step 2 of registration (default_config.py)

Solution:

python

# Edit src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    # ... existing ...
    "your_category": os.path.join(current_dir, "data", "your_category_tools.json"),
}

Verify:

bash

grep "your_category" src/tooluniverse/default_config.py
ls src/tooluniverse/tools/YourCategory_*.py
python3 -c "from tooluniverse import ToolUniverse; tu = ToolUniverse(); tu.load_tools(); print(hasattr(tu.tools, 'YourCategory_op1'))"
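Because a missing or broken config entry produces no error, it can help to check the registered paths directly. The function below is a hypothetical sketch; it assumes it would be called with a category-to-path mapping shaped like `TOOLS_CONFIGS`, and that each file holds a JSON array of definitions with `name` and `class` keys, as in the examples in this guide.

```python
import json
import os
from typing import Dict, List


def check_tool_configs(tools_configs: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a category -> JSON path mapping."""
    problems: List[str] = []
    for category, path in tools_configs.items():
        if not os.path.isfile(path):
            problems.append(f"{category}: file not found: {path}")
            continue
        try:
            with open(path, encoding="utf-8") as handle:
                entries = json.load(handle)
        except json.JSONDecodeError as exc:
            problems.append(f"{category}: invalid JSON ({exc.msg} at line {exc.lineno})")
            continue
        if not isinstance(entries, list):
            problems.append(f"{category}: expected a JSON array of tool definitions")
            continue
        for index, entry in enumerate(entries):
            if not isinstance(entry, dict) or "name" not in entry or "class" not in entry:
                problems.append(f"{category}: entry {index} lacks 'name' or 'class'")
    return problems


# Example with a path that does not exist.
print(check_tool_configs({"your_category": "data/does_not_exist_tools.json"}))
```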

Tests Fail with Real APIs

Mock vs Real Testing:

•Mocks test code structure
•Real calls test API integration
•Both needed for confidence
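A mock-based test might look like the following sketch, which uses the standard library's `unittest.mock.patch.object`. `TinyTool` and its `_fetch` method are hypothetical; the point is that isolating the network call in one method gives the mock a single place to hook in.

```python
from typing import Any, Dict
from unittest.mock import patch


class TinyTool:
    """Hypothetical tool whose network call is isolated in _fetch."""

    def _fetch(self, query: str) -> Dict[str, Any]:
        raise RuntimeError("network disabled in this sketch")

    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        query = arguments.get("query")
        if not query:
            return {"status": "error", "error": "Missing: query"}
        try:
            payload = self._fetch(query)
        except Exception as exc:
            return {"status": "error", "error": str(exc)}
        return {"status": "success", "data": payload.get("results", [])}


def test_run_with_mocked_fetch() -> None:
    """Checks code structure only: routing, validation, response shape."""
    tool = TinyTool()
    with patch.object(TinyTool, "_fetch", return_value={"results": ["hit"]}) as fake:
        result = tool.run({"query": "protein"})
    fake.assert_called_once_with("protein")
    assert result == {"status": "success", "data": ["hit"]}


test_run_with_mocked_fetch()
```

The mocked payload encodes what the author believes the API returns, so this test passes even if the real response uses different keys. That gap is what the real calls cover.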

What Real Testing Catches:

•Response structure differences
•Parameter name mismatches
•Unexpected pagination
•Timeout issues
•Data type surprises

Reference

Complete Workflow

•Create Python class with @register_tool
•Create JSON config with realistic test_examples
•Add to default_config.py ← CRITICAL
•Generate wrappers: tu.load_tools()
•Test Level 1 (direct class)
•Test Level 2 (ToolUniverse interface)
•Test Level 3 (real API calls)
•Create examples file
•Verify all 3 registration steps
•Document in verification report

Quick Commands

bash

# Validate JSON
python3 -m json.tool src/tooluniverse/data/your_tools.json

# Check Python syntax
python3 -m py_compile src/tooluniverse/your_tool.py

# Verify registration
grep "your_category" src/tooluniverse/default_config.py

# Generate wrappers
PYTHONPATH=src python3 -m tooluniverse.generate_tools --force

# List wrappers
ls src/tooluniverse/tools/YourCategory_*.py

# Run tests
pytest tests/unit/test_your_tool.py -v

# Count tools
python3 << 'EOF'
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
print(f"Total: {len([t for t in dir(tu.tools) if 'YourCategory' in t])} tools")
EOF

Critical Reminders

⚠️ ALWAYS add to default_config.py (Step 2)
⚠️ NEVER raise exceptions in run()
⚠️ ALWAYS use real test_examples
⚠️ ALWAYS test both levels
⚠️ KEEP tool names ≤55 characters
⚠️ RETURN standard response format
⚠️ SET timeout on all HTTP requests
⚠️ VERIFY all 3 registration steps

Success Criteria

✅ All 3 registration steps verified
✅ Level 1 tests passing (direct class)
✅ Level 2 tests passing (ToolUniverse interface)
✅ Real API calls working (Level 3)
✅ Tool names ≤55 characters
✅ test_examples use real IDs
✅ Standard response format used
✅ Helpful error messages
✅ Examples file created
✅ No raised exceptions in run()

When all criteria met → Production Ready 🎉
