AgentSkillsCN

devtu-create-tool

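Build new scientific tools for the ToolUniverse framework with sound structure, rigorous validation, and thorough testing. Use this skill when you need to add a new tool to ToolUniverse, introduce a new API integration, build a wrapper for a scientific database or service, extend ToolUniverse's capabilities, or follow the ToolUniverse contribution guidelines. It covers creating tool classes, writing JSON configurations, implementing validation, improving error handling, and designing complete test examples.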

SKILL.md
--- frontmatter
name: devtu-create-tool
description: Create new scientific tools for ToolUniverse framework with proper structure, validation, and testing. Use when users need to add tools to ToolUniverse, implement new API integrations, create tool wrappers for scientific databases/services, expand ToolUniverse capabilities, or follow ToolUniverse contribution guidelines. Supports creating tool classes, JSON configurations, validation, error handling, and test examples.

ToolUniverse Tool Creator

Create new scientific tools for the ToolUniverse framework following established best practices.


Table of Contents

  1. Critical Knowledge
  2. Core Concepts
  3. Implementation Guide
  4. Testing Strategy
  5. Common Patterns
  6. Troubleshooting
  7. Reference

Critical Knowledge

Top 5 Mistakes (90% of Failures)

  1. Missing default_config.py Entry - Tools silently won't load
  2. Fake test_examples - Tests fail, agents get bad examples
  3. Single-level Testing - Misses registration bugs
  4. Tool Names > 55 chars - Breaks MCP compatibility
  5. Raising Exceptions - Should return error dicts instead

Tool Creator vs SDK User

| SDK User (Using)    | Tool Creator (Building)   |
|---------------------|---------------------------|
| tu.tools.ToolName() | @register_tool() + JSON   |
| Handle responses    | Design schemas            |
| One-level usage     | Three-step registration   |

Core Concepts

Two-Stage Architecture

code
Stage 1: Tool Class              Stage 2: Wrappers (Auto-Generated)
Python Implementation            From JSON Configs
       ↓                                  ↓
@register_tool("MyTool")         MyAPI_list_items()
class MyTool(BaseTool):          MyAPI_search()
    def run(self, arguments):    MyAPI_get_details()

Key Points:

  • One class handles multiple operations
  • JSON defines individual tool wrappers
  • Users call wrappers, which route to class
  • Need BOTH for tools to work
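
The routing described in these points can be pictured with a toy model. This is only a sketch under stated assumptions, not ToolUniverse's actual implementation: `TOY_REGISTRY`, `toy_register_tool`, `build_wrappers`, and `EchoTool` are hypothetical names invented for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the framework's class registry.
TOY_REGISTRY: Dict[str, type] = {}


def toy_register_tool(name: str) -> Callable[[type], type]:
    """Stage 1: record a tool class under a name (illustrative decorator)."""
    def decorator(cls: type) -> type:
        TOY_REGISTRY[name] = cls
        return cls
    return decorator


@toy_register_tool("EchoTool")
class EchoTool:
    """One class that serves several operations."""

    def __init__(self, tool_config: Dict[str, Any]):
        self.tool_config = tool_config

    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        operation = arguments.get("operation")
        text = str(arguments.get("text", ""))
        if operation == "upper":
            return {"status": "success", "data": text.upper()}
        if operation == "lower":
            return {"status": "success", "data": text.lower()}
        return {"status": "error", "error": f"Unknown: {operation}"}


def _make_wrapper(tool: Any) -> Callable[..., Dict[str, Any]]:
    """Create a keyword-argument function that forwards to tool.run()."""
    def wrapper(**kwargs: Any) -> Dict[str, Any]:
        return tool.run(kwargs)
    return wrapper


def build_wrappers(configs: List[Dict[str, Any]]) -> Dict[str, Callable[..., Dict[str, Any]]]:
    """Stage 2: one callable per JSON entry, each routing to its class."""
    wrappers = {}
    for config in configs:
        tool = TOY_REGISTRY[config["class"]](config)  # look up class by name
        wrappers[config["name"]] = _make_wrapper(tool)
    return wrappers


# Two JSON-style entries that share one class.
CONFIGS = [
    {"name": "Echo_upper", "class": "EchoTool"},
    {"name": "Echo_lower", "class": "EchoTool"},
]
WRAPPERS = build_wrappers(CONFIGS)
```

Calling `WRAPPERS["Echo_upper"](operation="upper", text="abc")` goes through the wrapper into `EchoTool.run`. If either the class registration or the config entry is missing, the lookup fails, which is why both stages are needed.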

Three-Step Registration

Step 1: Class Registration

python
@register_tool("MyAPITool")  # Decorator registers class
class MyAPITool(BaseTool):
    pass

Step 2: Config Registration ⚠️ MOST COMMONLY MISSED

python
# In src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    "my_category": os.path.join(current_dir, "data", "my_category_tools.json"),
}

Step 3: Wrapper Generation (Automatic)

python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()  # Auto-generates wrappers in tools/

Verification Script:

python
import sys
sys.path.insert(0, 'src')

# Step 1: Check class registered
from tooluniverse.tool_registry import get_tool_registry
import tooluniverse.your_tool_module
registry = get_tool_registry()
assert "YourToolClass" in registry, "❌ Step 1 FAILED"
print("✅ Step 1: Class registered")

# Step 2: Check config registered
from tooluniverse.default_config import TOOLS_CONFIGS
assert "your_category" in TOOLS_CONFIGS, "❌ Step 2 FAILED"
print("✅ Step 2: Config registered")

# Step 3: Check wrappers generated
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
assert hasattr(tu.tools, 'YourCategory_operation1'), "❌ Step 3 FAILED"
print("✅ Step 3: Wrappers generated")
print("✅ All steps complete!")

Standard Response Format

All tools must return:

json
{
  "status": "success" | "error",
  "data": {...},        // On success
  "error": "message"    // On failure
}

Why: Consistent error handling, composability, user expectations
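
One way to keep every handler on this format is to build responses through small helpers. The sketch below is an assumption for illustration: `success`, `failure`, and `is_standard_response` are hypothetical helper names, not part of ToolUniverse.

```python
from typing import Any, Dict


def success(data: Any, **extra: Any) -> Dict[str, Any]:
    """Build a success response; extra keys (e.g. total) sit beside data."""
    return {"status": "success", "data": data, **extra}


def failure(message: str, **extra: Any) -> Dict[str, Any]:
    """Build an error response with a human-readable message."""
    return {"status": "error", "error": message, **extra}


def is_standard_response(result: Any) -> bool:
    """Check the shape: success needs 'data', error needs a string 'error'."""
    if not isinstance(result, dict):
        return False
    if result.get("status") == "success":
        return "data" in result
    if result.get("status") == "error":
        return isinstance(result.get("error"), str)
    return False
```

A check like `is_standard_response` can also be asserted in tests, so a handler that returns its payload under a different key (for example `results`) is caught early.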


Implementation Guide

File Structure

Required Files:

  • src/tooluniverse/my_api_tool.py - Implementation
  • src/tooluniverse/data/my_api_tools.json - Tool definitions
  • tests/unit/test_my_api_tool.py - Tests
  • examples/my_api_examples.py - Usage examples

Auto-Generated (don't create manually):

  • src/tooluniverse/tools/MyAPI_*.py - Wrappers

Pattern 1: Multi-Operation Tool (Recommended)

Python Class:

python
from typing import Dict, Any
from tooluniverse.tool import BaseTool
from tooluniverse.tool_utils import register_tool
import requests

@register_tool("MyAPITool")
class MyAPITool(BaseTool):
    """Tool for MyAPI database."""
    
    BASE_URL = "https://api.example.com/v1"
    
    def __init__(self, tool_config):
        super().__init__(tool_config)
        self.parameter = tool_config.get("parameter", {})
        self.required = self.parameter.get("required", [])
    
    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Route to operation handler."""
        operation = arguments.get("operation")
        
        if not operation:
            return {"status": "error", "error": "Missing: operation"}
        
        if operation == "list_items":
            return self._list_items(arguments)
        elif operation == "search":
            return self._search(arguments)
        else:
            return {"status": "error", "error": f"Unknown: {operation}"}
    
    def _list_items(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """List items with pagination."""
        try:
            params = {}
            if "limit" in arguments:
                params["limit"] = arguments["limit"]
            
            response = requests.get(
                f"{self.BASE_URL}/items",
                params=params,
                timeout=30
            )
            response.raise_for_status()
            
            data = response.json()
            return {
                "status": "success",
                "data": data.get("items", []),
                "total": data.get("total", 0)
            }
        except requests.exceptions.Timeout:
            return {"status": "error", "error": "Timeout after 30s"}
        except requests.exceptions.HTTPError as e:
            return {"status": "error", "error": f"HTTP {e.response.status_code}"}
        except Exception as e:
            return {"status": "error", "error": str(e)}
    
    def _search(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Search items by query."""
        query = arguments.get("query")
        if not query:
            return {"status": "error", "error": "Missing: query"}
        
        try:
            response = requests.get(
                f"{self.BASE_URL}/search",
                params={"q": query},
                timeout=30
            )
            response.raise_for_status()
            
            data = response.json()
            return {
                "status": "success",
                "data": data.get("results", []),  # payload goes under "data" per the standard format
                "count": data.get("count", 0)
            }
        except requests.exceptions.RequestException as e:
            return {"status": "error", "error": f"API failed: {str(e)}"}

JSON Configuration:

json
[
  {
    "name": "MyAPI_list_items",
    "class": "MyAPITool",
    "description": "List items from database with pagination. Returns item IDs and names. Supports filtering by status and type. Example: limit=10 returns first 10 items.",
    "parameter": {
      "type": "object",
      "required": ["operation"],
      "properties": {
        "operation": {
          "const": "list_items",
          "description": "Operation type (fixed)"
        },
        "limit": {
          "type": "integer",
          "description": "Max results (1-100)",
          "minimum": 1,
          "maximum": 100
        }
      }
    },
    "return": {
      "type": "object",
      "properties": {
        "status": {"type": "string", "enum": ["success", "error"]},
        "data": {"type": "array"},
        "total": {"type": "integer"},
        "error": {"type": "string"}
      },
      "required": ["status"]
    },
    "test_examples": [
      {
        "operation": "list_items",
        "limit": 10
      }
    ]
  }
]

Pattern 2: Async Polling (Job-Based APIs)

python
import time

def _submit_job(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Submit job and poll for results."""
    try:
        # Submit
        submit_response = requests.post(
            f"{self.BASE_URL}/jobs/submit",
            json={"data": arguments.get("data")},
            timeout=30
        )
        submit_response.raise_for_status()
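        job_id = submit_response.json().get("job_id")
        if not job_id:
            # Without an ID the status URL below would be meaningless
            return {"status": "error", "error": "No job_id in submit response"}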
        
        # Poll
        for _ in range(60):  # 60 polls x 2s sleep: about 2 min, plus request time
            status_response = requests.get(
                f"{self.BASE_URL}/jobs/{job_id}/status",
                timeout=30
            )
            status_response.raise_for_status()
            
            result = status_response.json()
            if result.get("status") == "completed":
                return {
                    "status": "success",
                    "data": result.get("results"),
                    "job_id": job_id
                }
            elif result.get("status") == "failed":
                return {
                    "status": "error",
                    "error": result.get("error"),
                    "job_id": job_id
                }
            
            time.sleep(2)  # Poll every 2s
        
        return {"status": "error", "error": "Timeout after 2 min"}
    except requests.exceptions.RequestException as e:
        return {"status": "error", "error": str(e)}
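
A fixed attempt count bounds the number of polls rather than the wall-clock time, because each status request can itself take up to its timeout. The following is a minimal sketch of a deadline-based variant; `poll_until_done` and its parameters are hypothetical names, and the status check is injected as a callable so the loop can be exercised without a network.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    check_status: Callable[[], Dict[str, Any]],
    max_wait_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll check_status() until it reports completion, failure, or the deadline."""
    deadline = clock() + max_wait_s
    while True:
        result = check_status()
        state = result.get("status")
        if state == "completed":
            return {"status": "success", "data": result.get("results")}
        if state == "failed":
            return {"status": "error", "error": str(result.get("error", "Job failed"))}
        # Stop if the next sleep would carry us past the deadline.
        if clock() + interval_s > deadline:
            return {"status": "error", "error": f"Timeout after {max_wait_s:.0f}s"}
        sleep(interval_s)
```

In a tool method, `check_status` would wrap the status request and its `response.json()` call; injecting `sleep` and `clock` keeps unit tests fast.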

JSON Best Practices

Tool Naming (≤55 chars for MCP):

  • Template: {API}_{action}_{target}
  • ✅ Good: FDA_get_drug_info (17 chars)
  • ❌ Bad: FDA_get_detailed_drug_information_with_history (46 chars: within the limit, but verbose; longer names break MCP compatibility)
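
The length rule is easy to check mechanically. Below is a small sketch, assuming the config file is a JSON list of objects with a `name` field as in the configuration shown earlier; `find_long_names` and `find_long_names_in_file` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List

MAX_TOOL_NAME_LENGTH = 55  # limit stated above for MCP compatibility


def find_long_names(
    tool_configs: List[Dict[str, Any]],
    limit: int = MAX_TOOL_NAME_LENGTH,
) -> List[str]:
    """Return the tool names that exceed the limit, in config order."""
    return [
        config["name"]
        for config in tool_configs
        if len(config.get("name", "")) > limit
    ]


def find_long_names_in_file(path: str, limit: int = MAX_TOOL_NAME_LENGTH) -> List[str]:
    """Load a JSON tool-definition file and check every name in it."""
    with open(path, encoding="utf-8") as f:
        return find_long_names(json.load(f), limit)
```

An empty list from `find_long_names_in_file("src/tooluniverse/data/your_tools.json")` means every name fits.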

Description (150-250 chars, high-context):

json
{
  "description": "Search database for items. Returns up to 100 results with scores. Supports wildcards (* ?) and Boolean operators (AND, OR, NOT). Example: 'protein AND membrane' finds membrane proteins."
}

Include: What it returns, data source, use case, input format, example

test_examples (MUST be real):

json
{
  "test_examples": [
    {
      "operation": "search",
      "query": "protein",     // ✅ Real, common term
      "limit": 10
    }
  ]
}

❌ Don't use: "id": "XXXXX", "placeholder": "example_123"
✅ Do use: Real IDs from actual API documentation
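
Obvious placeholders can be flagged before the tests run. The sketch below is a heuristic only: it cannot prove that a value is real, and the patterns and the names `looks_like_placeholder` and `find_placeholder_examples` are assumptions made for illustration.

```python
import re
from typing import Any, Dict, List

# Heuristic patterns for values that look invented rather than real.
PLACEHOLDER_PATTERNS = [
    re.compile(r"^x{3,}$", re.IGNORECASE),  # "XXXXX"
    re.compile(r"^(example|placeholder|dummy|sample)([_\-]\w+)?$", re.IGNORECASE),
    re.compile(r"^<.*>$"),  # "<your_id>"
]


def looks_like_placeholder(value: Any) -> bool:
    """Return True for string values matching a known placeholder shape."""
    if not isinstance(value, str):
        return False
    return any(pattern.match(value.strip()) for pattern in PLACEHOLDER_PATTERNS)


def find_placeholder_examples(tool_config: Dict[str, Any]) -> List[str]:
    """List every test_examples field whose value looks like a placeholder."""
    problems = []
    for index, example in enumerate(tool_config.get("test_examples", [])):
        for key, value in example.items():
            if looks_like_placeholder(value):
                problems.append(f"test_examples[{index}].{key} = {value!r}")
    return problems
```

A clean result from this check is necessary but not sufficient; the values still have to be confirmed against the API documentation.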


Testing Strategy

Three-Level Testing (Levels 1 and 2 MANDATORY)

Level 1: Direct Class Testing

python
import json
from tooluniverse.your_tool_module import YourToolClass

def test_direct_class():
    """Test implementation logic."""
    with open("src/tooluniverse/data/your_tools.json") as f:
        tools = json.load(f)
        config = next(t for t in tools if t["name"] == "YourTool_operation1")
    
    tool = YourToolClass(config)
    result = tool.run({"operation": "operation1", "param": "value"})
    
    assert result["status"] == "success"
    assert "data" in result

Level 2: ToolUniverse Interface Testing

python
import pytest
from tooluniverse import ToolUniverse

class TestYourTools:
    @pytest.fixture
    def tu(self):
        tu = ToolUniverse()
        tu.load_tools()  # CRITICAL
        return tu
    
    def test_tools_load(self, tu):
        """Verify registration."""
        assert hasattr(tu.tools, 'YourTool_operation1')
    
    def test_execution(self, tu):
        """Test via ToolUniverse (how users call it)."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1",
            "param": "value"
        })
        assert result["status"] == "success"
    
    def test_error_handling(self, tu):
        """Test missing params."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1"
            # Missing required param
        })
        assert result["status"] == "error"

Level 3: Real API Testing

python
from tooluniverse import ToolUniverse

def test_real_api():
    """Verify actual API integration."""
    tu = ToolUniverse()
    tu.load_tools()
    
    result = tu.tools.YourTool_operation1(**{
        "operation": "operation1",
        "param": "real_value_from_docs"
    })
    
    if result["status"] == "success":
        assert "data" in result
        print("✅ Real API works")
    else:
        print(f"⚠️  API error (may be down): {result['error']}")

Why Each Level:

  • Level 1: Tests implementation, catches code bugs
  • Level 2: Tests registration, catches config bugs
  • Level 3: Tests integration, catches API issues

Common Patterns

Error Handling Checklist

✅ Always set timeout (30s recommended)
✅ Catch specific exceptions (Timeout, ConnectionError, HTTPError)
✅ Return error dicts, never raise in run()
✅ Include helpful context in error messages
✅ Handle JSON parsing errors
✅ Validate required parameters
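
Several checklist items can be enforced in one place with a decorator that turns any escaping exception into an error dict. This is a sketch, not part of ToolUniverse: `returns_error_dict`, `require_params`, and `parse_payload` are hypothetical names. It uses only built-in exception types so it runs without third-party packages; a real tool would add clauses for `requests.exceptions.Timeout` and `requests.exceptions.HTTPError` ahead of the generic fallback.

```python
import functools
import json
from typing import Any, Callable, Dict, List, Optional


def returns_error_dict(handler: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a handler so exceptions become error dicts instead of propagating."""
    @functools.wraps(handler)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return handler(*args, **kwargs)
        except json.JSONDecodeError as e:
            return {"status": "error", "error": f"Invalid JSON in response: {e.msg}"}
        except TimeoutError:
            return {"status": "error", "error": "Request timed out"}
        except ConnectionError as e:
            return {"status": "error", "error": f"Connection failed: {e}"}
        except Exception as e:  # last resort: never let run() raise
            return {"status": "error", "error": f"{type(e).__name__}: {e}"}
    return wrapper


def require_params(arguments: Dict[str, Any], required: List[str]) -> Optional[Dict[str, Any]]:
    """Return an error dict naming missing parameters, or None if all are present."""
    missing = [name for name in required if arguments.get(name) in (None, "")]
    if missing:
        return {"status": "error", "error": f"Missing: {', '.join(missing)}"}
    return None


@returns_error_dict
def parse_payload(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Example handler: validate input, then parse a JSON string."""
    error = require_params(arguments, ["payload"])
    if error:
        return error
    return {"status": "success", "data": json.loads(arguments["payload"])}
```

The specific clauses come first so their messages stay informative; the final `except Exception` exists only to guarantee that nothing escapes.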

Dependency Management

Check the package footprint FIRST (the dependency count is a quick proxy for size):

bash
curl -s https://pypi.org/pypi/PACKAGE/json | python3 -c "
import json, sys
data = json.load(sys.stdin)
print(f'Dependencies: {len(data[\"info\"][\"requires_dist\"] or [])}')
"

Classification:

  • Core (<100MB, universal use) → [project.dependencies]
  • Optional (>100MB or niche) → [project.optional-dependencies]

In code:

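python
def _run_analysis(self, arguments):
    # Import lazily so the tool still loads when the optional package is absent
    try:
        import optional_package
    except ImportError:
        return {
            "status": "error",
            "error": "Install with: pip install optional_package"
        }
    # ... use optional_package here ...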

Pagination Pattern

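python
def _list_items(self, arguments):
    # Forward only the pagination parameters the caller supplied
    params = {}
    if "page" in arguments:
        params["page"] = arguments["page"]
    if "limit" in arguments:
        params["limit"] = arguments["limit"]

    try:
        response = requests.get(f"{self.BASE_URL}/items", params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
    except requests.exceptions.RequestException as e:
        return {"status": "error", "error": f"API failed: {str(e)}"}
    except ValueError as e:  # response body was not valid JSON
        return {"status": "error", "error": f"Invalid JSON: {e}"}

    return {
        "status": "success",
        "data": data.get("items", []),
        "page": data.get("page", 0),
        "total_pages": data.get("total_pages", 1),
        "total_items": data.get("total", 0)
    }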
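
When a caller needs every page, the per-page response above can drive a simple loop. The sketch below assumes zero-indexed pages and the `total_pages` field shown in the pattern; `collect_all_pages` is a hypothetical helper, and `fetch_page` stands in for a call to the paginated handler.

```python
from typing import Any, Callable, Dict, List


def collect_all_pages(
    fetch_page: Callable[[int], Dict[str, Any]],
    max_pages: int = 50,
) -> Dict[str, Any]:
    """Fetch pages in order until total_pages is reached or max_pages is hit."""
    items: List[Any] = []
    page = 0
    while page < max_pages:
        result = fetch_page(page)
        if result.get("status") != "success":
            return result  # propagate the error dict unchanged
        items.extend(result.get("data", []))
        page += 1
        if page >= result.get("total_pages", 1):
            break
    return {"status": "success", "data": items, "pages_fetched": page}
```

The `max_pages` cap is a safety limit so that a misreported `total_pages` cannot cause an unbounded loop.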

Troubleshooting

Tool Doesn't Load (90% of Issues)

Symptoms: Tool count doesn't increase, no error, AttributeError when calling

Cause: Missing Step 2 of registration (default_config.py)

Solution:

python
# Edit src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    # ... existing ...
    "your_category": os.path.join(current_dir, "data", "your_category_tools.json"),
}

Verify:

bash
grep "your_category" src/tooluniverse/default_config.py
ls src/tooluniverse/tools/YourCategory_*.py
python3 -c "from tooluniverse import ToolUniverse; tu = ToolUniverse(); tu.load_tools(); print(hasattr(tu.tools, 'YourCategory_op1'))"

Tests Fail with Real APIs

Mock vs Real Testing:

  • Mocks test code structure
  • Real calls test API integration
  • Both needed for confidence
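
As an illustration of the mock side, here is a minimal sketch using `unittest.mock` from the standard library. The handler `search_items` and its injected `http_get` parameter are hypothetical; in a real tool that calls `requests.get` directly, `unittest.mock.patch` would play the same role.

```python
from typing import Any, Callable, Dict
from unittest.mock import Mock


def search_items(query: str, http_get: Callable[..., Any]) -> Dict[str, Any]:
    """Hypothetical handler; http_get is injected so tests can replace it."""
    try:
        response = http_get(
            "https://api.example.com/v1/search",
            params={"q": query},
            timeout=30,
        )
        response.raise_for_status()
        return {"status": "success", "data": response.json().get("results", [])}
    except Exception as e:
        return {"status": "error", "error": str(e)}


def test_search_with_mock() -> None:
    """The mock checks code structure: URL, parameters, and response mapping."""
    fake_response = Mock()
    fake_response.json.return_value = {"results": [{"id": 1}]}
    http_get = Mock(return_value=fake_response)

    result = search_items("protein", http_get)

    assert result == {"status": "success", "data": [{"id": 1}]}
    http_get.assert_called_once_with(
        "https://api.example.com/v1/search",
        params={"q": "protein"},
        timeout=30,
    )


def test_search_error_with_mock() -> None:
    """A failing transport must come back as an error dict, not an exception."""
    http_get = Mock(side_effect=RuntimeError("boom"))
    assert search_items("protein", http_get) == {"status": "error", "error": "boom"}
```

These tests pass even if the real API names its result key differently, which is exactly the gap that real calls are meant to close.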

What Real Testing Catches:

  • Response structure differences
  • Parameter name mismatches
  • Unexpected pagination
  • Timeout issues
  • Data type surprises

Reference

Complete Workflow

  1. Create Python class with @register_tool
  2. Create JSON config with realistic test_examples
  3. Add to default_config.py ← CRITICAL
  4. Generate wrappers: tu.load_tools()
  5. Test Level 1 (direct class)
  6. Test Level 2 (ToolUniverse interface)
  7. Test Level 3 (real API calls)
  8. Create examples file
  9. Verify all 3 registration steps
  10. Document in verification report

Quick Commands

bash
# Validate JSON
python3 -m json.tool src/tooluniverse/data/your_tools.json

# Check Python syntax
python3 -m py_compile src/tooluniverse/your_tool.py

# Verify registration
grep "your_category" src/tooluniverse/default_config.py

# Generate wrappers
PYTHONPATH=src python3 -m tooluniverse.generate_tools --force

# List wrappers
ls src/tooluniverse/tools/YourCategory_*.py

# Run tests
pytest tests/unit/test_your_tool.py -v

# Count tools
python3 << 'EOF'
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
print(f"Total: {len([t for t in dir(tu.tools) if 'YourCategory' in t])} tools")
EOF

Critical Reminders

⚠️ ALWAYS add to default_config.py (Step 2)
⚠️ NEVER raise exceptions in run()
⚠️ ALWAYS use real test_examples
⚠️ ALWAYS test at both Level 1 and Level 2
⚠️ KEEP tool names ≤55 characters
⚠️ RETURN standard response format
⚠️ SET timeout on all HTTP requests
⚠️ VERIFY all 3 registration steps

Success Criteria

✅ All 3 registration steps verified
✅ Level 1 tests passing (direct class)
✅ Level 2 tests passing (ToolUniverse interface)
✅ Real API calls working (Level 3)
✅ Tool names ≤55 characters
✅ test_examples use real IDs
✅ Standard response format used
✅ Helpful error messages
✅ Examples file created
✅ No raised exceptions in run()

When all criteria met → Production Ready 🎉