Production API Tester
Skill for testing the production research API in live environments. Enables optimization loops: create strategy → test in production → analyze with Langfuse → iterate.
When to Use This Skill
- •"Test the daily_news_briefing strategy in production"
- •"Create a test task and execute it"
- •"Run the full optimization loop for my new legal research strategy"
- •"Check if the production API is healthy"
- •"Clean up test tasks after experimentation"
- •"Monitor execution status and retrieve results"
What This Skill Does
5 Core Functions:
- •Create Test Tasks - Subscribe test emails to research topics
- •Execute Research - Trigger batch or single task execution
- •Monitor Status - Poll for completion and retrieve results
- •Analyze Results - Link to Langfuse traces for performance analysis
- •Cleanup - Delete test tasks after validation
Required Setup
Environment Variables
# Production API Configuration
export PROD_API_URL="https://webresearchagent.replit.app"   # Production URL
export PROD_API_KEY="your_api_key_here"                     # Your API secret key
export CALLBACK_URL="https://webhook.site/your-unique-url"  # Webhook receiver URL

# Optional: Override for local testing
export PROD_API_URL="http://localhost:8000"                 # Test locally
Getting Your API Key:
- •The user provides the key
- •Store it in an environment variable or pass it via the --api-key flag
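The precedence described above (flag first, environment second) could be sketched as follows. This is a minimal illustration, not the helpers' actual source: `resolve_api_key` and `build_parser` are hypothetical names, and only the `--api-key` flag and the `PROD_API_KEY` variable come from this document.

```python
import argparse
import os


def resolve_api_key(cli_value, env=None):
    """Return the key from --api-key if given, otherwise from PROD_API_KEY."""
    env = os.environ if env is None else env
    key = cli_value or env.get("PROD_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise SystemExit("No API key: set PROD_API_KEY or pass --api-key")
    return key


def build_parser():
    """Hypothetical argument parser skeleton for a helper script."""
    parser = argparse.ArgumentParser(description="Helper skeleton (illustrative)")
    parser.add_argument("--api-key", default=None, help="overrides PROD_API_KEY")
    return parser
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real shell environment.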
Setting Up Webhook Receiver:
- •Option 1: Use webhook.site (https://webhook.site) - get instant unique URL
- •Option 2: Use Langdock webhook (if using Langdock integration)
- •Option 3: Run local webhook receiver (provided in helpers)
Workflow
Use Case 1: Test Single Strategy (Quick Test)
Goal: Execute a strategy once and check results
Step 1: Create Test Task
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers

# Create test task
python3 create_test_task.py \
  --api-key "$PROD_API_KEY" \
  --email "test@example.com" \
  --topic "AI developments in healthcare" \
  --frequency "daily" \
  --output /tmp/api_test/task.json
Output:
{
"id": "abc123",
"email": "test@example.com",
"research_topic": "AI developments in healthcare",
"frequency": "daily",
"schedule_time": "09:00",
"is_active": true,
"created_at": "2025-11-09T10:00:00Z",
"last_run_at": null
}
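For scripting against this output, the task object can be loaded into a typed structure. The sketch below assumes the sample above shows the complete field set; the `Task` class and `parse_task` function are hypothetical and not part of the helpers.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Field names mirror the sample output shown above.
    id: str
    email: str
    research_topic: str
    frequency: str
    schedule_time: str
    is_active: bool
    created_at: str
    last_run_at: Optional[str]


def parse_task(raw: str) -> Task:
    """Parse the JSON written by create_test_task.py into a Task."""
    data = json.loads(raw)
    # Task(**data) raises TypeError if a field is missing or unexpected,
    # which surfaces schema drift instead of hiding it.
    return Task(**data)
```

If the real API returns additional fields, this strict constructor will fail loudly; relaxing it is a deliberate choice to make later.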
Step 2: Execute Research
# Execute batch (all daily tasks)
python3 execute_batch.py \
  --api-key "$PROD_API_KEY" \
  --frequency "daily" \
  --callback-url "$CALLBACK_URL" \
  --output /tmp/api_test/execution.json
Output:
{
"status": "running",
"frequency": "daily",
"tasks_found": 1,
"started_at": "2025-11-09T10:01:00Z"
}
Step 3: Monitor Results
# Option A: Poll webhook endpoint
python3 check_webhook.py \
  --webhook-url "$CALLBACK_URL" \
  --wait-for-results \
  --timeout 300

# Option B: Check task status
python3 get_task.py \
  --api-key "$PROD_API_KEY" \
  --task-id "abc123" \
  --output /tmp/api_test/task_status.json
Step 4: Link to Langfuse
# Get Langfuse trace for this execution
python3 link_to_langfuse.py \
  --task-id "abc123" \
  --email "test@example.com" \
  --output /tmp/api_test/langfuse_link.json
Output:
{
"task_id": "abc123",
"trace_query": {
"metadata_filter": {
"user_email": "test@example.com",
"research_topic": "AI developments in healthcare"
},
"time_range": "last_1_hour"
},
"langfuse_url": "https://cloud.langfuse.com/project/.../traces?filter=...",
"trace_count": 1,
"latest_trace_id": "xyz789"
}
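The `trace_query` part of this output can be thought of as a metadata filter plus a time window. The sketch below shows one way to build it. Only the labels `last_1_hour` and `last_1_day` appear in this document; the general `last_<n>_<unit>` pattern, the `from_timestamp` field, and both function names are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Maps the unit in a label to the matching timedelta keyword.
_UNITS = {"hour": "hours", "day": "days"}


def parse_time_range(label: str) -> timedelta:
    """Convert a label such as 'last_1_hour' into a timedelta."""
    match = re.fullmatch(r"last_(\d+)_(hour|day)s?", label)
    if not match:
        raise ValueError(f"unsupported time range: {label!r}")
    count, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: count})


def build_trace_query(email, topic, time_range="last_1_hour", now=None):
    """Build a query dict shaped like the trace_query shown above."""
    now = now or datetime.now(timezone.utc)
    return {
        "metadata_filter": {"user_email": email, "research_topic": topic},
        "time_range": time_range,
        # Hypothetical extra field: the absolute start of the window.
        "from_timestamp": (now - parse_time_range(time_range)).isoformat(),
    }
```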
Step 5: Cleanup
# Delete test task
python3 delete_task.py \
  --api-key "$PROD_API_KEY" \
  --task-id "abc123"
Use Case 2: Full Optimization Loop (Strategy Development)
Goal: Test new strategy → analyze performance → iterate
This combines the strategy-builder skill with the production-api-tester skill.
Loop Iteration:
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers

# 1. Generate/update strategy (using strategy-builder skill)
python3 ../../strategy-builder/helpers/generate_strategy.py \
  --slug "legal/court_cases_de" \
  --category "legal" \
  --time-window "month" \
  --depth "comprehensive" \
  --output /tmp/optimization_loop/strategy_v1.yaml

# 2. Validate strategy
python3 ../../strategy-builder/helpers/validate_strategy.py \
  --strategy /tmp/optimization_loop/strategy_v1.yaml

# 3. Deploy strategy to database
# (Manual: save to strategies/, add to index.yaml, migrate to DB)

# 4. Create test task for this strategy
python3 create_test_task.py \
  --api-key "$PROD_API_KEY" \
  --email "legal-test@example.com" \
  --topic "Datenschutz DSGVO Verstoß" \
  --frequency "daily" \
  --output /tmp/optimization_loop/task.json

# 5. Execute research
TASK_ID=$(jq -r '.id' /tmp/optimization_loop/task.json)
python3 execute_batch.py \
  --api-key "$PROD_API_KEY" \
  --frequency "daily" \
  --callback-url "$CALLBACK_URL" \
  --output /tmp/optimization_loop/execution.json

# 6. Wait for completion and get results
python3 wait_for_completion.py \
  --task-id "$TASK_ID" \
  --api-key "$PROD_API_KEY" \
  --timeout 600 \
  --output /tmp/optimization_loop/results.json

# 7. Link to Langfuse trace
python3 link_to_langfuse.py \
  --task-id "$TASK_ID" \
  --email "legal-test@example.com" \
  --output /tmp/optimization_loop/langfuse.json

# 8. Retrieve and analyze Langfuse trace
TRACE_ID=$(jq -r '.latest_trace_id' /tmp/optimization_loop/langfuse.json)
python3 ../../langfuse-optimization/helpers/retrieve_single_trace.py \
  "$TRACE_ID" \
  --filter-essential \
  --output /tmp/optimization_loop/trace.json

# 9. Analyze performance
python3 ../../strategy-builder/helpers/analyze_strategy_performance.py \
  --traces /tmp/optimization_loop/trace.json \
  --strategy "legal/court_cases_de" \
  --output /tmp/optimization_loop/performance.json

# 10. Review recommendations and iterate
cat /tmp/optimization_loop/performance.json
# Apply fixes to strategy YAML, repeat from step 1
Use Case 3: Batch Testing (Multiple Strategies)
Goal: Test multiple strategies in parallel
Step 1: Create Multiple Test Tasks
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers

# Create tasks for different strategies
python3 batch_create_tasks.py \
  --api-key "$PROD_API_KEY" \
  --tasks-file /tmp/batch_test/tasks_config.json \
  --output /tmp/batch_test/created_tasks.json
tasks_config.json:
[
{
"email": "test-news@example.com",
"research_topic": "AI regulation updates",
"frequency": "daily",
"strategy_hint": "daily_news_briefing"
},
{
"email": "test-financial@example.com",
"research_topic": "Tesla stock analysis",
"frequency": "daily",
"strategy_hint": "financial_research"
},
{
"email": "test-legal@example.com",
"research_topic": "GDPR compliance updates",
"frequency": "daily",
"strategy_hint": "legal/court_cases_de"
}
]
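Before handing a config like this to batch_create_tasks.py, it can help to check it locally. The following is a sketch under assumptions: it treats `email`, `research_topic`, and `frequency` as required and `strategy_hint` as optional, and it enforces the `@example.com` test-address convention used throughout this document. `validate_tasks_config` is a hypothetical function, not one of the helpers.

```python
REQUIRED = ("email", "research_topic", "frequency")
# Frequencies listed in the helper usage: daily|weekly|monthly.
FREQUENCIES = {"daily", "weekly", "monthly"}


def validate_tasks_config(entries):
    """Return a list of problems; an empty list means the config looks fine."""
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: must be an object")
            continue
        for key in REQUIRED:
            if not entry.get(key):
                problems.append(f"entry {index}: missing {key}")
        frequency = entry.get("frequency")
        if frequency and frequency not in FREQUENCIES:
            problems.append(f"entry {index}: unknown frequency {frequency!r}")
        email = entry.get("email", "")
        if email and not email.endswith("@example.com"):
            problems.append(f"entry {index}: {email} is not a test address")
    return problems
```

Returning a list of messages rather than raising on the first error lets a whole file be fixed in one pass.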
Step 2: Execute All
# Execute daily batch
python3 execute_batch.py \
  --api-key "$PROD_API_KEY" \
  --frequency "daily" \
  --callback-url "$CALLBACK_URL"
Step 3: Monitor All
# Wait for all tasks to complete
python3 monitor_batch.py \
  --api-key "$PROD_API_KEY" \
  --tasks-file /tmp/batch_test/created_tasks.json \
  --timeout 900 \
  --output /tmp/batch_test/batch_results.json
Step 4: Analyze All
# Generate comparison report
python3 compare_strategies.py \
  --results /tmp/batch_test/batch_results.json \
  --output /tmp/batch_test/comparison.json
Output: Comparison of latency, success rate, and error types across strategies
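As a rough picture of what such a comparison involves, the sketch below aggregates per-run records by strategy. The record schema (`strategy`, `latency_s`, `success`, `error_type`) is entirely hypothetical; the real format of batch_results.json is not documented here, so treat this as an illustration of the aggregation only.

```python
from collections import Counter, defaultdict
from statistics import mean


def compare_strategies(results):
    """Summarize latency, success rate, and error types per strategy."""
    grouped = defaultdict(list)
    for row in results:
        grouped[row["strategy"]].append(row)

    report = {}
    for strategy, rows in grouped.items():
        # Count only rows that actually carry an error type.
        errors = Counter(r["error_type"] for r in rows if r.get("error_type"))
        report[strategy] = {
            "runs": len(rows),
            "mean_latency_s": mean(r["latency_s"] for r in rows),
            "success_rate": sum(1 for r in rows if r["success"]) / len(rows),
            "error_types": dict(errors),
        }
    return report
```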
Step 5: Cleanup All
# Delete all test tasks
python3 batch_delete_tasks.py \
  --api-key "$PROD_API_KEY" \
  --tasks-file /tmp/batch_test/created_tasks.json
Use Case 4: Health Check & Monitoring
Goal: Verify production API is healthy
cd /home/user/web_research_agent/.claude/skills/production-api-tester/helpers

# Quick health check
python3 health_check.py \
  --api-url "$PROD_API_URL"

# Extended monitoring
python3 health_check.py \
  --api-url "$PROD_API_URL" \
  --continuous \
  --interval 60 \
  --duration 3600
Output:
✓ API is healthy
Status: online
Database: connected
Langfuse: enabled
Response time: 234ms
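The `--continuous`, `--interval`, and `--duration` flags imply a scheduling loop along the following lines. This is a sketch, not health_check.py itself: `run_health_checks` is a hypothetical name, and the actual probe (which endpoint is called and how the response is judged) is left to the caller because the health endpoint is not specified in this document.

```python
import time


def run_health_checks(probe, interval, duration,
                      sleep=time.sleep, clock=time.monotonic):
    """Call probe() every `interval` seconds until `duration` seconds pass."""
    results = []
    deadline = clock() + duration
    while True:
        started = clock()
        try:
            healthy = bool(probe())
        except Exception:
            # Any failure of the probe counts as an unhealthy sample.
            healthy = False
        elapsed_ms = round((clock() - started) * 1000)
        results.append({"healthy": healthy, "response_time_ms": elapsed_ms})
        # Stop when the next check would fall after the deadline.
        if clock() + interval > deadline:
            break
        sleep(interval)
    return results
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.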
Helper Tools Reference
1. create_test_task.py
Purpose: Create research task subscription
Usage:
python3 create_test_task.py \
  --api-key "$PROD_API_KEY" \
  [--api-url "$PROD_API_URL"] \
  --email "test@example.com" \
  --topic "Research topic" \
  --frequency daily|weekly|monthly \
  [--schedule-time "09:00"] \
  --output /tmp/task.json
Output: Task object with ID for tracking
2. execute_batch.py
Purpose: Trigger batch research execution
Usage:
python3 execute_batch.py \
  --api-key "$PROD_API_KEY" \
  [--api-url "$PROD_API_URL"] \
  --frequency daily|weekly|monthly \
  --callback-url "https://webhook.site/..." \
  --output /tmp/execution.json
Output: Execution status (running, tasks_found, started_at)
3. get_task.py
Purpose: Retrieve task details
Usage:
python3 get_task.py \
  --api-key "$PROD_API_KEY" \
  [--api-url "$PROD_API_URL"] \
  --task-id "abc123" \
  --output /tmp/task.json
4. list_tasks.py
Purpose: List all research tasks
Usage:
python3 list_tasks.py \
  --api-key "$PROD_API_KEY" \
  [--api-url "$PROD_API_URL"] \
  [--email "filter@example.com"] \
  [--frequency daily] \
  --output /tmp/tasks.json
5. delete_task.py
Purpose: Delete research task
Usage:
python3 delete_task.py \
  --api-key "$PROD_API_KEY" \
  [--api-url "$PROD_API_URL"] \
  --task-id "abc123"
6. wait_for_completion.py
Purpose: Poll task until completion
Usage:
python3 wait_for_completion.py \
  --api-key "$PROD_API_KEY" \
  --task-id "abc123" \
  [--timeout 600] \
  [--poll-interval 10] \
  --output /tmp/results.json
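Polling with a timeout and a poll interval generally follows the pattern below. This is a sketch rather than the helper's real implementation: how the production API signals completion is not stated in this document, so both the fetch function and the completion check are passed in by the caller. The defaults mirror the `--timeout 600` and `--poll-interval 10` values shown in the usage.

```python
import time


def wait_for_completion(fetch_task, is_done, timeout=600, poll_interval=10,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_task() until is_done(task) is true or the timeout expires."""
    deadline = clock() + timeout
    while True:
        task = fetch_task()
        if is_done(task):
            return task
        # Give up if the next poll would land after the deadline.
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"task not complete after {timeout}s")
        sleep(poll_interval)
```

One plausible completion check, assuming `last_run_at` changes from null once a run finishes, would be `lambda task: task["last_run_at"] is not None`.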
7. link_to_langfuse.py
Purpose: Find Langfuse trace for task execution
Usage:
python3 link_to_langfuse.py \
  --task-id "abc123" \
  --email "test@example.com" \
  [--time-range "last_1_hour"] \
  --output /tmp/langfuse_link.json
Output:
- •Trace query parameters
- •Langfuse dashboard URL
- •Latest trace ID
8. health_check.py
Purpose: Check API health
Usage:
python3 health_check.py \
  [--api-url "$PROD_API_URL"] \
  [--continuous] \
  [--interval 60] \
  [--duration 3600]
9. webhook_receiver.py (Local Testing)
Purpose: Run local webhook receiver for testing
Usage:
# Start local webhook receiver
python3 webhook_receiver.py \
  --port 8080 \
  --output-dir /tmp/webhooks

# Use as callback URL
export CALLBACK_URL="http://localhost:8080/webhook"
Features:
- •Logs all incoming webhooks
- •Saves payloads to disk
- •Provides ngrok-style public URL (if using tunneling)
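A receiver with the first two features (log and save each payload) can be built from the standard library alone. The sketch below is not the bundled webhook_receiver.py; `save_payload`, `make_handler`, and `make_server` are hypothetical names, the file naming scheme is invented, and the server binds to 127.0.0.1 only, so a tunnel would still be needed for the production API to reach it.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def save_payload(body: bytes, output_dir: Path) -> Path:
    """Validate the body as JSON and write it to a timestamped file."""
    payload = json.loads(body or b"{}")
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"webhook_{time.time_ns()}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path


def make_handler(output_dir: Path):
    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                path = save_payload(self.rfile.read(length), output_dir)
            except json.JSONDecodeError:
                # Reject bodies that are not valid JSON.
                self.send_response(400)
                self.end_headers()
                return
            print(f"saved webhook payload to {path}")
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")

    return WebhookHandler


def make_server(port=8080, output_dir="/tmp/webhooks"):
    """Create the server; call serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), make_handler(Path(output_dir)))
```

Running `make_server().serve_forever()` would then accept POSTs at http://localhost:8080/webhook, matching the callback URL shown above.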
Integration with Other Skills
With strategy-builder Skill
Optimization Loop:
1. strategy-builder: analyze_research_query.py → Determine if new strategy needed
2. strategy-builder: generate_strategy.py → Create strategy YAML
3. strategy-builder: validate_strategy.py → Validate structure
4. [Manual: Deploy strategy to database]
5. production-api-tester: create_test_task.py → Create test subscription
6. production-api-tester: execute_batch.py → Run research
7. production-api-tester: wait_for_completion.py → Get results
8. production-api-tester: link_to_langfuse.py → Find trace
9. strategy-builder: analyze_strategy_performance.py → Analyze performance
10. [Iterate: Apply fixes and repeat]
With langfuse-optimization Skill
Performance Deep Dive:
1. production-api-tester: execute_batch.py → Generate new traces
2. langfuse-optimization: retrieve_traces_and_observations.py → Get detailed trace data
3. langfuse-optimization: [analyze and fix configs] → Optimize style.yaml, template.yaml, tools.yaml
With langfuse-advanced-filters Skill
Targeted Analysis:
1. production-api-tester: Create multiple test tasks
2. production-api-tester: Execute batch
3. langfuse-advanced-filters: query_with_filters.py → Filter by specific criteria (e.g., latency > 10s)
4. langfuse-advanced-filters: analyze_filtered_results.py → Identify patterns
Common Patterns
Pattern 1: Rapid Iteration Testing
# Loop for quick iterations
for i in {1..5}; do
  echo "Iteration $i"

  # Modify strategy (manual or automated)

  # Test
  python3 create_test_task.py ... --output /tmp/test_$i/task.json
  TASK_ID=$(jq -r '.id' /tmp/test_$i/task.json)
  python3 execute_batch.py ...
  python3 wait_for_completion.py --task-id "$TASK_ID" --output /tmp/test_$i/results.json

  # Analyze
  python3 link_to_langfuse.py --task-id "$TASK_ID" --output /tmp/test_$i/trace.json

  # Cleanup
  python3 delete_task.py --task-id "$TASK_ID"

  echo "Iteration $i complete. Review /tmp/test_$i/"
  sleep 5
done
Pattern 2: A/B Testing Strategies
# Test strategy A
python3 create_test_task.py --email "test-a@example.com" --topic "AI news" --output /tmp/ab_test/task_a.json

# Test strategy B (different strategy slug via topic classification)
python3 create_test_task.py --email "test-b@example.com" --topic "AI regulation detailed analysis" --output /tmp/ab_test/task_b.json

# Execute both
python3 execute_batch.py --frequency daily

# Compare results
python3 compare_tasks.py \
  --task-a /tmp/ab_test/task_a.json \
  --task-b /tmp/ab_test/task_b.json \
  --output /tmp/ab_test/comparison.json
Pattern 3: Regression Testing
# Before deploying changes to production, test current vs new

# 1. Baseline (current production strategy)
python3 create_test_task.py --email "baseline@example.com" --topic "Test topic" --output /tmp/regression/baseline_task.json
python3 execute_batch.py ...
# Save results

# 2. Make changes to strategy

# 3. Test new version
python3 create_test_task.py --email "new@example.com" --topic "Test topic" --output /tmp/regression/new_task.json
python3 execute_batch.py ...
# Compare results

# 4. Validate no regressions
python3 validate_regression.py \
  --baseline /tmp/regression/baseline_results.json \
  --new /tmp/regression/new_results.json \
  --output /tmp/regression/regression_report.json
Tips & Best Practices
1. Use Unique Test Emails
Always use identifiable test email addresses:
- •test-strategy-{strategy_name}@example.com
- •dev-{your_name}@example.com
- •Never use real user emails for testing
2. Clean Up Test Tasks
Always delete test tasks after validation:
# List all test tasks
python3 list_tasks.py --api-key "$PROD_API_KEY" | grep "test-"

# Bulk delete
python3 batch_delete_tasks.py --pattern "test-*"
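How `--pattern` is matched is not documented here. One plausible reading, sketched below, is a shell-style glob applied to the part of the email before the `@`. Both that interpretation and the `select_test_tasks` function are assumptions, so confirm the real behavior before running a bulk delete against production.

```python
from fnmatch import fnmatchcase


def select_test_tasks(tasks, pattern="test-*"):
    """Return tasks whose email local part matches the glob pattern."""
    selected = []
    for task in tasks:
        local_part = task["email"].split("@", 1)[0]
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(local_part, pattern):
            selected.append(task)
    return selected
```

Printing the selected tasks as a dry run before deleting anything is a cheap safeguard against an overly broad pattern.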
3. Use Webhook.site for Quick Tests
For quick tests without setting up infrastructure:
- •Go to https://webhook.site
- •Copy your unique URL
- •Use as CALLBACK_URL
- •View results in browser
4. Tag Test Executions
When creating test tasks, use descriptive topics:
# Good
--topic "[TEST] AI news - strategy_v2_iteration_3"

# Bad
--topic "test"
This makes Langfuse traces easier to find and filter.
5. Automate Full Loop
Create a script for the full optimization loop:
#!/bin/bash
# optimize_strategy.sh
STRATEGY_SLUG=$1
TEST_TOPIC=$2

# Generate → Validate → Deploy → Test → Analyze → Report
...
6. Monitor API Rate Limits
The production API may enforce rate limits:
- •Wait between batch executions
- •Use --poll-interval to avoid overwhelming the API
- •Check API response headers for rate limit info
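If the API does expose rate-limit information in response headers, a helper could turn it into a wait time. The sketch below is speculative: `Retry-After` is a standard HTTP header and `X-RateLimit-Remaining` is a widely used but non-standard convention, and this document does not confirm that the production API sends either. `retry_delay` is a hypothetical function.

```python
def retry_delay(headers, default=10.0):
    """Seconds to wait before the next call, based on response headers."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}

    retry_after = normalized.get("retry-after")
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except ValueError:
            # The HTTP-date form of Retry-After is not parsed in this sketch.
            return default

    remaining = normalized.get("x-ratelimit-remaining")
    if remaining is not None and remaining.isdigit() and int(remaining) == 0:
        return default

    return 0.0
```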
Troubleshooting
"Authentication failed":
- •Verify PROD_API_KEY is set correctly
- •Check API key is active in production database
- •Ensure X-API-Key header is sent
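To confirm that the header is attached, it can help to construct a request by hand and inspect it before sending. The sketch below uses only the standard library. The `X-API-Key` header name comes from this document; `build_request` and any path passed to it (for example `/tasks`) are hypothetical, since the API routes are not listed here.

```python
import json
import urllib.request


def build_request(api_url, path, api_key, payload=None, method="GET"):
    """Build a urllib request that carries the X-API-Key header."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    # rstrip avoids a double slash when api_url ends with "/".
    request = urllib.request.Request(api_url.rstrip("/") + path,
                                     data=data, method=method)
    request.add_header("X-API-Key", api_key)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request
```

Calling `request.header_items()` on the result lists the headers that will be sent; note that urllib normalizes the capitalization of header names, which is harmless because HTTP header names are case-insensitive.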
"Webhook not receiving results":
- •Verify CALLBACK_URL is publicly accessible
- •Check webhook receiver logs
- •Use webhook.site for debugging
- •Ensure URL doesn't have trailing slash inconsistencies
"Task execution times out":
- •Increase --timeout parameter
- •Check production logs for errors
- •Verify strategy is valid and doesn't have infinite loops
- •Check Langfuse for ERROR level traces
"Cannot find Langfuse trace":
- •Wait 30-60 seconds for trace to be indexed
- •Verify metadata fields match (email, topic)
- •Check time range is wide enough
- •Use --time-range "last_1_day" for safety
"Health check fails":
- •Verify PROD_API_URL is correct
- •Check if API is deployed and running
- •Verify network connectivity
- •Check API logs for startup errors
Security Considerations
1. API Key Management
DO:
- •Store API key in environment variables
- •Use separate API keys for testing vs production
- •Rotate keys regularly
DON'T:
- •Commit API keys to git
- •Share API keys in logs or screenshots
- •Use production API key for automated testing
2. Test Data
DO:
- •Use fake/test email addresses
- •Use non-sensitive research topics
- •Mark test tasks clearly
DON'T:
- •Use real user data for testing
- •Test with sensitive/confidential topics
- •Leave test tasks in production database
3. Webhook Security
DO:
- •Use HTTPS for webhook URLs
- •Validate webhook payloads
- •Log webhook failures
DON'T:
- •Expose webhook receiver without authentication
- •Trust webhook data without validation
- •Store sensitive data in webhook logs
Success Criteria
Good production testing should:
- •✅ Use isolated test tasks (identifiable emails)
- •✅ Clean up after completion
- •✅ Link to Langfuse traces for analysis
- •✅ Document results for comparison
- •✅ Enable rapid iteration (< 5 min per cycle)
- •✅ Validate before deploying to real users
Remember: This skill is about safe production testing, not replacing proper staging environments. Use it for:
- •Strategy validation
- •Performance profiling
- •Regression testing
- •Optimization loops
For high-risk changes, always test locally first using run_daily_briefing.py.