Freeplay Dataset Management
IMPORTANT: Safety Guidelines
Confirmation Required: Always ask for user confirmation before performing any write operations (creating datasets, adding test cases, updating datasets, updating test cases).
No Deletion Operations: Do NOT perform any deletion operations (deleting datasets or test cases). If the user requests deletion, inform them that deletion must be done manually through the Freeplay UI or API directly. This skill does not support deletion actions to prevent accidental data loss.
Manage test datasets for evaluating prompts and agents in Freeplay.
Critical: Two Dataset Types
Freeplay has two distinct dataset types with different API endpoints:
- •Prompt Datasets - Test individual prompt templates → See prompt-datasets.md
- •Agent Datasets - Test complete agent workflows → See agent-datasets.md
ALWAYS confirm which type the user needs. The APIs are not interchangeable.
Quick decision guide:
- •User mentions "prompt", "template", or "component" → Prompt Dataset
- •User mentions "agent", "workflow", or "end-to-end" → Agent Dataset
Quick Start
Setup (required for all operations)
import requests
import os
from scripts.secrets import SecretString
api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
headers = {
"Authorization": f"Bearer {api_key.get()}",
"Content-Type": "application/json"
}
project_id = "<project-id>" # See note below on how to resolve this
base = f"{os.environ.get('FREEPLAY_BASE_URL', 'https://app.freeplay.ai')}/api/v2/projects/{project_id}"
All examples assume this setup. Required environment variables:
- •
FREEPLAY_API_KEY- Your Freeplay API key - •
FREEPLAY_BASE_URL- API URL (default: https://app.freeplay.ai)
Project ID can come from:
- •User specification
- •MCP
list_projects()tool to discover available projects
Create a Prompt Dataset
response = requests.post(
f"{base}/prompt-datasets",
headers=headers,
json={
"name": "my-golden-set",
"description": "Golden test cases",
"input_names": ["question", "context"]
}
)
dataset_id = response.json()["id"] # Status: 201 Created
Create an Agent Dataset
response = requests.post(
f"{base}/agent-datasets",
headers=headers,
json={
"name": "my-agent-tests",
"description": "Agent workflow test cases"
}
)
dataset_id = response.json()["id"] # Status: 201 Created
Add Test Cases (both types)
# For prompt datasets: /prompt-datasets/{id}/test-cases/bulk
# For agent datasets: /agent-datasets/{id}/test-cases/bulk
test_cases = [
{
"inputs": {"question": "What is your refund policy?"},
"output": "Expected response...",
"metadata": {"category": "refunds"}
}
]
response = requests.post(
f"{base}/prompt-datasets/{dataset_id}/test-cases/bulk",
headers=headers,
json={"data": test_cases} # Key is "data", not "test_cases"
)
# Status: 201 Created, max 100 items per request
Common Operations
Import from CSV/JSONL
Use the utility script for bulk imports:
python scripts/import_testcases.py \ --file test_cases.csv \ --dataset-id ds_abc123 \ --type prompt # or "agent"
Handles batching automatically (100 per request). See script help for full options:
python scripts/import_testcases.py --help
CSV format:
- •Columns starting with
inputs.become input fields - •
outputcolumn becomes expected output - •Other columns become metadata
Example:
inputs.question,inputs.context,output,category,priority "What is...","User context","Expected...","refunds","high"
List Datasets
# Prompt datasets
response = requests.get(f"{base}/prompt-datasets", headers=headers)
datasets = response.json().get('data', [])
# Agent datasets
response = requests.get(f"{base}/agent-datasets", headers=headers)
datasets = response.json().get('data', [])
Get Test Cases
# Replace prompt-datasets with agent-datasets for agent datasets
response = requests.get(
f"{base}/prompt-datasets/{dataset_id}/test-cases",
headers=headers
)
test_cases = response.json().get('data', [])
Update Test Case
response = requests.patch(
f"{base}/prompt-datasets/{dataset_id}/test-cases/{test_case_id}",
headers=headers,
json={
"inputs": {"question": "Updated question"},
"output": "Updated output",
"metadata": {"updated": True}
}
)
# Status: 200 OK
Workflow Patterns
For complex operations, use the checklist pattern:
Dataset Creation with Verification: - [ ] Step 1: Create dataset (verify 201 status) - [ ] Step 2: Add test cases (verify 201 status) - [ ] Step 3: Retrieve test cases to confirm count - [ ] Step 4: Verify test case structure
See examples.md for complete workflows with verification steps.
Reference Documentation
Complete API references:
- •prompt-datasets.md - All prompt dataset operations
- •agent-datasets.md - All agent dataset operations
- •examples.md - Complete workflow examples with verification
Quick links by task:
- •Creating datasets: Prompt | Agent
- •Adding test cases: Prompt | Agent
- •Test case structure: Prompt | Agent
- •Updating operations: Prompt | Agent
Utility Scripts
scripts/import_testcases.py - Import test cases from CSV or JSONL files
- •Handles batching automatically (100 per request)
- •Supports both prompt and agent datasets
- •Provides progress reporting and error handling
scripts/batch_operations.py - Reusable batch operation functions
- •Use in custom scripts for programmatic batch operations
- •Functions:
batch_create_test_cases()
Common Errors
| Error | Cause | Solution |
|---|---|---|
| 404 Path Not Found | Wrong endpoint path or dataset type | Verify /api/v2/ base and correct type (prompt-datasets or agent-datasets) |
| Bulk limit exceeded | >100 items in single request | Use scripts/import_testcases.py or batch manually |
| Unauthorized | Invalid or missing API key | Check FREEPLAY_API_KEY environment variable |
| Incompatible test case | Input keys don't match prompt variables | Verify inputs keys match prompt template variables |
Best Practices
- •Verify operations: Always check status codes and counts after operations
- •Use descriptive names: Dataset names should indicate purpose (e.g., "refunds-edge-cases")
- •Include metadata: Tag test cases for filtering (category, priority, source)
- •Provide expected outputs: Required for meaningful evaluations
- •Batch large operations: Use scripts for >10 items
- •Organize by scenario: Create focused datasets rather than one large dataset
- •Match input names: For prompt datasets, ensure inputs match template variables
Related Skills
After creating datasets:
- •Run tests using the
run-testskill - •Analyze results using the
test-run-analysisskill - •Check deployment status using the
get_deployed_prompt_versionsMCP tool