Freeplay Dataset Management

IMPORTANT: Safety Guidelines

Confirmation Required: Always ask for user confirmation before performing any write operations (creating datasets, adding test cases, updating datasets, updating test cases).

No Deletion Operations: Do NOT perform any deletion operations (deleting datasets or test cases). If the user requests deletion, inform them that deletion must be done manually through the Freeplay UI or API directly. This skill does not support deletion actions to prevent accidental data loss.

Manage test datasets for evaluating prompts and agents in Freeplay.

Critical: Two Dataset Types

Freeplay has two distinct dataset types with different API endpoints:

•Prompt Datasets - Test individual prompt templates → See prompt-datasets.md
•Agent Datasets - Test complete agent workflows → See agent-datasets.md

ALWAYS confirm which type the user needs. The APIs are not interchangeable.

Quick decision guide:

•User mentions "prompt", "template", or "component" → Prompt Dataset
•User mentions "agent", "workflow", or "end-to-end" → Agent Dataset

Quick Start

Setup (required for all operations)

python

import requests
import os
from scripts.secrets import SecretString

api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
headers = {
    "Authorization": f"Bearer {api_key.get()}",
    "Content-Type": "application/json"
}
project_id = "<project-id>"  # See note below on how to resolve this
base = f"{os.environ.get('FREEPLAY_BASE_URL', 'https://app.freeplay.ai')}/api/v2/projects/{project_id}"

All examples assume this setup. Required environment variables:

•FREEPLAY_API_KEY - Your Freeplay API key
•FREEPLAY_BASE_URL - API URL (default: https://app.freeplay.ai)

Project ID can come from:

•User specification
•MCP list_projects() tool to discover available projects

Create a Prompt Dataset

python

response = requests.post(
    f"{base}/prompt-datasets",
    headers=headers,
    json={
        "name": "my-golden-set",
        "description": "Golden test cases",
        "input_names": ["question", "context"]
    }
)
dataset_id = response.json()["id"]  # Status: 201 Created

Create an Agent Dataset

python

response = requests.post(
    f"{base}/agent-datasets",
    headers=headers,
    json={
        "name": "my-agent-tests",
        "description": "Agent workflow test cases"
    }
)
dataset_id = response.json()["id"]  # Status: 201 Created

Add Test Cases (both types)

python

# For prompt datasets: /prompt-datasets/{id}/test-cases/bulk
# For agent datasets: /agent-datasets/{id}/test-cases/bulk

test_cases = [
    {
        "inputs": {"question": "What is your refund policy?"},
        "output": "Expected response...",
        "metadata": {"category": "refunds"}
    }
]

response = requests.post(
    f"{base}/prompt-datasets/{dataset_id}/test-cases/bulk",
    headers=headers,
    json={"data": test_cases}  # Key is "data", not "test_cases"
)
# Status: 201 Created, max 100 items per request

Common Operations

Import from CSV/JSONL

Use the utility script for bulk imports:

bash

python scripts/import_testcases.py \
  --file test_cases.csv \
  --dataset-id ds_abc123 \
  --type prompt  # or "agent"

Handles batching automatically (100 per request). See script help for full options:

bash

python scripts/import_testcases.py --help

CSV format:

•Columns starting with inputs. become input fields
•output column becomes expected output
•Other columns become metadata

Example:

csv

inputs.question,inputs.context,output,category,priority
"What is...","User context","Expected...","refunds","high"

List Datasets

python

# Prompt datasets
response = requests.get(f"{base}/prompt-datasets", headers=headers)
datasets = response.json().get('data', [])

# Agent datasets
response = requests.get(f"{base}/agent-datasets", headers=headers)
datasets = response.json().get('data', [])

Get Test Cases

python

# Replace prompt-datasets with agent-datasets for agent datasets
response = requests.get(
    f"{base}/prompt-datasets/{dataset_id}/test-cases",
    headers=headers
)
test_cases = response.json().get('data', [])

Update Test Case

python

response = requests.patch(
    f"{base}/prompt-datasets/{dataset_id}/test-cases/{test_case_id}",
    headers=headers,
    json={
        "inputs": {"question": "Updated question"},
        "output": "Updated output",
        "metadata": {"updated": True}
    }
)
# Status: 200 OK

Workflow Patterns

For complex operations, use the checklist pattern:

code

Dataset Creation with Verification:
- [ ] Step 1: Create dataset (verify 201 status)
- [ ] Step 2: Add test cases (verify 201 status)
- [ ] Step 3: Retrieve test cases to confirm count
- [ ] Step 4: Verify test case structure

See examples.md for complete workflows with verification steps.

Reference Documentation

Complete API references:

•prompt-datasets.md - All prompt dataset operations
•agent-datasets.md - All agent dataset operations
•examples.md - Complete workflow examples with verification

Quick links by task:

•Creating datasets: Prompt | Agent
•Adding test cases: Prompt | Agent
•Test case structure: Prompt | Agent
•Updating operations: Prompt | Agent

Utility Scripts

scripts/import_testcases.py - Import test cases from CSV or JSONL files

•Handles batching automatically (100 per request)
•Supports both prompt and agent datasets
•Provides progress reporting and error handling

scripts/batch_operations.py - Reusable batch operation functions

•Use in custom scripts for programmatic batch operations
•Functions: batch_create_test_cases()

Common Errors

Error	Cause	Solution
404 Path Not Found	Wrong endpoint path or dataset type	Verify `/api/v2/` base and correct type (`prompt-datasets` or `agent-datasets`)
Bulk limit exceeded	>100 items in single request	Use `scripts/import_testcases.py` or batch manually
Unauthorized	Invalid or missing API key	Check `FREEPLAY_API_KEY` environment variable
Incompatible test case	Input keys don't match prompt variables	Verify `inputs` keys match prompt template variables

Best Practices

•Verify operations: Always check status codes and counts after operations
•Use descriptive names: Dataset names should indicate purpose (e.g., "refunds-edge-cases")
•Include metadata: Tag test cases for filtering (category, priority, source)
•Provide expected outputs: Required for meaningful evaluations
•Batch large operations: Use scripts for >10 items
•Organize by scenario: Create focused datasets rather than one large dataset
•Match input names: For prompt datasets, ensure inputs match template variables

Related Skills

After creating datasets:

•Run tests using the run-test skill
•Analyze results using the test-run-analysis skill
•Check deployment status using the get_deployed_prompt_versions MCP tool

dataset-management

Freeplay Dataset Management

IMPORTANT: Safety Guidelines

Critical: Two Dataset Types

Quick Start

Setup (required for all operations)

Create a Prompt Dataset

Create an Agent Dataset

Add Test Cases (both types)

Common Operations

Import from CSV/JSONL

List Datasets

Get Test Cases

Update Test Case

Workflow Patterns

Reference Documentation

Utility Scripts

Common Errors

Best Practices

Related Skills

Freeplay Documentation