Run Freeplay Test

IMPORTANT: Confirmation Required

Always ask for user confirmation before executing any test run. Test runs make API calls to Freeplay and LLM providers, which may incur costs. Never execute a test without explicit user consent.

This skill helps users execute Freeplay test runs via the SDK/API to evaluate their AI features against datasets.

Note: This skill is for SDK/API-based test runs only, not UI-based testing from the Freeplay dashboard.

When to use this skill

•"Run a test for my prompt"
•"Execute the test run for customer support"
•"Test my agent against the golden dataset"
•"Run evaluation tests"
•"Kick off a test run"
•"I want to test my prompt changes"

Workflow

Step 1: Identify the Test Run Code Path

Search the codebase for existing test run implementation. Look for:

bash

# Search patterns (run these to find test run implementations)
grep -r "test_runs.create" --include="*.py" .
grep -r "fp_client.test_runs" --include="*.py" .
grep -r "freeplay.*test" --include="*.py" .
grep -r "TestRun" --include="*.py" .

The key API call to look for is:

Python SDK:

python

test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Dataset Name",
    name="Test Run Name"
)

TypeScript SDK:

typescript

const testRun = await fpClient.testRuns.create({
    projectId: projectId,
    testlist: "Dataset Name",
    name: "Test Run Name"
});

Step 2: If No Test Run Code Exists

If the search returns no results, inform the user:

I couldn't find an existing test run script in your codebase. To run Freeplay test runs, you'll need to set up a test runner script first.

Documentation to get started:

•Test Runs Overview

•Component-Level Testing - For testing individual prompts

•End-to-End Testing - For testing complete workflows/agents

Would you like me to help you understand what's needed to set up a test runner?

Do NOT attempt to write the test runner for them. This requires understanding their specific:

•Pipeline architecture
•Dataset structure
•Evaluation criteria
•Environment configuration

Step 3: If Test Run Code Exists

Once you've found the test run implementation:

•Identify the entry point - Find the script/function that initiates test runs
•
Check for required environment variables:
- •FREEPLAY_API_KEY
- •FREEPLAY_BASE_URL (default: https://app.freeplay.ai)
- •OPENAI_API_KEY (or other LLM provider keys)

NOTE Project ID can be provided by user or discovered via list_projects()

•

Determine how to run it:

bash

# Common patterns
python run_tests.py
python -m tests.run_freeplay_tests
npm run test:freeplay
pytest tests/test_freeplay.py

•
Ask the user for any required parameters:
- •Dataset/testlist name
- •Test run name
- •Prompt template to test
- •Environment (if applicable)

Step 4: Execute with Consent

Before running, confirm with the user:

I found the test runner at [path]. This will:

•Run tests against the [dataset] dataset

•Test the [prompt/agent]

•Make API calls to Freeplay and your LLM provider

Ready to execute?

Only proceed with explicit consent.

Step 5: Handle Results

On Success:

•Report the test run completed
•Provide the test run ID for reviewing results
•Share any immediate metrics if available in output
•Always suggest using the test-run-analysis skill to analyze results and suggest improvements to the prompt or agent based on evaluation metrics

On Failure:

•Capture the error output
•
Identify common issues:
- •Missing environment variables
- •Invalid API keys
- •Dataset not found
- •Network/connectivity issues
- •Rate limiting
•Help debug with the user

Example Test Run Script Structure

For reference, a typical component-level test run script looks like:

python

from freeplay import Freeplay, RecordPayload
from openai import OpenAI
import os
from scripts.secrets import SecretString

# Initialize clients (use SecretString to prevent accidental logging)
api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
fp_client = Freeplay(
    api_key=api_key.get(),
    api_base=os.environ.get("FREEPLAY_BASE_URL", "https://app.freeplay.ai")
)
openai_client = OpenAI()

project_id = "<project-id>"  # Provided by user or discovered via list_projects()

# Create test run
test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Golden Set",
    name="My Test Run"
)

# Get prompt template
template_prompt = fp_client.prompts.get(
    project_id=project_id,
    template_name="my-prompt",
    environment="latest"
)

# Process each test case
for test_case in test_run.test_cases:
    formatted_prompt = template_prompt.bind(test_case.variables).format()

    # Call LLM
    response = openai_client.chat.completions.create(
        model=formatted_prompt.prompt_info.model,
        messages=formatted_prompt.llm_prompt,
        **formatted_prompt.prompt_info.model_parameters
    )

    # Record results back to Freeplay
    fp_client.recordings.create(RecordPayload(
        # ... recording configuration
    ))

print(f"Test run completed: {test_run.id}")

Environment Variables

Required variables that must be set:

•FREEPLAY_API_KEY - Freeplay API key
•FREEPLAY_BASE_URL - Freeplay API URL (default: https://app.freeplay.ai)
•LLM provider keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY)

Project ID can come from:

•User specification
•MCP list_projects() tool to discover available projects

Tips

•Always search the codebase first before assuming no test runner exists
•Check common locations: scripts/, tests/, tools/, root directory
•Look for files named: run_test*.py, test_runner.py, freeplay_test*.py
•Check package.json scripts or pyproject.toml for test commands
•Environment variables may be in .env, .env.local, or CI configuration