Run Freeplay Test
IMPORTANT: Confirmation Required
Always ask for user confirmation before executing any test run. Test runs make API calls to Freeplay and LLM providers, which may incur costs. Never execute a test without explicit user consent.
This skill helps users execute Freeplay test runs via the SDK/API to evaluate their AI features against datasets.
Note: This skill is for SDK/API-based test runs only, not UI-based testing from the Freeplay dashboard.
When to use this skill
- "Run a test for my prompt"
- "Execute the test run for customer support"
- "Test my agent against the golden dataset"
- "Run evaluation tests"
- "Kick off a test run"
- "I want to test my prompt changes"
Workflow
Step 1: Identify the Test Run Code Path
Search the codebase for existing test run implementation. Look for:
```bash
# Search patterns (run these to find test run implementations)
grep -r "test_runs.create" --include="*.py" .
grep -r "fp_client.test_runs" --include="*.py" .
grep -r "freeplay.*test" --include="*.py" .
grep -r "TestRun" --include="*.py" .
grep -r "testRuns.create" --include="*.ts" .
```
The key API call to look for is:
Python SDK:
```python
test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Dataset Name",
    name="Test Run Name"
)
```
TypeScript SDK:
```typescript
const testRun = await fpClient.testRuns.create({
  projectId: projectId,
  testlist: "Dataset Name",
  name: "Test Run Name"
});
```
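The same search can be run in Python when `grep` is unavailable (for example, in a Windows shell). The following is a minimal sketch. `find_test_run_code`, the file extensions, and the skipped directories are hypothetical choices and are not part of any Freeplay tooling. Because it scans `.ts` and `.js` files as well as `.py`, it also catches the TypeScript `testRuns.create` call.

```python
import re
from pathlib import Path

# Hypothetical helper (not Freeplay tooling): a portable stand-in for the grep searches
PATTERNS = [
    re.compile(r"test_runs\.create"),   # Python SDK
    re.compile(r"fp_client\.test_runs"),
    re.compile(r"freeplay.*test"),
    re.compile(r"TestRun"),
    re.compile(r"testRuns\.create"),    # TypeScript SDK
]
EXTENSIONS = {".py", ".ts", ".js"}
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__"}


def find_test_run_code(root="."):
    """Return (path, line_number, line) for every source line matching a pattern."""
    base = Path(root)
    hits = []
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        # Skip vendored and generated directories, checked relative to the search root
        if any(part in SKIP_DIRS for part in path.relative_to(base).parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_test_run_code(".")` from the repository root returns `(path, line, text)` tuples. Those results can be shown to the user before you pick an entry point.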
Step 2: If No Test Run Code Exists
If the search returns no results, inform the user:
I couldn't find an existing test run script in your codebase. To run Freeplay test runs, you'll need to set up a test runner script first.
Documentation to get started:
- Test Runs Overview
- Component-Level Testing - For testing individual prompts
- End-to-End Testing - For testing complete workflows/agents
Would you like me to help you understand what's needed to set up a test runner?
Do NOT attempt to write the test runner for them. This requires understanding their specific:
- Pipeline architecture
- Dataset structure
- Evaluation criteria
- Environment configuration
Step 3: If Test Run Code Exists
Once you've found the test run implementation:
- Identify the entry point - Find the script/function that initiates test runs
- Check for required environment variables:
  - `FREEPLAY_API_KEY`
  - `FREEPLAY_BASE_URL` (default: https://app.freeplay.ai)
  - `OPENAI_API_KEY` (or other LLM provider keys)
- NOTE: The project ID can be provided by the user or discovered via `list_projects()`
- Determine how to run it:
  ```bash
  # Common patterns
  python run_tests.py
  python -m tests.run_freeplay_tests
  npm run test:freeplay
  pytest tests/test_freeplay.py
  ```
- Ask the user for any required parameters:
  - Dataset/testlist name
  - Test run name
  - Prompt template to test
  - Environment (if applicable)
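If the runner is wired up as an npm script, as in the `npm run test:freeplay` pattern, you can read the command straight from `package.json` instead of guessing it. This is a minimal sketch that assumes the relevant script mentions "freeplay" in its name or command. `npm_freeplay_commands` is a hypothetical helper.

```python
import json
from pathlib import Path


def npm_freeplay_commands(root="."):
    """Hypothetical helper: list `npm run` commands for package.json scripts that mention Freeplay."""
    pkg = Path(root) / "package.json"
    if not pkg.is_file():
        return []
    scripts = json.loads(pkg.read_text(encoding="utf-8")).get("scripts", {})
    # Match on either the script name (e.g. "test:freeplay") or the command it runs
    return [
        f"npm run {name}"
        for name, command in scripts.items()
        if "freeplay" in name.lower() or "freeplay" in command.lower()
    ]
```

An empty list doesn't prove that no runner exists. It only means no npm script mentions Freeplay, so still check the Python entry points.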
Step 4: Execute with Consent
Before running, confirm with the user:
I found the test runner at `[path]`. This will:
- Run tests against the `[dataset]` dataset
- Test the `[prompt/agent]`
- Make API calls to Freeplay and your LLM provider

Ready to execute?
Only proceed with explicit consent.
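You can make the consent rule mechanical: build the confirmation message, then run the command only if the reply is an explicit yes. This sketch assumes the runner is launched as a subprocess. `build_confirmation`, `confirmed`, and `run_with_consent` are hypothetical names, and treating only `yes`/`y` as consent is an illustrative choice.

```python
import subprocess


def build_confirmation(path, dataset, target):
    """Hypothetical helper: render the confirmation message shown to the user."""
    return (
        f"I found the test runner at {path}. This will:\n"
        f"- Run tests against the {dataset} dataset\n"
        f"- Test the {target}\n"
        "- Make API calls to Freeplay and your LLM provider\n"
        "Ready to execute?"
    )


def confirmed(reply):
    """Only an explicit yes counts as consent; anything else, including an empty reply, is a no."""
    return reply.strip().lower() in {"yes", "y"}


def run_with_consent(command, reply):
    """Run the test command only after explicit consent; return None when consent is missing."""
    if not confirmed(reply):
        return None
    # capture_output keeps stdout/stderr available for the result-handling step
    return subprocess.run(command, capture_output=True, text=True)
```

Defaulting to "no" means an ambiguous answer ("sure, maybe later") never starts a paid run.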
Step 5: Handle Results
On Success:
- Report that the test run completed
- Provide the test run ID for reviewing results
- Share any immediate metrics if they are available in the output
- Always suggest using the `test-run-analysis` skill to analyze results and suggest improvements to the prompt or agent based on evaluation metrics
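The example script below ends with `print(f"Test run completed: {test_run.id}")`. If a runner prints a line in that form, you can pull the test run ID from the captured output. This sketch assumes exactly that format, so other runners will need a different pattern. `extract_test_run_id` is a hypothetical helper.

```python
import re

# Assumes the runner prints "Test run completed: <id>", as the example script does
COMPLETED = re.compile(r"Test run completed:\s*(\S+)")


def extract_test_run_id(output):
    """Hypothetical helper: return the test run ID from captured stdout, or None if absent."""
    match = COMPLETED.search(output)
    return match.group(1) if match else None
```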
On Failure:
- Capture the error output
- Identify common issues:
  - Missing environment variables
  - Invalid API keys
  - Dataset not found
  - Network/connectivity issues
  - Rate limiting
- Help debug with the user
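As a first pass at triage, you can map the captured error text onto the common issues above. The keywords are illustrative guesses, not actual Freeplay or provider error strings. Real messages vary by SDK version, provider, and runner, so treat a match as a hint to confirm with the user. `classify_failure` is a hypothetical helper.

```python
# Hypothetical keyword table: illustrative guesses, not actual Freeplay or provider error strings
FAILURE_HINTS = {
    "Missing environment variables": ["keyerror", "not set", "environment variable"],
    "Invalid API keys": ["401", "unauthorized", "invalid api key", "authentication"],
    "Dataset not found": ["404", "not found", "testlist"],
    "Network/connectivity issues": ["connectionerror", "connection refused", "timed out", "name resolution"],
    "Rate limiting": ["429", "rate limit", "too many requests"],
}


def classify_failure(output):
    """Return the common issues whose keywords appear in the captured error output."""
    text = output.lower()
    return [issue for issue, needles in FAILURE_HINTS.items() if any(n in text for n in needles)]
```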
Example Test Run Script Structure
For reference, a typical component-level test run script looks like:
```python
import os

from freeplay import Freeplay, RecordPayload
from openai import OpenAI

from scripts.secrets import SecretString  # project-local helper, not part of the Freeplay SDK

# Initialize clients (use SecretString to prevent accidental logging)
api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
fp_client = Freeplay(
    api_key=api_key.get(),
    api_base=os.environ.get("FREEPLAY_BASE_URL", "https://app.freeplay.ai")
)
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
project_id = "<project-id>"  # Provided by user or discovered via list_projects()

# Create test run
test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Golden Set",
    name="My Test Run"
)

# Get prompt template
template_prompt = fp_client.prompts.get(
    project_id=project_id,
    template_name="my-prompt",
    environment="latest"
)

# Process each test case
for test_case in test_run.test_cases:
    # Fill the template with this test case's variables
    formatted_prompt = template_prompt.bind(test_case.variables).format()

    # Call LLM with the model and parameters configured on the prompt template
    response = openai_client.chat.completions.create(
        model=formatted_prompt.prompt_info.model,
        messages=formatted_prompt.llm_prompt,
        **formatted_prompt.prompt_info.model_parameters
    )

    # Record results back to Freeplay
    fp_client.recordings.create(RecordPayload(
        # ... recording configuration
    ))

print(f"Test run completed: {test_run.id}")
```
Environment Variables
Required variables that must be set:
- `FREEPLAY_API_KEY` - Freeplay API key
- `FREEPLAY_BASE_URL` - Freeplay API URL (default: https://app.freeplay.ai)
- LLM provider keys (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
Project ID can come from:
- User specification
- The MCP `list_projects()` tool, to discover available projects
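Before asking for consent, you can check the environment without printing any secret values. This minimal sketch assumes the runner needs `FREEPLAY_API_KEY` and at least one of `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`. In practice, only the key for the provider the pipeline actually calls matters. `check_environment` is a hypothetical helper that reports variable names, never their values.

```python
import os

DEFAULT_BASE_URL = "https://app.freeplay.ai"
# Assumption: the pipeline uses one of these providers; only its key is actually needed
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]


def check_environment(env=None):
    """Hypothetical helper: return (problems, base_url), reporting variable names but never values."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("FREEPLAY_API_KEY"):
        problems.append("FREEPLAY_API_KEY is not set")
    if not any(env.get(key) for key in PROVIDER_KEYS):
        problems.append("No LLM provider key set (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY)")
    # FREEPLAY_BASE_URL has a default, so fall back to it instead of reporting a problem
    base_url = env.get("FREEPLAY_BASE_URL") or DEFAULT_BASE_URL
    return problems, base_url
```

Passing an explicit dict (for example, one parsed from `.env`) makes it possible to check a configuration without touching the live process environment.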
Tips
- Always search the codebase first before assuming no test runner exists
- Check common locations: `scripts/`, `tests/`, `tools/`, root directory
- Look for files named: `run_test*.py`, `test_runner.py`, `freeplay_test*.py`
- Check `package.json` scripts or `pyproject.toml` for test commands
- Environment variables may be in `.env`, `.env.local`, or CI configuration
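The location and filename tips can be combined into a quick search for candidate runners. This sketch assumes the runner sits directly in one of the listed folders, because the glob is not recursive. `candidate_runner_files` is a hypothetical helper.

```python
from pathlib import Path

# Locations and filename patterns taken from the tips above
SEARCH_DIRS = ["scripts", "tests", "tools", "."]
NAME_PATTERNS = ["run_test*.py", "test_runner.py", "freeplay_test*.py"]


def candidate_runner_files(root="."):
    """Hypothetical helper: return root-relative paths of likely test runner scripts."""
    base = Path(root)
    found = []
    for directory in SEARCH_DIRS:
        folder = base / directory
        if not folder.is_dir():
            continue
        for pattern in NAME_PATTERNS:
            # Non-recursive glob: only files directly inside each location
            for path in sorted(folder.glob(pattern)):
                rel = path.relative_to(base).as_posix()
                if rel not in found:
                    found.append(rel)
    return found
```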