Using Inspect AI
This skill provides guidance and reference material for working with Inspect AI, an open-source framework for LLM evaluations.
Available References
You have access to the following local documentation (read on demand):
- Inspect AI source code: ./inspect-ai-repo/ - Full cloned repository
- Official documentation: ./official-docs/ - Downloaded documentation files
Use these references when you need to:
- Understand specific APIs or function signatures
- See implementation patterns and examples
- Debug issues or understand internal behavior
- Find the correct way to implement tasks, solvers, scorers, etc.
Quick Reference
Core Concepts
- Tasks: Define what to evaluate (dataset + solver + scorer)
- Solvers: Define how the model approaches the task (chain of operations)
- Scorers: Define how to evaluate the model's output
- Datasets: Input samples to evaluate on
- Tools: Functions the model can call during evaluation
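To make the relationship between these concepts concrete, here is a toy model in plain Python. It deliberately does not use Inspect AI's API: `Sample`, `add_system_message`, `fake_generate`, `exact_match`, and `run_task` are hypothetical names invented for this sketch, the "model" is a stand-in function, and tools are left out. It only illustrates how a task combines a dataset, a chain of solver steps, and a scorer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One dataset entry: the prompt and the expected answer."""
    input: str
    target: str


# In this toy model a solver step maps a text state to a new text state,
# and a scorer maps (output, target) to a numeric score.
Solver = Callable[[str], str]
Scorer = Callable[[str, str], float]


def add_system_message(message: str) -> Solver:
    """Return a solver step that prepends a system message to the prompt."""
    def step(prompt: str) -> str:
        return f"{message}\n\n{prompt}"
    return step


def fake_generate(prompt: str) -> str:
    """Stand-in for a model call: echo the last word of the prompt in upper case."""
    return prompt.split()[-1].upper()


def exact_match(output: str, target: str) -> float:
    """Score 1.0 when output and target match after trimming whitespace."""
    return 1.0 if output.strip() == target.strip() else 0.0


def run_task(dataset: List[Sample], solver: List[Solver], scorer: Scorer) -> float:
    """Run every sample through the solver chain and return the mean score."""
    scores = []
    for sample in dataset:
        state = sample.input
        for step in solver:
            state = step(state)
        scores.append(scorer(state, sample.target))
    return sum(scores) / len(scores) if scores else 0.0


dataset = [Sample("Say hello", "HELLO"), Sample("Say bye", "BYE!")]
solver_chain = [add_system_message("You are a helpful assistant."), fake_generate]
accuracy = run_task(dataset, solver_chain, exact_match)
print(accuracy)  # 0.5: the first sample matches, the second does not
```

In Inspect AI itself the solver state, model call, and scorer are richer objects; consult the local source and documentation for the real signatures.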
Common Patterns
```python
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import generate, system_message


@task
def my_eval():
    return Task(
        dataset=json_dataset("data.json"),
        solver=[
            system_message("You are a helpful assistant."),
            generate(),
        ],
        scorer=model_graded_fact(),
    )
```
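The pattern above reads samples from `data.json` but does not show what that file contains. The sketch below writes a small dataset using only the standard library. It assumes the file is a JSON array of records with `input` and `target` fields, which to my understanding are the default sample field names; verify this against the local documentation before relying on it. `write_dataset` and the example records are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_dataset(path: Path, records: List[Dict[str, str]]) -> None:
    """Write records as a JSON array, checking each has the assumed fields."""
    for index, record in enumerate(records):
        missing = {"input", "target"} - set(record)
        if missing:
            raise ValueError(f"record {index} is missing fields: {sorted(missing)}")
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")


# Hypothetical samples: a question and the fact the answer should contain.
records = [
    {"input": "What is the capital of France?", "target": "Paris"},
    {"input": "How many legs does a spider have?", "target": "8"},
]

# Written to a temporary directory here; in a real project the target
# would be the data.json file next to the task definition.
with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = Path(tmp_dir) / "data.json"
    write_dataset(data_path, records)
    loaded = json.loads(data_path.read_text(encoding="utf-8"))
    print(len(loaded))  # 2
```

Validating the fields before writing catches malformed records at dataset-creation time rather than partway through an evaluation run.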
Running Evaluations
```bash
# Run a task
inspect eval my_task.py

# Run with a specific model
inspect eval my_task.py --model openai/gpt-4

# View results
inspect view
```
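When evaluations are launched from a script rather than typed by hand, it can help to build the command as an argument list. The helper below is a minimal sketch that assembles only the `inspect eval` invocation and the `--model` flag shown above; `build_eval_command` is a hypothetical name and no other flags are assumed.

```python
import shlex
from typing import List, Optional


def build_eval_command(task_file: str, model: Optional[str] = None) -> List[str]:
    """Build the argument list for `inspect eval`, optionally pinning a model."""
    command = ["inspect", "eval", task_file]
    if model is not None:
        command += ["--model", model]
    return command


command = build_eval_command("my_task.py", model="openai/gpt-4")
print(shlex.join(command))  # inspect eval my_task.py --model openai/gpt-4
```

Passing the list to `subprocess.run(command, check=True)` would execute it without going through a shell, which avoids quoting problems in file or model names.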
When Exploring the Codebase
When you need deeper understanding:
- For API questions: Check ./docs/inspect-ai-repo/src/inspect_ai/
- For examples: Check ./docs/inspect-ai-repo/examples/
- For official docs: Check ./docs/official-docs/
Prefer reading the source code directly when documentation is unclear or when you need to understand exact behavior.
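One way to go straight to the source is to search the cloned repository for where a function or class is defined. The sketch below does this with the standard library only. `find_definitions` is a hypothetical helper, and the search root is the source path listed above, which may differ in your checkout.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_definitions(root: Path, name: str) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for each def/class named `name` under root."""
    if not root.is_dir():
        return []
    pattern = re.compile(rf"^\s*(?:async\s+)?(?:def|class)\s+{re.escape(name)}\b")
    matches = []
    for path in sorted(root.rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # skip unreadable files
        for line_number, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                matches.append((path, line_number, line.strip()))
    return matches


# Example: locate the definition of json_dataset in the local clone.
source_root = Path("./docs/inspect-ai-repo/src/inspect_ai")
for path, line_number, line in find_definitions(source_root, "json_dataset"):
    print(f"{path}:{line_number}: {line}")
```

If the directory does not exist, the function returns an empty list, so the loop prints nothing instead of failing.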