Phoenix Playwright Test Writing
Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.
Timeout Policy
- •Do not pass timeout args in test code under app/tests.
- •Tune timing centrally in app/playwright.config.ts (global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).
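As a rough sketch of where those four settings live, a config along the following lines could be used. The numeric values, the webServer command, and the URL are placeholders assumed for illustration; they are not Phoenix's actual settings.

```typescript
// app/playwright.config.ts (illustrative sketch; all values are assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Global per-test timeout, in milliseconds.
  timeout: 60_000,
  // Default timeout for expect(...) assertions such as toBeVisible().
  expect: { timeout: 10_000 },
  use: {
    // Default timeout for navigation actions such as page.goto() and page.waitForURL().
    navigationTimeout: 30_000,
  },
  webServer: {
    // Hypothetical start command and URL; substitute the project's real ones.
    command: "pnpm run dev",
    url: "http://localhost:6006",
    // How long Playwright waits for the server to become available.
    timeout: 120_000,
  },
});
```

With the limits defined in one place, individual tests can call waitForURL or expect without per-call timeout arguments.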
Quick Start
import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";

test.describe("Feature Name", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto(`/login`);
    await page.getByLabel("Email").fill("admin@localhost");
    await page.getByLabel("Password").fill("admin123");
    await page.getByRole("button", { name: "Log In", exact: true }).click();
    await page.waitForURL("**/projects");
  });

  test("can do something", async ({ page }) => {
    // Test implementation
  });
});
Test Credentials
| User | Email | Password | Role |
|---|---|---|---|
| Admin | admin@localhost | admin123 | admin |
| Member | member@localhost.com | member123 | member |
| Viewer | viewer@localhost.com | viewer123 | viewer |
Selector Patterns (Priority Order)
- •Role selectors (most robust):
page.getByRole("button", { name: "Save" });
page.getByRole("link", { name: "Datasets" });
page.getByRole("tab", { name: /Evaluators/i });
page.getByRole("menuitem", { name: "Edit" });
page.getByRole("cell", { name: "my-item" });
page.getByRole("heading", { name: "Title" });
page.getByRole("dialog");
page.getByRole("textbox", { name: "Name" });
page.getByRole("combobox", { name: /mapping/i });
- •Label selectors:
page.getByLabel("Email");
page.getByLabel("Dataset Name");
page.getByLabel("Description");
- •Text selectors:
page.getByText("No evaluators added");
page.getByPlaceholder("Search...");
- •Test IDs (when available):
page.getByTestId("modal");
- •CSS locators (last resort):
page.locator('button:has-text("Save")');
Common UI Patterns
Dropdown Menus
// Click button to open dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();
Nested Menus (Submenus)
// Open menu, hover over submenu trigger, click submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
  .getByRole("menuitem", { name: "Use LLM evaluator template" })
  .hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();
// IMPORTANT: Always use getByRole("menuitem") for submenu items, not getByText()
// Playwright's auto-waiting handles the submenu appearance timing
// ❌ BAD - flaky in CI:
// await page.getByText("ExactMatch").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /ExactMatch/i }).click();
Dialogs/Modals
// Wait for dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill form in dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for close
await expect(page.getByRole("dialog")).not.toBeVisible();
Tables with Row Actions
// Find row by cell content
const row = page.getByRole("row").filter({
  has: page.getByRole("cell", { name: "item-name" }),
});
// Click action button in row (usually last button)
await row.getByRole("button").last().click();
// Select action from menu
await page.getByRole("menuitem", { name: "Edit" }).click();
Tabs
await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
  "aria-selected",
  "true",
);
Form Inputs in Sections
// When multiple textboxes exist, scope to section
const systemSection = page.locator('button:has-text("System")');
const systemTextbox = systemSection
  .locator("..")
  .locator("..")
  .getByRole("textbox");
await systemTextbox.fill("content");
Serial Tests (Shared State)
Use test.describe.serial when tests depend on each other:
test.describe.serial("Workflow", () => {
  const itemName = `item-${randomUUID()}`;

  test("step 1: create item", async ({ page }) => {
    // Creates itemName
  });

  test("step 2: edit item", async ({ page }) => {
    // Uses itemName from previous test
  });

  test("step 3: verify edits", async ({ page }) => {
    // Verifies itemName was edited
  });
});
Assertions
// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();
// Text content
await expect(element).toHaveText("expected");
await expect(element).toContainText("partial");
// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");
// Input values
await expect(input).toHaveValue("expected value");
// URL
await page.waitForURL("**/datasets/**/examples");
Navigation Patterns
// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");
// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");
// Extract ID from URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";
// Navigate with query params
await page.goto(`/playground?datasetId=${datasetId}`);
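The ID extraction above can be wrapped in a small helper so that a failed match is explicit instead of silently yielding an empty string. This is a minimal sketch: the function name extractDatasetId is hypothetical, and the base URL is assumed from the dev server port mentioned elsewhere in this document.

```typescript
// Hypothetical helper: pull the dataset ID out of a Phoenix URL.
// Returns null when the URL has no /datasets/{id} segment.
function extractDatasetId(url: string): string | null {
  // The base is only used when `url` is relative (e.g. "/datasets/abc/examples").
  const { pathname } = new URL(url, "http://localhost:6006");
  const id = pathname.match(/\/datasets\/([^/]+)/)?.[1];
  return id ? decodeURIComponent(id) : null;
}
```

In a test, this might be called as extractDatasetId(page.url()), followed by an assertion that the result is not null before building the playground URL.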
Running Tests
Before running Playwright tests, build the app so E2E runs against the latest frontend changes:
pnpm run build
# Run specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium
# Run with UI mode
pnpm exec playwright test --ui
# Run specific test by name
pnpm exec playwright test -g "can create"
# Debug mode
pnpm exec playwright test --debug
Avoiding Interactive Report Server
When the HTML reporter is active, Playwright can serve the report after the run finishes (by default, after a failing run) and wait for Ctrl+C, which can cause command timeouts. Use these options to avoid this:
# Use list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list
# Use dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot
# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium
Recommended for automation: Always use --reporter=list or CI=1 when running tests programmatically to ensure the command exits cleanly after tests complete.
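An alternative, not described in the original guidance, is to disable report serving in the config itself. The fragment below is a sketch under that assumption: it relies on the HTML reporter's open option and is not taken from Phoenix's actual config.

```typescript
// app/playwright.config.ts (illustrative fragment; reporter choices are assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  reporter: [
    // Print results to the terminal as tests finish.
    ["list"],
    // Still write the HTML report, but never start the interactive server.
    ["html", { open: "never" }],
  ],
});
```

The saved report can then be opened on demand with pnpm exec playwright show-report.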
Phoenix-Specific Pages
| Page | URL Pattern | Key Elements |
|---|---|---|
| Datasets | /datasets | Table, "New Dataset" button |
| Dataset Detail | /datasets/{id}/examples | Tabs (Experiments, Examples, Evaluators, Versions) |
| Dataset Evaluators | /datasets/{id}/evaluators | "Add evaluator" button, evaluators table |
| Playground | /playground | Prompts section, Experiment section |
| Playground + Dataset | /playground?datasetId={id} | Dataset selector, Evaluators button |
| Prompts | /prompts | "New Prompt" button, prompts table |
| Settings | /settings/general | "Add User" button, users table |
UI Exploration with agent-browser
When selectors are unclear, use agent-browser to explore the Phoenix UI. For detailed agent-browser usage, invoke the /agent-browser skill.
Quick Reference for Phoenix
# Open Phoenix page (dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"
# Get interactive snapshot with element refs
agent-browser snapshot -i
# Click using refs from snapshot
agent-browser click @e5
# Fill form fields
agent-browser fill @e2 "test value"
# Get element text
agent-browser get text @e1
Discovering Selectors Workflow
- •Open the page: agent-browser open "http://localhost:6006/datasets"
- •Get snapshot: agent-browser snapshot -i
- •Find element refs in output (e.g., @e1 [button] "New Dataset")
- •Interact: agent-browser click @e1
- •Re-snapshot after navigation/DOM changes: agent-browser snapshot -i
Translating to Playwright
| agent-browser output | Playwright selector |
|---|---|
| @e1 [button] "Save" | page.getByRole("button", { name: "Save" }) |
| @e2 [link] "Datasets" | page.getByRole("link", { name: "Datasets" }) |
| @e3 [textbox] "Name" | page.getByRole("textbox", { name: "Name" }) |
| @e4 [menuitem] "Edit" | page.getByRole("menuitem", { name: "Edit" }) |
| @e5 [tab] "Evaluators 0" | page.getByRole("tab", { name: /Evaluators/i }) |
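The mapping in this table is mechanical enough to sketch as a function. The helper below is hypothetical and assumes snapshot lines always look like the examples shown here (@ref [role] "name"); real agent-browser output may differ. Treating a trailing number as a dynamic count is a heuristic, so a name such as "Step 2" would also be turned into a regex.

```typescript
// Hypothetical helper: turn one agent-browser snapshot line into a Playwright
// selector string. The line format (@ref [role] "name") is assumed from the
// examples in the translation table.
function toPlaywrightSelector(snapshotLine: string): string | null {
  const match = snapshotLine.match(/^\s*@\w+\s+\[([\w-]+)\]\s+"(.*)"\s*$/);
  if (!match) return null;
  const role = match[1] ?? "";
  const name = match[2] ?? "";

  // Names ending in a count (e.g. "Evaluators 0") change at runtime, so match
  // the stable part with a case-insensitive regex instead of the exact text.
  const counted = name.match(/^(.*\S)\s+\d+$/);
  if (counted) {
    const stable = (counted[1] ?? "").replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
    return `page.getByRole("${role}", { name: /${stable}/i })`;
  }
  return `page.getByRole("${role}", { name: ${JSON.stringify(name)} })`;
}
```

Lines that do not fit the assumed format return null, which makes unexpected snapshot output easy to spot.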
File Naming
- •Feature tests: {feature-name}.spec.ts
- •Access control: {role}-access.spec.ts
- •Rate limiting: {feature}.rate-limit.spec.ts (runs last)
Common Gotchas
- •Dialog not closing: Wait for a deterministic post-action signal (e.g., dialog hidden + success row visible)
- •Multiple elements: Use .first(), .last(), or .nth(n)
- •Dynamic content: Use regex in name: { name: /pattern/i }
- •Flaky waits: Prefer waitForURL over waitForTimeout
- •Menu not appearing: Wait for specific menu state/element visibility
Debugging Flaky Tests
Critical Lessons Learned
- •Don't assume parallelism is the problem
  - •Phoenix tests run with 7 parallel workers without issues
  - •The app handles concurrent logins, database operations, and session management properly
  - •If tests fail with parallelism, it's usually a test timing issue, not infrastructure
  - •Playwright's browser context isolation is robust - each worker gets isolated cookies/sessions
- •waitForTimeout is almost always wrong
  - •page.waitForTimeout() is the #1 cause of flakiness in Phoenix tests
  - •Arbitrary timeouts race against rendering and network speed
  - •Always replace with state-based waits:
// ❌ BAD - flaky, races against rendering
await page.waitForTimeout(500);
await element.click();
// ✅ GOOD - waits for actual state
await element.waitFor({ state: "visible" });
await element.click();
- •Test the actual failure before fixing
  - •Run tests with parallelism enabled to see what actually fails
  - •Check error messages - they often point to the real issue
  - •Don't optimize prematurely (e.g., caching auth state) if it's not the problem
- •Phoenix test infrastructure is solid
  - •In-memory SQLite works fine with parallel tests
  - •No need for per-worker databases
  - •No need for auth state caching
  - •Tests use randomUUID() for data isolation - this works well
Debugging Workflow
When tests are flaky:
- •Run with parallelism multiple times to catch intermittent failures:
for i in 1 2 3 4 5; do
  pnpm exec playwright test --project=chromium --reporter=dot
done
- •Look for waitForTimeout usage and replace it with proper waits:
grep -r "waitForTimeout" app/tests/
- •Check for race conditions in element interactions:
  - •Wait for element visibility before interacting
  - •Wait for a load state only when needed, and avoid page.waitForLoadState("networkidle"), which is flaky in CI
  - •Use waitForURL after navigation actions
- •Verify selectors are stable:
  - •Avoid CSS selectors that depend on DOM structure
  - •Use role/label selectors that match ARIA attributes
  - •Confirm that selectors still work after UI updates
- •Run with trace on failure to see what happened (on-first-retry records a trace only when a failed test is retried, so retries must be enabled):
pnpm exec playwright test --trace on-first-retry
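The search for waitForTimeout in the workflow above could also be scripted, for example as a check that skips commented-out examples. This is a minimal sketch; the function name findWaitForTimeoutLines is hypothetical and it only inspects source text passed to it.

```typescript
// Hypothetical helper: report the 1-based line numbers where waitForTimeout is
// called in a spec file's source text, skipping lines that are comments.
function findWaitForTimeoutLines(source: string): number[] {
  const hits: number[] = [];
  source.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed.startsWith("//")) return; // ignore commented-out examples
    if (/\.waitForTimeout\s*\(/.test(trimmed)) hits.push(index + 1);
  });
  return hits;
}
```

Reading each file under app/tests and passing its contents to this function would give a per-file list of lines to convert to state-based waits.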
Common Flaky Patterns and Fixes
| Flaky Pattern | Root Cause | Fix |
|---|---|---|
| Submenu item not found | Using getByText() instead of getByRole() | Use getByRole("menuitem", { name: /pattern/i }) for submenu items |
| Menu click fails | Menu not fully rendered | await menu.waitFor({ state: "visible" }) before click |
| Dialog assertion fails | Dialog animation not complete | Assert specific completion signal (hidden dialog + next-state element) |
| Navigation timeout | Page still loading | Remove waitForLoadState("networkidle") - it's flaky in CI |
| Element not found | Dynamic content loading | Wait for element visibility, not arbitrary timeout |
| Stale element | Re-render between locate and click | Store locator, not element handle |
Test Stability Best Practices
- •Use proper waits:
// Wait for element state ("visible", "hidden", or "attached")
await element.waitFor({ state: "visible" });
// Wait for a load state ("load" or "domcontentloaded"; avoid "networkidle", which is flaky in CI)
await page.waitForLoadState("domcontentloaded");
// Wait for URL change
await page.waitForURL("**/expected-path");
- •Use unique test data:
const uniqueName = `test-${randomUUID()}`;
- •Prefer role selectors - they're less brittle:
page.getByRole("button", { name: "Save" }); // ✅ Good
page.locator("button.save-btn"); // ❌ Brittle
- •Don't fight animations - wait for them:
await expect(dialog).not.toBeVisible();
- •Verify URL changes after navigation:
await page.waitForURL("**/datasets");
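For the unique-data practice, a tiny shared helper keeps naming consistent across spec files. The sketch below is an assumption about how that might look; the name uniqueName and the prefixes are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: build a collision-resistant name for test data so that
// parallel workers never create or look up the same record.
function uniqueName(prefix: string): string {
  return `${prefix}-${randomUUID()}`;
}

const datasetName = uniqueName("dataset");
const promptName = uniqueName("prompt");
```

Keeping a readable prefix in front of the UUID makes leftover rows easier to trace back to the test that created them.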