Agent Testing Checklist
For GitHub Copilot Agents: Use this checklist to verify your work is complete before submitting PRs.
Critical Success Criteria
Every agent PR MUST include proof of working functionality. This is non-negotiable.
Definition of Done
- • Working environment accessible
  - •Payload: Screenshot of http://localhost:3000/admin showing login or dashboard
  - •Legacy: Screenshot of http://localhost:8080 showing Y-Not homepage rendering
  - •Both services respond to HTTP requests (not just "container started")
- • Functionality proven
  - •Can interact with the application (not just see an error page)
  - •For Payload: Can access collections, see data, or create records
  - •For Legacy: Pages render with content (not directory listings)
- • Tests pass
  - •yarn test exits with code 0
  - •yarn lint exits with code 0
  - •No new test failures introduced
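The "responds to HTTP requests" criterion can be checked from a shell before taking screenshots. The following is a minimal sketch, assuming curl is installed; status_ok and check_url are hypothetical helper names, not scripts in the repo, and treating 3xx as reachable is an assumption (an admin route may redirect to a login page).

```shell
# status_ok CODE: succeed for 2xx/3xx HTTP status codes, fail otherwise.
# Counting 3xx as success is an assumption, not a rule from this checklist.
status_ok() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# check_url URL: print the status code and succeed only if the service answered.
# curl reports 000 when it could not connect at all.
check_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || true
  echo "$1 -> $code"
  status_ok "$code"
}

# Example (run once the containers are up):
#   check_url http://localhost:3000/admin
#   check_url http://localhost:8080
```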
Performance Baselines
Know when to stop and report issues:
| Metric | Expected | Warning | Failure |
|---|---|---|---|
| Container startup | < 60s | 60-120s | > 120s |
| yarn install | < 120s | 120-300s | > 300s |
| Service ready (total) | < 180s | 180-360s | > 360s |
| Docker image pull | < 30s | 30-60s | > 60s |
⚠️ If you exceed "Warning" thresholds: Document the issue and investigate alternatives (pre-built images, optimized containers, etc.)
❌ If you hit "Failure" thresholds: Stop and report the blocker. Do not proceed with untested work.
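The thresholds in the table can be applied mechanically to a measured duration. A small sketch under that reading of the table (the boundary values 60 and 120 fall in the "Warning" column); classify is a hypothetical helper name:

```shell
# classify SECONDS WARN_AT FAIL_ABOVE: map a measured duration to a table column.
# Below WARN_AT is "expected", above FAIL_ABOVE is "failure", anything between is "warning".
classify() {
  if [ "$1" -gt "$3" ]; then
    echo "failure"
  elif [ "$1" -ge "$2" ]; then
    echo "warning"
  else
    echo "expected"
  fi
}

# Example, using the "Container startup" row (60s and 120s):
#   classify 95 60 120
```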
Environment Detection
Detect your environment early and adjust expectations:
    # Am I in CI/CD?
    if [ -n "$CI" ] || [ -n "$GITHUB_ACTIONS" ]; then
      # Use optimized workflow: pre-built images, skip heavy installs
      echo "CI environment detected"
    fi

    # Do I have network access?
    if ! curl -s https://registry.npmjs.org > /dev/null; then
      # Report network restrictions, recommend allowlist additions
      echo "Network restricted - cannot pull packages"
    fi

    # Can I bind to required ports?
    if lsof -ti:3000 > /dev/null 2>&1; then
      echo "Port 3000 in use - cannot start Payload"
    fi
Incremental Verification Strategy
Test each layer before building the next:
Phase 1: Infrastructure
- • Docker installed and running
- • Can pull base images: docker pull node:22-alpine
- • Can access npm registry: yarn --version
- • Required ports available: 3000, 8080, 5432
Phase 2: Build
- • Dockerfile syntax valid
- • Image builds successfully: docker build -f Dockerfile.payload .
- • Build completes in < 5 minutes
- • No build errors in logs
Phase 3: Service Startup
- • Containers start: docker compose up -d
- • Health checks pass
- • Services respond on localhost
- • Startup completes in < 3 minutes
Phase 4: Database Seeding (Optional but Recommended)
- • Legacy site: yarn seed:legacy (sample data) or ./bin/refresh_local.sh (production)
- • Payload: yarn seed:payload (sample data based on Y-Not structure)
- • Data visible in applications
Phase 5: Application Access
- • HTTP requests succeed (200, not 500/502)
- • UI renders (not blank page or error)
- • Can interact with application
- • Screenshots prove functionality
Stop at each phase if failures occur. Document and report before proceeding.
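The stop-at-first-failure rule can be sketched as a small runner. This is an illustration only: run_phases is a hypothetical helper, and the commands passed to it are whatever check fits each phase.

```shell
# run_phases LABEL COMMAND [LABEL COMMAND ...]
# Runs each command string in order and stops at the first one that fails.
run_phases() {
  while [ "$#" -ge 2 ]; do
    label=$1
    cmd=$2
    shift 2
    if sh -c "$cmd" > /dev/null 2>&1; then
      echo "PASS: $label"
    else
      echo "FAIL: $label"
      return 1
    fi
  done
  return 0
}

# Example, using commands from the phases above:
#   run_phases \
#     "Infrastructure"  "docker pull node:22-alpine" \
#     "Build"           "docker build -f Dockerfile.payload ." \
#     "Service Startup" "docker compose up -d"
```

A failing phase leaves the later phases unrun, so the report names exactly one blocker instead of a cascade of follow-on errors.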
Database Seeding
Why seed databases:
- •Empty applications are hard to test meaningfully
- •Screenshots of empty dashboards don't prove functionality
- •Seeded data helps verify relationships, queries, and UI work correctly
For Legacy PHP/MySQL Site
Quick seed (sample data): yarn seed:legacy
Creates minimal sample data for testing without requiring production database access.
Production data: ./bin/refresh_local.sh
This script:
- •Pulls latest production database snapshot (./bin/pull_db.sh)
- •Stops containers and removes volumes
- •Starts fresh containers
- •Imports production data into MySQL (./bin/import_db.sh)
Usage:
    # Quick seed with sample data (no production DB needed)
    yarn seed:legacy

    # Or use production data (requires DB access)
    ./bin/refresh_local.sh

    # Manual steps if needed:
    ./bin/pull_db.sh    # Download latest DB
    ./bin/import_db.sh  # Import into running container
Expected outcome:
- •Site at http://localhost:8080 shows content (sample or real)
- •PHPMyAdmin at http://localhost:8181 shows populated tables
- •Can browse shows, concerts, DJ profiles, etc.
Files:
- •Sample seed: bin/seed-legacy.sh (in repo)
- •Production dump: src/db/docker/ynot_db.sql (gitignored, requires access)
- •Import script: bin/import_db.sh
- •Refresh script: bin/refresh_local.sh
For Payload CMS
Option 1: Use Pre-seeded Postgres Container (Fastest)
The pre-built Postgres image includes schema and sample data pre-installed:
    # Using docker-compose (recommended)
    docker-compose up postgres

    # Or pull pre-built image directly
    docker pull ghcr.io/ynotradio/site/postgres-seeded:latest
    docker run -d -p 5432:5432 ghcr.io/ynotradio/site/postgres-seeded:latest
Performance:
- •✅ First start: ~2-3 minutes (seeds automatically)
- •✅ Subsequent starts: ~10 seconds (data persists)
- •⚠️ Requires GHCR access for pre-built image
Option 2: Manual Seeding
If you need custom data or don't have GHCR access:
    # After Payload is running with empty database
    yarn seed:payload
What's included in seed data:
- •People (DJs like "Josh T. Landow", Artists)
- •Venues (The Foundry, Union Transfer, World Cafe Live)
- •Concerts with dates, artists, venues
- •Posts (news stories, contest announcements)
- •Shows (Top 11 @ 11, specialty shows)
- •Songs, Records, Artists (music catalog)
Expected outcome:
- •Admin UI shows populated collections
- •Can browse and edit sample data (DJs, venues, concerts, posts, shows)
- •Relationships work correctly (concerts → artists & venues, shows → DJs)
- •API returns data at endpoints
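"API returns data" can be spot-checked from the command line. The sketch below assumes Payload's REST list endpoints live under /api/ and return a JSON body with a totalDocs field, that the collection allows unauthenticated reads, and that a slug such as venues exists; none of these are confirmed by this checklist. total_docs and collection_seeded are hypothetical helper names.

```shell
# total_docs: read a list response on stdin and print its totalDocs count.
total_docs() {
  sed -n 's/.*"totalDocs":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# collection_seeded SLUG: succeed if the collection reports at least one document.
# The /api/SLUG path and the public read access are assumptions.
collection_seeded() {
  count=$(curl -s --max-time 10 "http://localhost:3000/api/$1" | total_docs)
  [ "${count:-0}" -gt 0 ]
}

# Example (the slug is illustrative):
#   collection_seeded venues && echo "venues collection has data"
```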
Files:
- •Pre-seeded image: bin/docker/postgres/Dockerfile
- •Seed script: bin/seed-payload.ts (TypeScript, uses Payload API)
- •Based on structure from src/db/docker/ynot_db.sql
Connection details:
    DATABASE_URI=postgresql://ynot_postgres_user:ynot_postgres_pass@localhost:5432/ynot_payload_dev
    DATABASE_SSL=disable
Seeding Checklist
- • Know which system you're testing (Payload, Legacy, or both)
- • Understand data dependencies (collections, relationships)
- • Run appropriate seed script
- • Verify data appears in UI
- • Take screenshots showing populated application
- • Test functionality with real-ish data
Fallback Strategy
When full automation fails, provide partial success:
If Containers Start But Are Slow
    ## Status: Partial Success

    ✅ **What Works:**
    - Docker images build successfully
    - Containers start and run
    - Services accessible on localhost

    ⚠️ **Performance Issues:**
    - yarn install takes 5+ minutes (timeout)
    - Total startup: 8 minutes (expected < 3 minutes)

    📋 **Recommendations:**
    - Use pre-built Docker images
    - Implement layer caching
    - Switch to Debian base (faster than Alpine)

    📸 **Evidence:**
    [Screenshots showing services eventually work]
If Services Don't Start
    ## Status: Infrastructure Ready, Runtime Blocked

    ✅ **What Works:**
    - Docker images build
    - Configuration files valid
    - Scripts execute without syntax errors

    ❌ **Blockers:**
    - Port 3000 unavailable
    - PostgreSQL connection fails
    - yarn install times out

    📋 **Next Steps:**
    1. [Specific actions to unblock]
    2. [Manual testing instructions for humans]
    3. [Alternative approaches to try]
If Can't Test at All
    ## Status: Code Complete, Testing Environment Unavailable

    ✅ **Code Quality:**
    - Syntax valid
    - Linting passes
    - Follows established patterns

    ❌ **Cannot Verify:**
    - No Docker access in environment
    - Network restrictions prevent package install
    - Resource constraints prevent startup

    📋 **Manual Testing Required:**
    1. Steps for human to test locally
    2. Expected outcomes
    3. How to verify functionality

    🔧 **Environment Needs:**
    - [List specific requirements]
    - [Allowlist domains needed]
    - [Resource requirements]
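Choosing among the three report headings comes down to how far verification got. One possible way to encode that decision, as a sketch: status_heading is a hypothetical helper, its three yes/no inputs are an assumed simplification, and the "Verified" label is invented here because the checklist defines no heading for full success.

```shell
# status_heading DOCKER STARTED ON_TIME: each argument is "yes" or "no".
# Prints the report heading that matches how far verification got.
status_heading() {
  if [ "$1" != yes ]; then
    echo "## Status: Code Complete, Testing Environment Unavailable"
  elif [ "$2" != yes ]; then
    echo "## Status: Infrastructure Ready, Runtime Blocked"
  elif [ "$3" != yes ]; then
    echo "## Status: Partial Success"
  else
    # Assumed label: not one of the checklist's templates.
    echo "## Status: Verified"
  fi
}

# Example: Docker works and services start, but startup exceeded the thresholds.
#   status_heading yes yes no
```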
Common Pitfalls
❌ DON'T: Submit Without Proof
"The scripts are production-ready and will work in real environments."
This is not acceptable. Provide proof or explain why you cannot.
✅ DO: Provide Evidence or Explain Why Not
"The scripts work on local workstations (see manual test results). Cannot test in CI due to yarn install timeout (5+ min). Recommend pre-built images for CI automation. See alternatives in [link]."
❌ DON'T: Fake Screenshots
Never include generic/placeholder screenshots that don't show actual functionality.
✅ DO: Show Real State
Show what actually works, even if incomplete:
- •Loading spinner if that's as far as it gets
- •Error message if that's what appears
- •Actual data if it works
❌ DON'T: Ignore Performance
    # Still running after 10 minutes
    # "It will finish eventually"
✅ DO: Set Reasonable Timeouts
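    # Poll until the service answers; swap in the URL of the service under test.
    # (A shell function such as service_ready would not be visible inside bash -c.)
    timeout 180 bash -c 'until curl -sf http://localhost:3000/admin > /dev/null; do sleep 5; done'
    if [ $? -eq 124 ]; then
      echo "❌ Service failed to start in 3 minutes"
      exit 1
    fi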
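Where the timeout command is not installed, the same bound can be enforced with a plain retry loop. A minimal sketch; wait_for is a hypothetical helper, and it counts only the time spent sleeping between attempts, not the time each attempt takes.

```shell
# wait_for LIMIT INTERVAL COMMAND...: retry COMMAND every INTERVAL seconds
# until it succeeds or LIMIT seconds have been spent sleeping between attempts.
wait_for() {
  limit=$1
  interval=$2
  shift 2
  waited=0
  until "$@" > /dev/null 2>&1; do
    if [ "$waited" -ge "$limit" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Example: give Payload three minutes, checking every five seconds.
#   wait_for 180 5 curl -sf http://localhost:3000/admin || {
#     echo "Service failed to start in 3 minutes"
#     exit 1
#   }
```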
PR Template Checklist
Include this section in every PR:
    ## Agent Verification Results

    ### Environment
    - [ ] Local workstation / [ ] CI/CD
    - [ ] Full network access / [ ] Restricted network
    - Node version: [version]
    - Docker version: [version]

    ### Testing Performed
    - [ ] Payload accessible: http://localhost:3000/admin
    - [ ] Legacy accessible: http://localhost:8080
    - [ ] Database seeded (`yarn seed:payload` and/or `yarn seed:legacy`)
    - [ ] Tests pass: `yarn test`
    - [ ] Linting passes: `yarn lint`

    ### Evidence
    [Screenshots or explain why not available]

    ### Performance Metrics
    - Container startup: [time]
    - yarn install: [time]
    - Service ready: [time]

    ### Issues Encountered
    [None / List specific issues and how resolved]

    ### Recommendations
    [Any suggestions for improving agent testing workflow]
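The Performance Metrics lines in the template can be filled from measured values instead of estimates. A small sketch, assuming whole-second resolution is enough; time_step is a hypothetical helper name.

```shell
# time_step LABEL COMMAND...: run COMMAND, then print a metrics line in whole
# seconds and return the command's own exit status.
time_step() {
  label=$1
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  end=$(date +%s)
  echo "- $label: $((end - start))s"
  return "$status"
}

# Example: prints a line in the template's format after the install finishes.
#   time_step "yarn install" yarn install
```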
When to Ask for Help
Ask for human assistance when:
- •Can't meet success criteria after reasonable attempts
- •Performance far exceeds thresholds without clear solution
- •Environment blockers require infrastructure changes
- •Conflicting requirements need human decision
- •Security concerns about testing approach
Asking for help is better than submitting unverified work.
Resources
- •Database seeding:
  - •Payload: yarn seed:payload (sample data based on ynot_db.sql structure)
  - •Legacy (sample): yarn seed:legacy (quick test data)
  - •Legacy (production): ./bin/refresh_local.sh (real data, requires access)
- •Local setup: docs/LOCAL_SETUP_GUIDE.md
- •Agent examples: docs/AGENT_VERIFICATION_EXAMPLES.md
- •Migration context: docs/payload-migration/README.md
- •Automation status: See the agent-automation-infrastructure skill
Remember: Quality and honesty > Speed. Partial success with evidence > Untested code.