Scraper QA Workflow
This skill implements a rigorous test-driven workflow for backend scraper/discovery/extraction systems. It ensures changes are verified against defined use cases before being marked complete.
When This Skill Activates
Triggers on work involving:
- •packages/scrapers/src/agents/smart-discovery/ - Discovery agent
- •packages/scrapers/src/agents/smart-dish-finder/ - Dish extraction agent
- •packages/scrapers/src/platforms/ - Platform adapters (Lieferando, UberEats, etc.)
- •packages/scrapers/src/cli/ - CLI tools
- •Search engine integration (Google, Bing, etc.)
- •Query caching, rate limiting, country detection
- •Puppeteer/browser automation
Workflow Overview
┌─────────────────────┐     ┌──────────────────────┐     ┌─────────────────────┐
│ 1. SCOPE ANALYSIS   │────▶│ 2. USE CASE SETUP    │────▶│ 3. IMPLEMENTATION   │
│ (What's changing?)  │     │ (Define test cases)  │     │ (Code changes)      │
└─────────────────────┘     └──────────────────────┘     └─────────────────────┘
                                                                    │
┌─────────────────────┐     ┌──────────────────────┐                │
│ 5. FINAL REPORT     │◀────│ 4. VERIFICATION      │◀───────────────┘
│ (All tests pass)    │     │ (Run all use cases)  │
└─────────────────────┘     └──────────────────────┘
CRITICAL: Test-First Approach
BEFORE writing any code:
- •Analyze what's being changed
- •Define specific, executable use cases
- •Document expected outcomes
- •Get user acknowledgment of test plan
AFTER code changes:
- •Execute ALL defined use cases
- •Fix any failures
- •Re-run until 100% pass
- •Only then report success to user
Phase 1: Scope Analysis
Identify what's being changed:
## Scope Analysis: [Task Description]

**Components Affected:**
- [ ] SmartDiscoveryAgent
- [ ] SmartDishFinderAgent
- [ ] PuppeteerFetcher
- [ ] Platform adapters (specify which)
- [ ] Search engine pool
- [ ] Query cache
- [ ] Country detection
- [ ] CLI tools
- [ ] Config/types

**Risk Level:** low / medium / high

**Potential Impact:**
- Discovery accuracy
- Extraction quality
- Performance/rate limits
- Data integrity
- Cross-country handling
Phase 2: Use Case Setup
MANDATORY: Define executable use cases BEFORE implementing.
Consult TESTING-MANUAL.md for existing use cases, then add task-specific ones.
Use Case Format
### TC-[MODULE]-[NUMBER]: [Test Case Name]

**Component:** [Which agent/file]
**Type:** unit / integration / dry-run / live
**Preconditions:** [Setup required]
**Test Command:** [Exact command to run]

**Verification Steps:**
1. Step 1
2. Step 2

**Expected Result:** [Specific, measurable outcome]
**Pass Criteria:** [How to determine pass/fail]
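The use case format above can also be captured as a typed record, which makes it possible to reject an incomplete use case before implementation starts. The sketch below is illustrative only: the `UseCase` interface, the helpers `isValidUseCaseId` and `missingFields`, and the assumption that MODULE is uppercase letters and NUMBER is exactly three digits (as in TC-DISC-001) are hypothetical and not part of the scrapers package.

```typescript
// Hypothetical shape for one use case; field names mirror the template above.
type UseCaseType = "unit" | "integration" | "dry-run" | "live";

interface UseCase {
  id: string; // e.g. "TC-DISC-001"
  name: string;
  component: string;
  type: UseCaseType;
  preconditions: string;
  testCommand: string;
  verificationSteps: string[];
  expectedResult: string;
  passCriteria: string;
}

// Assumption: MODULE is uppercase letters, NUMBER is exactly three digits.
const USE_CASE_ID = /^TC-[A-Z]+-\d{3}$/;

function isValidUseCaseId(id: string): boolean {
  return USE_CASE_ID.test(id);
}

// Returns the names of required fields that are empty or malformed.
function missingFields(uc: UseCase): string[] {
  const missing: string[] = [];
  if (!isValidUseCaseId(uc.id)) missing.push("id");
  if (uc.testCommand.trim() === "") missing.push("testCommand");
  if (uc.verificationSteps.length === 0) missing.push("verificationSteps");
  if (uc.expectedResult.trim() === "") missing.push("expectedResult");
  if (uc.passCriteria.trim() === "") missing.push("passCriteria");
  return missing;
}
```

An empty array from `missingFields` means the use case is specific enough to execute; anything else lists what still needs to be filled in.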
Required Test Categories
For any change, define tests in these categories:
- •Build Verification (always required)
  - •TypeScript compiles without errors
  - •No import/type errors
- •Unit Tests (for logic changes)
  - •Function returns expected output
  - •Edge cases handled
- •Dry Run Tests (for scraper changes)
  - •Run with --dry-run flag
  - •Verify no database writes
  - •Check log output for expected behavior
- •Integration Tests (for multi-component changes)
  - •Components interact correctly
  - •Data flows through pipeline
- •Live Tests (for critical paths, with care)
  - •Small-scale real execution
  - •Verify actual results in Firestore
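The conditions in parentheses above amount to a small decision rule: build verification always applies, and each other category applies only when its condition holds. A minimal sketch of that rule follows. The `ChangeProfile` interface and `requiredCategories` function are hypothetical names introduced for illustration, and treating "multi-component" as "more than one component" is an assumption.

```typescript
type TestCategory = "build" | "unit" | "dry-run" | "integration" | "live";

// Hypothetical description of a change; each flag corresponds to one of the
// parenthesised conditions in the list above.
interface ChangeProfile {
  changesLogic: boolean;
  touchesScraper: boolean;
  componentCount: number; // how many components the change spans
  onCriticalPath: boolean;
}

function requiredCategories(change: ChangeProfile): TestCategory[] {
  const categories: TestCategory[] = ["build"]; // always required
  if (change.changesLogic) categories.push("unit");
  if (change.touchesScraper) categories.push("dry-run");
  if (change.componentCount > 1) categories.push("integration");
  if (change.onCriticalPath) categories.push("live");
  return categories;
}
```

For example, a logic change inside a single platform adapter that is not on a critical path would need build, unit, and dry-run tests under this rule.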
Phase 3: Implementation
Now implement the changes, keeping use cases in mind.
During implementation:
- •Write code that can be tested
- •Add logging for verification
- •Handle error cases explicitly
Phase 4: Verification
Execute ALL defined use cases. Do NOT report to user until all pass.
Verification Workflow
# 1. Build verification (ALWAYS FIRST)
cd planted-availability-db && pnpm build

# 2. Dry run tests
cd packages/scrapers && pnpm run local --dry-run -c ../../scraper-config-test.json

# 3. Specific test scenarios (from use cases)
# ... run each defined test case
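The ordering above matters: a failed build makes every later step meaningless, so the sequence should stop at the first failure. One way to sketch that fail-fast ordering is shown below. The `Step` interface, `runInOrder`, and the injected `exec` callback are hypothetical; the directories and commands are taken from the workflow above, with each `cd` target expressed as a `cwd` field.

```typescript
interface Step {
  label: string;
  cwd: string; // directory the command runs in, relative to the repo root
  command: string;
}

// Steps in the order the workflow above requires.
const verificationSteps: Step[] = [
  { label: "build", cwd: "planted-availability-db", command: "pnpm build" },
  {
    label: "dry run",
    cwd: "planted-availability-db/packages/scrapers",
    command: "pnpm run local --dry-run -c ../../scraper-config-test.json",
  },
];

// `exec` is injected (hypothetical) so the sketch does not depend on any
// particular process API; it returns the exit code of the step.
function runInOrder(steps: Step[], exec: (step: Step) => number): string | null {
  for (const step of steps) {
    if (exec(step) !== 0) return step.label; // stop at the first failure
  }
  return null; // every step succeeded
}
```

`runInOrder` returns the label of the first failing step, or `null` when all steps succeed, so the caller knows exactly where verification stopped.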
Recording Results
For each use case:
| TC ID | Description | Status | Output/Evidence |
|-------|-------------|--------|-----------------|
| TC-DISC-001 | Country detection | PASS | Logs show "Using detected country (DE)" |
| TC-DISC-002 | URL validation | PASS | Build succeeded, no type errors |
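If results are collected programmatically, the table above can be generated rather than typed by hand. The following is a minimal sketch under that assumption; `TestResult`, `cell`, and `renderResultsTable` are hypothetical names, and the column headers are copied from the table above.

```typescript
type Status = "PASS" | "FAIL";

interface TestResult {
  id: string;
  description: string;
  status: Status;
  evidence: string;
}

// Escape pipes so cell content cannot break the table layout.
function cell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

function renderResultsTable(results: TestResult[]): string {
  const header = [
    "| TC ID | Description | Status | Output/Evidence |",
    "|-------|-------------|--------|-----------------|",
  ];
  const rows = results.map(
    (r) => `| ${cell(r.id)} | ${cell(r.description)} | ${r.status} | ${cell(r.evidence)} |`
  );
  return header.concat(rows).join("\n");
}
```

Escaping pipes matters because log excerpts used as evidence can contain them, which would otherwise split a cell in two.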
Failure Handling
If ANY test fails:
- •Analyze failure
- •Fix the issue
- •Re-run ALL tests (not just failed one)
- •Repeat until 100% pass
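The failure-handling loop above can be sketched as follows. The important detail is that every round re-runs the full suite, because a fix for one use case can break another. `RunResult`, `verifyUntilGreen`, and both callbacks are hypothetical, and the `maxRounds` cap is an addition so the sketch cannot loop forever; the workflow itself simply says to repeat until everything passes.

```typescript
interface RunResult {
  id: string;
  passed: boolean;
}

// `runAll` executes every defined use case; `fix` attempts to repair the
// failures it is given. Both are hypothetical callbacks.
function verifyUntilGreen(
  runAll: () => RunResult[],
  fix: (failures: RunResult[]) => void,
  maxRounds: number
): { passed: boolean; rounds: number } {
  for (let round = 1; round <= maxRounds; round++) {
    const results = runAll(); // always the full suite, never just the failed cases
    const failures = results.filter((r) => !r.passed);
    if (failures.length === 0) return { passed: true, rounds: round };
    if (round < maxRounds) fix(failures); // no point fixing without a re-run
  }
  return { passed: false, rounds: maxRounds };
}
```

Only a result with `passed: true` would justify moving on to the final report.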
Phase 5: Final Report
Only after ALL use cases pass, generate final report:
## Test Report: [Task Name]

**Date:** YYYY-MM-DD
**Status:** ALL TESTS PASSED

### Use Cases Executed

| TC ID | Description | Status |
|-------|-------------|--------|
| TC-XXX-001 | ... | PASS |
| TC-XXX-002 | ... | PASS |

### Build Status
- `pnpm build`: PASS
- TypeScript errors: 0

### Evidence
[Key log outputs, screenshots, or data samples proving success]

### Changes Made
- `file1.ts`: Description
- `file2.ts`: Description
Test Commands Reference
# Build all packages
cd planted-availability-db && pnpm build

# Build scrapers only
cd planted-availability-db/packages/scrapers && pnpm build

# Dry run discovery (no DB writes)
cd packages/scrapers && pnpm run local --dry-run -c ../../scraper-config.json

# Run with specific config
cd packages/scrapers && pnpm run local -c ../../scraper-config-test.json

# Discovery only
cd packages/scrapers && pnpm run discovery -c ../../scraper-config.json

# Dish extraction only
cd packages/scrapers && pnpm run extraction -c ../../scraper-config.json

# Test search pool
cd packages/scrapers && pnpm run search-pool

# Interactive review
cd packages/scrapers && pnpm run review
Test Config Files
Create minimal test configs for verification:
scraper-config-test.json (for quick tests):
{
"mode": "discovery",
"countries": ["DE"],
"platforms": ["lieferando"],
"maxQueries": 5,
"maxVenues": 3,
"batchCitySize": 1,
"extractDishesInline": false
}
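A quick sanity check can help keep a test config minimal, so that a verification run stays small and cheap. The sketch below is an assumption-heavy illustration: the `ScraperTestConfig` interface lists only the fields shown in the sample above (the real schema in the scrapers package may differ), and `QUICK_LIMITS` and `quickConfigProblems` are hypothetical, with thresholds chosen arbitrarily.

```typescript
// Fields copied from the sample config above; the real schema may define
// more fields or stricter types.
interface ScraperTestConfig {
  mode: string;
  countries: string[];
  platforms: string[];
  maxQueries: number;
  maxVenues: number;
  batchCitySize: number;
  extractDishesInline: boolean;
}

// Assumed limits for a "quick" run; these numbers are not from the project.
const QUICK_LIMITS = { maxQueries: 10, maxVenues: 5 };

function quickConfigProblems(cfg: ScraperTestConfig): string[] {
  const problems: string[] = [];
  if (cfg.countries.length !== 1) problems.push("use exactly one country");
  if (cfg.platforms.length !== 1) problems.push("use exactly one platform");
  if (cfg.maxQueries > QUICK_LIMITS.maxQueries) problems.push("maxQueries too high");
  if (cfg.maxVenues > QUICK_LIMITS.maxVenues) problems.push("maxVenues too high");
  return problems;
}

// The sample config from above, parsed as it would be when read from disk.
const sample: ScraperTestConfig = JSON.parse(`{
  "mode": "discovery",
  "countries": ["DE"],
  "platforms": ["lieferando"],
  "maxQueries": 5,
  "maxVenues": 3,
  "batchCitySize": 1,
  "extractDishesInline": false
}`);
```

Under these assumed limits the sample config produces no problems, while a config with a large `maxQueries` value would be flagged before any queries are spent.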
Reference Documents
- •TESTING-MANUAL.md - Full use case library
- •TEST-REPORT-TEMPLATE.md - Report template
- •planted-availability-db/.claude/skills/fixes-done.md - Previous bugs and fixes
Key Principle
User sees ONLY the final result:
- •If all tests pass: Report success with evidence
- •If tests fail: Fix issues, re-test, then report
- •Never report partial results or "try running this"