BDD Test Solution Audit
Goal: evaluate specification executability, flake resistance, maintainability, semantic/a11y quality, and AI-agent operability.
Adaptive Workflow
Workflow adapts based on repository size (auto-detected).
┌─────────────────────────────────────────────────────────────────┐
│ 1. DISCOVER → 2. ANALYZE → 3. SCORE → 4. REPORT → 5. ROADMAP │
└─────────────────────────────────────────────────────────────────┘
↑ │
└──────────── Skip steps for small repos ──────────────┘
| Repo Size | Steps | Sampling | Questions |
|---|---|---|---|
| Small (≤20 scenarios) | 1→3→4 | None | 1 question |
| Medium (21–100) | 1→2→3→4→5 | 30–50% | 2 questions |
| Large (100+) | Full | Stratified | 3 questions |
Step 1: Discovery & Auto-Inference
Target: {argument OR cwd}
Auto-detect (no user input needed):
| What | How to Detect |
|---|---|
| Stack | playwright.config.* → Playwright; playwright-bdd in package.json → playwright-bdd |
| Size | Count *.feature files and Scenario: lines |
| History | Check .bddready/history/index.json exists |
| CI | Check .github/workflows/, Jenkinsfile, .gitlab-ci.yml |
| Artifacts | Check playwright.config.* for trace/video/screenshot settings |
Output immediately:
Target: {path}
Stack: {stack} (auto-detected)
Size: {small/medium/large} ({N} features, {M} scenarios)
History: {yes/no} | CI: {yes/no} | Artifacts: {configured/missing}
See modules/discovery.md for detailed detection rules.
Step 2: Sampling (Medium/Large repos only)
Skip for small repos — analyze all scenarios.
For medium/large repos, use stratified sampling. See modules/sampling.md.
Progress Indicator (Medium/Large repos)
For repositories with 50+ scenarios, show progress during analysis:
Analyzing BDD Test Solution... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% [■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25% ✓ Discovery complete (playwright-bdd detected) → Analyzing features/auth/*.feature (8 scenarios) [■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░] 50% → Analyzing features/checkout/*.feature (12 scenarios) [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░] 75% → Scoring aspects... [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100% ✓ Analysis complete
Progress stages:
- •Discovery (10%)
- •Feature file analysis (10-70%, proportional to file count)
- •Step definition analysis (70-85%)
- •Scoring (85-95%)
- •Report generation (95-100%)
Update progress after each feature file or major step.
Step 3: Score Aspects
Score each aspect using rubrics from criteria/aspects.md.
Aspects and weights:
| # | Aspect | Weight |
|---|---|---|
| 1 | Executable Gherkin | 16% |
| 2 | Step Definitions Quality | 14% |
| 3 | Test Architecture | 14% |
| 4 | Selector Strategy | 12% |
| 5 | Waiting & Flake Resistance | 14% |
| 6 | Data & Environment | 10% |
| 7 | CI, Reporting & Artifacts | 10% |
| 8 | AI-Agent Operability | 10% |
Scoring: 0 (bad) / 5 (partial) / 10 (good) per criterion.
See modules/scoring.md for calculation formulas.
Step 4: Report
4.1 Terminal Output (Always)
Print ASCII dashboard with scores and issues. See modules/output-formats.md.
4.2 Issues by Severity
Classify using reference/severity.md:
- •🔴 CRITICAL — blocks reliable execution
- •🟡 WARNING — hinders speed/maintainability
- •🔵 INFO — optimizations
Every issue MUST have:
- •Evidence (file path, pattern, or code snippet)
- •Impact (why it matters)
- •Effort estimate (Low/Medium/High)
4.3 Save Reports
Save to .bddready/history/reports/:
- •
{REPORT_ID}.json— machine-readable - •
{REPORT_ID}.md— human-readable
Update .bddready/history/index.json for delta tracking.
4.4 HTML Report (Offer to User)
After showing terminal output, ask:
Would you like me to generate an interactive HTML report?
If yes, run:
node scripts/render-html.mjs .bddready/history/reports/{REPORT_ID}.json .bddready/history/reports/{REPORT_ID}.html
Interactive Fix Mode
After showing issues, offer to fix quick wins immediately.
Trigger Conditions
Offer interactive fixes when:
- •At least 1 CRITICAL issue with
Effort: Low - •Issue has clear, automatable fix pattern
Flow
╔══════════════════════════════════════════════════════════════════╗ ║ QUICK FIX AVAILABLE ║ ╠══════════════════════════════════════════════════════════════════╣ ║ [C1] Flake Resistance: Found 7 arbitrary sleeps ║ ║ Fix: Replace `wait X seconds` with condition waits ║ ║ Effort: Low | Files: 3 ║ ║ ║ ║ → Fix C1 now? [y/n/skip all] ║ ╚══════════════════════════════════════════════════════════════════╝
Response Handling
| Response | Action |
|---|---|
y / yes | Apply fix, show diff, continue to next fixable issue |
n / no | Skip this issue, continue to next |
skip all / s | Skip interactive mode, show full report |
Fixable Patterns
| Issue Pattern | Auto-Fix |
|---|---|
wait X seconds without condition | → waitFor with visibility/enabled check |
Hardcoded sleep() | → waitForSelector() or waitForResponse() |
| CSS class selectors | → getByRole() / getByTestId() (suggest, confirm) |
Missing trace: 'on-first-retry' | → Add to playwright.config |
| Duplicate step definitions | → Consolidate (show which to keep) |
After Each Fix
✓ Fixed C1: Replaced 7 sleeps with condition waits Modified: features/checkout.feature, features/auth.feature → Fix C2 now? [y/n/skip all]
Post-Fix Summary
╔══════════════════════════════════════════════════════════════════╗ ║ FIX SUMMARY ║ ╠══════════════════════════════════════════════════════════════════╣ ║ ✓ C1: Fixed (7 sleeps → condition waits) ║ ║ ✓ C3: Fixed (added trace-on-failure) ║ ║ ✗ C2: Skipped (requires manual review) ║ ║ ║ ║ Files modified: 5 ║ ║ New score estimate: 68 → 74 (+6) ║ ╚══════════════════════════════════════════════════════════════════╝
Step 5: Roadmap (Medium/Large repos only)
Skip for small repos — provide inline recommendations instead.
| Phase | Focus |
|---|---|
| 1: Quick Wins | Remove sleeps, enable trace-on-failure, fix critical selectors |
| 2: Foundation | Thin step defs, proper fixtures, test isolation |
| 3: Advanced | Visual tests, a11y integration, CI optimization |
User Questions
Auto-Inference First
Before asking, try to infer from codebase:
| Question | Auto-Inference |
|---|---|
| Primary goal? | Infer from issues: many sleeps → stability; bad selectors → AI-ready |
| Depth of changes? | Infer from repo size: small → quick wins; large → phased |
| CI constraints? | Read from config: worker count, timeout settings |
Minimal Question Set
Ask ONLY what cannot be inferred:
Small repos (1 question):
What is your priority: stability, speed, or AI-agent readability?
Medium repos (2 questions):
- •What is your priority: stability, speed, or AI-agent readability?
- •How deep can changes go: quick fixes only, or can we refactor?
Large repos (3 questions):
- •What is your priority: stability, speed, or AI-agent readability?
- •How deep can changes go: quick fixes only, medium refactor, or deep restructuring?
- •Are there CI/environment constraints? (e.g., worker limits, no mocks, staging only)
Dynamic Questions (Only if triggered)
Ask ONLY if specific issues found:
| Trigger | Question |
|---|---|
| CRITICAL issues found | Which CRITICAL items should be fixed first? (list by ID) |
| Selector/a11y issues | Can we modify application markup (HTML), or tests only? |
| >10 WARNING issues | Which WARNING items are in scope this iteration? |
Semantic/A11y Refactoring Proposal
If Aspect 4 (Selector Strategy) or Aspect 8 (AI-Agent Operability) scores below 60, propose:
╔══════════════════════════════════════════════════════════════════╗ ║ SEMANTIC/A11Y REFACTORING PROPOSAL ║ ╠══════════════════════════════════════════════════════════════════╣ ║ Your locators would be more stable with semantic HTML. ║ ║ ║ ║ Would you like me to help refactor: ║ ║ [ ] Component markup (replace div onclick → button, add ARIA) ║ ║ [ ] Test locators (migrate CSS → getByRole) ║ ╚══════════════════════════════════════════════════════════════════╝
Ask only if user has access to modify application source code.
Reference Files
| File | Purpose |
|---|---|
criteria/aspects.md | Detailed scoring rubrics (0/5/10) |
reference/severity.md | Issue classification rules |
reference/bdd-best-practices.md | Best practices guide |
modules/discovery.md | Discovery details |
modules/sampling.md | Sampling strategy |
modules/scoring.md | Score calculation |
modules/output-formats.md | Output format specs |
templates/report.html | HTML report template |
scripts/render-html.mjs | HTML generator script |
Quick Reference: Workflow by Size
Small Repo (≤20 scenarios)
- •Discover (auto-detect stack, size)
- •Ask 1 question (priority)
- •Analyze all scenarios
- •Score aspects (simplified)
- •Print terminal report + issues
- •Interactive fix mode (if Low-effort CRITICAL issues)
- •Offer HTML report
- •Provide inline recommendations
Medium Repo (21–100 scenarios)
- •Discover (auto-detect)
- •Ask 2 questions
- •Sample 30–50%
- •Show progress (50+ scenarios)
- •Full aspect scoring
- •Terminal + saved reports
- •Interactive fix mode
- •Offer HTML report
- •Phased roadmap (3 phases)
Large Repo (100+ scenarios)
- •Discover (auto-detect)
- •Ask 3 questions
- •Stratified sampling
- •Show progress (with stage updates)
- •Full aspect scoring
- •All report formats
- •Interactive fix mode
- •HTML report
- •Detailed phased roadmap
- •Propose a11y refactoring if applicable