BDD Test Solution Audit

Goal: evaluate specification executability, flake resistance, maintainability, semantic/a11y quality, and AI-agent operability.

Adaptive Workflow

The workflow adapts to repository size, which is auto-detected from the scenario count.

┌─────────────────────────────────────────────────────────────────┐
│ 1. DISCOVER → 2. ANALYZE → 3. SCORE → 4. REPORT → 5. ROADMAP   │
└─────────────────────────────────────────────────────────────────┘
     ↑                                                      │
     └──────────── Skip steps for small repos ──────────────┘

Repo Size	Steps	Sampling	Questions
Small (≤20 scenarios)	1→3→4	None	1 question
Medium (21–100)	1→2→3→4→5	30–50%	2 questions
Large (>100)	Full	Stratified	3 questions
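
The size table can be read as a small lookup. The sketch below is illustrative only: `planWorkflow` and `WorkflowPlan` are hypothetical names, and it assumes a repository with exactly 100 scenarios counts as medium.

```typescript
// Hypothetical helper: maps a scenario count to the workflow row in the table above.
type RepoSize = "small" | "medium" | "large";

interface WorkflowPlan {
  size: RepoSize;
  steps: number[]; // step numbers from the workflow diagram
  sampling: "none" | "30-50%" | "stratified";
  questions: number;
}

function planWorkflow(scenarioCount: number): WorkflowPlan {
  if (scenarioCount <= 20) {
    return { size: "small", steps: [1, 3, 4], sampling: "none", questions: 1 };
  }
  if (scenarioCount <= 100) {
    // Assumption: exactly 100 scenarios is still "medium".
    return { size: "medium", steps: [1, 2, 3, 4, 5], sampling: "30-50%", questions: 2 };
  }
  return { size: "large", steps: [1, 2, 3, 4, 5], sampling: "stratified", questions: 3 };
}
```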

Step 1: Discovery & Auto-Inference

Target: {argument OR cwd}

Auto-detect (no user input needed):

What	How to Detect
Stack	`playwright.config.*` → Playwright; `playwright-bdd` in package.json → playwright-bdd
Size	Count `*.feature` files and `Scenario:` lines
History	Check whether `.bddready/history/index.json` exists
CI	Check `.github/workflows/`, `Jenkinsfile`, `.gitlab-ci.yml`
Artifacts	Check `playwright.config.*` for trace/video/screenshot settings
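
A rough sketch of how the detection rules in this table might be expressed. The authoritative rules live in modules/discovery.md; `detect` and its input shape are hypothetical, and counting `Scenario Outline:` lines alongside `Scenario:` is an assumption.

```typescript
// Hypothetical, simplified detection over data the caller has already read from disk.
interface Detection {
  stack: "playwright-bdd" | "playwright" | "unknown";
  features: number;
  scenarios: number;
  ci: boolean;
}

function detect(
  paths: string[],                      // repo-relative file paths
  dependencies: Record<string, string>, // dependencies + devDependencies from package.json
  featureTexts: string[],               // contents of each *.feature file
): Detection {
  const hasPlaywrightConfig = paths.some((p) => /(^|\/)playwright\.config\.[^/]+$/.test(p));
  const stack: Detection["stack"] =
    "playwright-bdd" in dependencies ? "playwright-bdd"
    : hasPlaywrightConfig ? "playwright"
    : "unknown";

  // Assumption: Scenario Outlines count as scenarios too.
  const scenarioLine = /^\s*Scenario(?: Outline)?:/gm;
  let scenarios = 0;
  for (const text of featureTexts) {
    const matches = text.match(scenarioLine);
    scenarios += matches ? matches.length : 0;
  }

  const ci = paths.some(
    (p) => p.startsWith(".github/workflows/") || p === "Jenkinsfile" || p === ".gitlab-ci.yml",
  );
  const features = paths.filter((p) => p.endsWith(".feature")).length;
  return { stack, features, scenarios, ci };
}
```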

Output immediately:

Target: {path}
Stack: {stack} (auto-detected)
Size: {small/medium/large} ({N} features, {M} scenarios)
History: {yes/no} | CI: {yes/no} | Artifacts: {configured/missing}

See modules/discovery.md for detailed detection rules.

Step 2: Sampling (Medium/Large repos only)

Small repos skip sampling and analyze every scenario.

For medium/large repos, use stratified sampling. See modules/sampling.md.
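
For orientation, one possible shape of stratified sampling is sketched below. The real strategy is defined in modules/sampling.md; here the stratum is assumed to be the directory that holds the feature file, `stratifiedSample` and `ScenarioRef` are hypothetical names, and the first N scenarios per stratum are taken so the result is deterministic.

```typescript
interface ScenarioRef {
  featurePath: string; // e.g. "features/auth/login.feature"
  name: string;
}

// Takes the same fraction from every stratum, keeping at least one scenario per stratum.
function stratifiedSample(scenarios: ScenarioRef[], fraction: number): ScenarioRef[] {
  const strata: Record<string, ScenarioRef[]> = {};
  for (const s of scenarios) {
    // Assumption: the containing directory is the stratum key.
    const dir = s.featurePath.split("/").slice(0, -1).join("/") || ".";
    (strata[dir] = strata[dir] || []).push(s);
  }

  const sample: ScenarioRef[] = [];
  for (const dir of Object.keys(strata)) {
    const group = strata[dir];
    const take = Math.max(1, Math.ceil(group.length * fraction));
    sample.push(...group.slice(0, take)); // deterministic; a seeded shuffle could replace this
  }
  return sample;
}
```

Keeping at least one scenario per stratum means a small feature area is never dropped from the audit entirely.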

Progress Indicator (50+ scenarios)

For repositories with 50 or more scenarios (larger medium repos and all large repos), show progress during analysis:

Analyzing BDD Test Solution...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0%

[■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25%
✓ Discovery complete (playwright-bdd detected)
→ Analyzing features/auth/*.feature (8 scenarios)

[■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░] 50%
→ Analyzing features/checkout/*.feature (12 scenarios)

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░] 75%
→ Analyzing step definitions...

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100%
✓ Analysis complete

Progress stages:

•Discovery (0-10%)
•Feature file analysis (10-70%, proportional to file count)
•Step definition analysis (70-85%)
•Scoring (85-95%)
•Report generation (95-100%)

Update progress after each feature file or major step.
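
The stage ranges above can be turned into a percentage and a bar with a few lines of arithmetic. This is a sketch only: `featureAnalysisPercent` and `renderBar` are hypothetical names, and the 40-cell width is taken from the sample output.

```typescript
const BAR_WIDTH = 40; // matches the 40-cell bar in the sample output

// Feature-file analysis spans 10-70%, proportional to the number of files processed.
function featureAnalysisPercent(filesDone: number, filesTotal: number): number {
  if (filesTotal <= 0) return 70; // nothing to analyze, so the stage is complete
  const ratio = Math.min(Math.max(filesDone / filesTotal, 0), 1);
  return Math.round(10 + ratio * 60);
}

// Renders a bar such as "[■■■■░░...] 10%".
function renderBar(percent: number): string {
  const clamped = Math.min(Math.max(percent, 0), 100);
  const filled = Math.round((clamped / 100) * BAR_WIDTH);
  return `[${"■".repeat(filled)}${"░".repeat(BAR_WIDTH - filled)}] ${clamped}%`;
}
```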

Step 3: Score Aspects

Score each aspect using rubrics from criteria/aspects.md.

Aspects and weights:

#	Aspect	Weight
1	Executable Gherkin	16%
2	Step Definitions Quality	14%
3	Test Architecture	14%
4	Selector Strategy	12%
5	Waiting & Flake Resistance	14%
6	Data & Environment	10%
7	CI, Reporting & Artifacts	10%
8	AI-Agent Operability	10%

Scoring: 0 (bad) / 5 (partial) / 10 (good) per criterion.

See modules/scoring.md for calculation formulas.
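
As an illustration of how the weights combine, here is one plausible calculation. The actual formulas are in modules/scoring.md; this sketch assumes an aspect score is the mean of its criterion scores scaled to 0-100 and that the overall score is the weighted sum. `aspectScore` and `overallScore` are hypothetical names.

```typescript
type CriterionScore = 0 | 5 | 10;

// Weights from the table above (they sum to 100).
const WEIGHTS: Array<[string, number]> = [
  ["Executable Gherkin", 16],
  ["Step Definitions Quality", 14],
  ["Test Architecture", 14],
  ["Selector Strategy", 12],
  ["Waiting & Flake Resistance", 14],
  ["Data & Environment", 10],
  ["CI, Reporting & Artifacts", 10],
  ["AI-Agent Operability", 10],
];

// Assumption: aspect score = mean criterion score, scaled to 0-100.
function aspectScore(criteria: CriterionScore[]): number {
  if (criteria.length === 0) return 0;
  const total = criteria.reduce<number>((sum, c) => sum + c, 0);
  return (total / (criteria.length * 10)) * 100;
}

// Assumption: overall score = weighted sum of aspect scores, rounded.
function overallScore(aspects: Record<string, CriterionScore[]>): number {
  let weighted = 0;
  for (const [name, weight] of WEIGHTS) {
    weighted += (weight * aspectScore(aspects[name] || [])) / 100;
  }
  return Math.round(weighted);
}
```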

Step 4: Report

4.1 Terminal Output (Always)

Print ASCII dashboard with scores and issues. See modules/output-formats.md.

4.2 Issues by Severity

Classify using reference/severity.md:

•🔴 CRITICAL — blocks reliable execution
•🟡 WARNING — hinders speed/maintainability
•🔵 INFO — optimizations

Every issue MUST have:

•Evidence (file path, pattern, or code snippet)
•Impact (why it matters)
•Effort estimate (Low/Medium/High)
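
The three required fields lend themselves to a simple check before an issue is reported. The record shape below is an assumption made for illustration (the real report schema is not shown here); `Issue` and `missingFields` are hypothetical names.

```typescript
type Severity = "CRITICAL" | "WARNING" | "INFO";
type Effort = "Low" | "Medium" | "High";

// Hypothetical issue record.
interface Issue {
  id: string;       // e.g. "C1"
  severity: Severity;
  evidence: string; // file path, pattern, or code snippet
  impact: string;   // why it matters
  effort: Effort;
}

// Returns the names of required fields that are absent or blank.
function missingFields(issue: Partial<Issue>): string[] {
  const required: Array<keyof Issue> = ["evidence", "impact", "effort"];
  return required.filter((field) => {
    const value = issue[field];
    return typeof value !== "string" || value.trim() === "";
  });
}
```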

4.3 Save Reports

Save to .bddready/history/reports/:

•{REPORT_ID}.json — machine-readable
•{REPORT_ID}.md — human-readable

Update .bddready/history/index.json for delta tracking.

4.4 HTML Report (Offer to User)

After showing terminal output, ask:

Would you like me to generate an interactive HTML report?

If yes, run:

node scripts/render-html.mjs .bddready/history/reports/{REPORT_ID}.json .bddready/history/reports/{REPORT_ID}.html

Interactive Fix Mode

After showing issues, offer to fix quick wins immediately.

Trigger Conditions

Offer interactive fixes when both conditions hold:

•At least one CRITICAL issue has Effort: Low
•That issue has a clear, automatable fix pattern

Flow

╔══════════════════════════════════════════════════════════════════╗
║                     QUICK FIX AVAILABLE                          ║
╠══════════════════════════════════════════════════════════════════╣
║  [C1] Flake Resistance: Found 7 arbitrary sleeps                 ║
║       Fix: Replace `wait X seconds` with condition waits         ║
║       Effort: Low | Files: 2                                     ║
║                                                                  ║
║  → Fix C1 now? [y/n/skip all]                                    ║
╚══════════════════════════════════════════════════════════════════╝

Response Handling

Response	Action
`y` / `yes`	Apply fix, show diff, continue to next fixable issue
`n` / `no`	Skip this issue, continue to next
`skip all` / `s`	Skip interactive mode, show full report
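
The trigger conditions and the response table can be sketched as two small functions. Names are hypothetical; re-prompting on unrecognized input is an assumption, since the table does not cover that case, and the trigger is read as requiring both conditions on the same issue.

```typescript
type FixAction = "apply" | "skip" | "skip-all" | "reprompt";

// Maps a raw user answer to an action from the response table.
function parseFixResponse(input: string): FixAction {
  const answer = input.trim().toLowerCase();
  if (answer === "y" || answer === "yes") return "apply";
  if (answer === "n" || answer === "no") return "skip";
  if (answer === "s" || answer === "skip all") return "skip-all";
  return "reprompt"; // assumption: unknown input asks again
}

interface FixCandidate {
  severity: "CRITICAL" | "WARNING" | "INFO";
  effort: "Low" | "Medium" | "High";
  hasAutoFix: boolean; // true when a known fix pattern applies
}

// Assumption: an issue qualifies only if it is CRITICAL, Low effort, and auto-fixable.
function isQuickFix(issue: FixCandidate): boolean {
  return issue.severity === "CRITICAL" && issue.effort === "Low" && issue.hasAutoFix;
}
```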

Fixable Patterns

Issue Pattern	Auto-Fix
`wait X seconds` without condition	→ `locator.waitFor()` (visibility) or `expect(locator).toBeEnabled()`
Hardcoded `sleep()`	→ web-first assertion such as `expect(locator).toBeVisible()`, or `page.waitForResponse()`
CSS class selectors	→ `getByRole()` / `getByTestId()` (suggest, confirm)
Missing `trace: 'on-first-retry'`	→ Add to playwright.config
Duplicate step definitions	→ Consolidate (show which to keep)
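
Before a fix can be offered, the sleep patterns in the first two rows have to be located and reported with evidence. A minimal sketch of that scan follows; the regular expressions and the `findArbitrarySleeps` name are illustrative assumptions, not the tool's actual matchers. `waitForTimeout` is Playwright's fixed-delay method.

```typescript
interface Finding {
  line: number;    // 1-based line number, usable as issue evidence
  text: string;    // trimmed source line
  pattern: string; // which pattern matched
}

// Hypothetical matchers for arbitrary sleeps in feature files and step definitions.
const SLEEP_PATTERNS: Array<[string, RegExp]> = [
  ["gherkin-wait", /\bwaits? (?:for )?\d+ seconds?\b/i],
  ["waitForTimeout", /\.waitForTimeout\(\s*\d+/],
  ["sleep-call", /\bsleep\(\s*\d+/],
];

function findArbitrarySleeps(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const [pattern, regex] of SLEEP_PATTERNS) {
      if (regex.test(text)) {
        findings.push({ line: index + 1, text: text.trim(), pattern });
        break; // report each line once
      }
    }
  });
  return findings;
}
```

Each finding carries a line number and the matched text, which covers the Evidence field required for every issue.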

After Each Fix

✓ Fixed C1: Replaced 7 sleeps with condition waits
  Modified: features/checkout.feature, features/auth.feature
  
→ Fix C2 now? [y/n/skip all]

Post-Fix Summary

╔══════════════════════════════════════════════════════════════════╗
║                     FIX SUMMARY                                  ║
╠══════════════════════════════════════════════════════════════════╣
║  ✓ C1: Fixed (7 sleeps → condition waits)                        ║
║  ✓ C3: Fixed (added trace-on-failure)                            ║
║  ✗ C2: Skipped (requires manual review)                          ║
║                                                                  ║
║  Files modified: 5                                               ║
║  New score estimate: 68 → 74 (+6)                                ║
╚══════════════════════════════════════════════════════════════════╝

Step 5: Roadmap (Medium/Large repos only)

Skip for small repos — provide inline recommendations instead.

Phase	Focus
1: Quick Wins	Remove sleeps, enable trace-on-failure, fix critical selectors
2: Foundation	Thin step defs, proper fixtures, test isolation
3: Advanced	Visual tests, a11y integration, CI optimization
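
For the Phase 1 artifact settings, a Playwright config fragment might look like the sketch below. The retry count and the screenshot/video choices are assumptions for illustration; a playwright-bdd project would also keep its existing BDD settings alongside them.

```typescript
// playwright.config.ts (fragment; the values are illustrative assumptions)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // 'on-first-retry' only records a trace when a test is retried,
  // so retries must be greater than 0 for traces to appear.
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```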

User Questions

Auto-Inference First

Before asking, try to infer from codebase:

Question	Auto-Inference
Primary goal?	Infer from issues: many sleeps → stability; bad selectors → AI-agent readability
Depth of changes?	Infer from repo size: small → quick wins; large → phased
CI constraints?	Read from config: worker count, timeout settings
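
The first row can be sketched as a comparison of issue counts per aspect. This is an assumption-heavy illustration: `inferPriority` is a hypothetical name, the tie-break toward stability is arbitrary, and returning `undefined` stands for "cannot infer, ask the user".

```typescript
type Priority = "stability" | "speed" | "AI-agent readability";

// Hypothetical inference: compares flake-related and selector-related issue counts.
function inferPriority(issueCountByAspect: Record<string, number>): Priority | undefined {
  const flake = issueCountByAspect["Waiting & Flake Resistance"] || 0;
  const selectors = issueCountByAspect["Selector Strategy"] || 0;
  if (flake === 0 && selectors === 0) return undefined; // nothing to infer from
  // Assumption: ties go to stability.
  return flake >= selectors ? "stability" : "AI-agent readability";
}
```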

Minimal Question Set

Ask ONLY what cannot be inferred:

Small repos (1 question):

What is your priority: stability, speed, or AI-agent readability?

Medium repos (2 questions):

•What is your priority: stability, speed, or AI-agent readability?
•How deep can changes go: quick fixes only, or can we refactor?

Large repos (3 questions):

•What is your priority: stability, speed, or AI-agent readability?
•How deep can changes go: quick fixes only, medium refactor, or deep restructuring?
•Are there CI/environment constraints? (e.g., worker limits, no mocks, staging only)

Dynamic Questions (Only if triggered)

Ask ONLY if specific issues found:

Trigger	Question
CRITICAL issues found	Which CRITICAL items should be fixed first? (list by ID)
Selector/a11y issues	Can we modify application markup (HTML), or tests only?
>10 WARNING issues	Which WARNING items are in scope this iteration?
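
The trigger table maps directly onto a function that returns only the questions that apply. `AuditSummary` and `dynamicQuestions` are hypothetical names used for illustration.

```typescript
interface AuditSummary {
  criticalIds: string[];            // e.g. ["C1", "C2"]
  warningCount: number;
  hasSelectorOrA11yIssues: boolean;
}

// Returns the dynamic questions whose triggers fired, in table order.
function dynamicQuestions(summary: AuditSummary): string[] {
  const questions: string[] = [];
  if (summary.criticalIds.length > 0) {
    questions.push(`Which CRITICAL items should be fixed first? (${summary.criticalIds.join(", ")})`);
  }
  if (summary.hasSelectorOrA11yIssues) {
    questions.push("Can we modify application markup (HTML), or tests only?");
  }
  if (summary.warningCount > 10) {
    questions.push("Which WARNING items are in scope this iteration?");
  }
  return questions;
}
```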

Semantic/A11y Refactoring Proposal

If Aspect 4 (Selector Strategy) or Aspect 8 (AI-Agent Operability) scores below 60, propose:

╔══════════════════════════════════════════════════════════════════╗
║           SEMANTIC/A11Y REFACTORING PROPOSAL                     ║
╠══════════════════════════════════════════════════════════════════╣
║  Your locators would be more stable with semantic HTML.          ║
║                                                                  ║
║  Would you like me to help refactor:                             ║
║  [ ] Component markup (replace div onclick → button, add ARIA)   ║
║  [ ] Test locators (migrate CSS → getByRole)                     ║
╚══════════════════════════════════════════════════════════════════╝

Offer this only if the user can modify the application source code.

Reference Files

File	Purpose
`criteria/aspects.md`	Detailed scoring rubrics (0/5/10)
`reference/severity.md`	Issue classification rules
`reference/bdd-best-practices.md`	Best practices guide
`modules/discovery.md`	Discovery details
`modules/sampling.md`	Sampling strategy
`modules/scoring.md`	Score calculation
`modules/output-formats.md`	Output format specs
`templates/report.html`	HTML report template
`scripts/render-html.mjs`	HTML generator script

Quick Reference: Workflow by Size

Small Repo (≤20 scenarios)

•Discover (auto-detect stack, size)
•Ask 1 question (priority)
•Analyze all scenarios
•Score aspects (simplified)
•Print terminal report + issues
•Interactive fix mode (if Low-effort CRITICAL issues)
•Offer HTML report
•Provide inline recommendations

Medium Repo (21–100 scenarios)

•Discover (auto-detect)
•Ask 2 questions
•Sample 30–50%
•Show progress (50+ scenarios)
•Full aspect scoring
•Terminal + saved reports
•Interactive fix mode
•Offer HTML report
•Phased roadmap (3 phases)

Large Repo (>100 scenarios)

•Discover (auto-detect)
•Ask 3 questions
•Stratified sampling
•Show progress (with stage updates)
•Full aspect scoring
•All report formats
•Interactive fix mode
•Offer HTML report
•Detailed phased roadmap
•Propose a11y refactoring if applicable