
auditing-bdd-tests

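Analyzes BDD (Gherkin) and Playwright test solutions across several dimensions: specification quality, flake resistance, semantic/accessibility locators, and AI-agent operability. Produces per-aspect scores, an overall grade, a list of issues grouped by severity, and an improvement roadmap.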

SKILL.md
---
name: auditing-bdd-tests
description: Analyzes BDD (Gherkin) + Playwright test solutions for spec quality, flake resistance, semantic/a11y locators, and AI-agent operability. Produces aspect scoring, grade, issues by severity, and improvement roadmap.
user-invocable: true
argument-hint: [path-to-repo]
---

BDD Test Solution Audit

Goal: evaluate specification executability, flake resistance, maintainability, semantic/a11y quality, and AI-agent operability.

Adaptive Workflow

The workflow adapts to repository size, which is auto-detected.

```
┌─────────────────────────────────────────────────────────────────┐
│ 1. DISCOVER → 2. SAMPLE → 3. SCORE → 4. REPORT → 5. ROADMAP    │
└─────────────────────────────────────────────────────────────────┘
  Small repos run steps 1 → 3 → 4 only (no sampling, no roadmap).
```
| Repo Size | Steps | Sampling | Questions |
|---|---|---|---|
| Small (≤20 scenarios) | 1→3→4 | None (all scenarios) | 1 question |
| Medium (21–100) | 1→2→3→4→5 | 30–50% | 2 questions |
| Large (>100) | Full (1→2→3→4→5) | Stratified | 3 questions |
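The size thresholds can be expressed as a small lookup. This is a minimal sketch. The `RepoPlan` shape and the `planForScenarioCount` name are hypothetical, and it assumes that exactly 100 scenarios counts as medium.

```typescript
// Hypothetical helper: maps a scenario count to the workflow plan for that repo size.
type RepoSize = "small" | "medium" | "large";

interface RepoPlan {
  size: RepoSize;
  steps: number[];   // workflow steps to run (1 = Discover ... 5 = Roadmap)
  sampling: string;  // how scenarios are selected for analysis
  questions: number; // maximum number of upfront questions
}

function planForScenarioCount(scenarios: number): RepoPlan {
  if (scenarios <= 20) {
    // Small repos skip sampling (step 2) and the roadmap (step 5).
    return { size: "small", steps: [1, 3, 4], sampling: "none (all scenarios)", questions: 1 };
  }
  if (scenarios <= 100) {
    return { size: "medium", steps: [1, 2, 3, 4, 5], sampling: "30-50%", questions: 2 };
  }
  return { size: "large", steps: [1, 2, 3, 4, 5], sampling: "stratified", questions: 3 };
}

console.log(planForScenarioCount(12).size);  // → small
console.log(planForScenarioCount(100).size); // → medium
console.log(planForScenarioCount(101).size); // → large
```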

Step 1: Discovery & Auto-Inference

Target: {argument OR cwd}

Auto-detect (no user input needed):

| What | How to Detect |
|---|---|
| Stack | playwright.config.* → Playwright; playwright-bdd in package.json → playwright-bdd |
| Size | Count *.feature files and Scenario: lines |
| History | Check .bddready/history/index.json exists |
| CI | Check .github/workflows/, Jenkinsfile, .gitlab-ci.yml |
| Artifacts | Check playwright.config.* for trace/video/screenshot settings |

Output immediately:

```
Target: {path}
Stack: {stack} (auto-detected)
Size: {small/medium/large} ({N} features, {M} scenarios)
History: {yes/no} | CI: {yes/no} | Artifacts: {configured/missing}
```

See modules/discovery.md for detailed detection rules.
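Two of the detection rules, Size and Stack, can be sketched as pure functions over file contents that have already been read. The function names are hypothetical. Two behaviors are assumptions that go beyond the table's literal "Scenario: lines" rule: the counter also accepts Gherkin's `Example:` synonym and `Scenario Outline:`, and it counts each outline once rather than once per Examples row.

```typescript
// Hypothetical discovery helpers; they operate on file contents already read from disk.

// Counts scenario headers in one .feature file. "Example:" is Gherkin's synonym for
// "Scenario:"; outlines are counted once here, not once per Examples row.
function countScenarios(featureText: string): number {
  const header = /^\s*(Scenario|Example|Scenario Outline|Scenario Template):/;
  return featureText.split(/\r?\n/).filter((line) => header.test(line)).length;
}

type StackName = "playwright-bdd" | "Playwright" | "unknown";

// Applies the Stack rule: playwright-bdd in package.json takes precedence over a bare
// Playwright config. Checking both dependencies and devDependencies is an assumption.
function detectStack(packageJson: string, hasPlaywrightConfig: boolean): StackName {
  const pkg = JSON.parse(packageJson) as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  if ("playwright-bdd" in deps) return "playwright-bdd";
  if (hasPlaywrightConfig) return "Playwright";
  return "unknown";
}
```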


Step 2: Sampling (Medium/Large repos only)

Skip for small repos — analyze all scenarios.

For medium/large repos, use stratified sampling. See modules/sampling.md.
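The exact strategy lives in modules/sampling.md, which is not reproduced here. The following is only a sketch of stratified sampling under stated assumptions. Strata are feature directories, and each stratum contributes ceil(rate × size) scenarios (at least one). Scenarios are picked at evenly spaced positions so the result is deterministic. `ScenarioRef` and `stratifiedSample` are hypothetical names.

```typescript
// Hypothetical stratified sampler: one stratum per feature directory.
interface ScenarioRef {
  file: string; // e.g. "features/auth/login.feature"
  name: string;
}

function stratumOf(ref: ScenarioRef): string {
  const slash = ref.file.lastIndexOf("/");
  return slash === -1 ? "." : ref.file.slice(0, slash);
}

function stratifiedSample(scenarios: ScenarioRef[], rate: number): ScenarioRef[] {
  // Group scenarios by directory so every area of the suite is represented.
  const strata = new Map<string, ScenarioRef[]>();
  for (const s of scenarios) {
    const key = stratumOf(s);
    const bucket = strata.get(key) ?? [];
    bucket.push(s);
    strata.set(key, bucket);
  }
  const sample: ScenarioRef[] = [];
  strata.forEach((bucket) => {
    const k = Math.max(1, Math.ceil(rate * bucket.length));
    const step = bucket.length / k; // >= 1, so the picked indices are distinct
    for (let i = 0; i < k; i++) {
      sample.push(bucket[Math.floor(i * step)]);
    }
  });
  return sample;
}
```

For a medium repo, a rate between 0.3 and 0.5 matches the 30–50% range in the size table.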


Progress Indicator (Medium/Large repos)

For repositories with 50+ scenarios, show progress during analysis:

```
Analyzing BDD Test Solution...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0%

[■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25%
✓ Discovery complete (playwright-bdd detected)
→ Analyzing features/auth/*.feature (8 scenarios)

[■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░] 50%
→ Analyzing features/checkout/*.feature (12 scenarios)

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░] 75%
→ Analyzing step definitions...

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100%
✓ Analysis complete
```

Progress stages:

  1. Discovery (10%)
  2. Feature file analysis (10-70%, proportional to file count)
  3. Step definition analysis (70-85%)
  4. Scoring (85-95%)
  5. Report generation (95-100%)

Update progress after each feature file or major step.
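The feature-analysis stage and the bar rendering can be sketched as follows. The function names are hypothetical. The 40-cell width matches the example bars, where each ■ represents 2.5%.

```typescript
// Hypothetical progress helpers for the stage ranges listed above.
// Feature-file analysis spans 10-70%, advancing in proportion to files completed.
function featureProgress(filesDone: number, totalFiles: number): number {
  if (totalFiles === 0) return 70;
  return Math.round(10 + (60 * filesDone) / totalFiles);
}

// Renders a bar like the example output: filled cells (■) followed by empty cells (░).
function renderBar(percent: number, width = 40): string {
  const clamped = Math.min(100, Math.max(0, percent));
  const filled = Math.round((clamped / 100) * width);
  return `[${"■".repeat(filled)}${"░".repeat(width - filled)}] ${clamped}%`;
}

// One of four feature files done: 10 of 40 cells filled, labeled 25%.
console.log(renderBar(featureProgress(1, 4)));
```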


Step 3: Score Aspects

Score each aspect using rubrics from criteria/aspects.md.

Aspects and weights:

| # | Aspect | Weight |
|---|---|---|
| 1 | Executable Gherkin | 16% |
| 2 | Step Definitions Quality | 14% |
| 3 | Test Architecture | 14% |
| 4 | Selector Strategy | 12% |
| 5 | Waiting & Flake Resistance | 14% |
| 6 | Data & Environment | 10% |
| 7 | CI, Reporting & Artifacts | 10% |
| 8 | AI-Agent Operability | 10% |

Scoring: 0 (bad) / 5 (partial) / 10 (good) per criterion.

See modules/scoring.md for calculation formulas.
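modules/scoring.md defines the actual formulas. The sketch below makes two assumptions that are consistent with the 0–100 figures used elsewhere in this skill, such as "scores below 60" and "68 → 74". First, each aspect score is its criteria average rescaled to 0–100. Second, the overall score is the weighted sum of the aspect scores, using the weights in the table. The function names are hypothetical.

```typescript
// Hypothetical scoring sketch; the real formulas may differ.
type CriterionScore = 0 | 5 | 10;

// Weights from the aspect table; they sum to 1.0.
const ASPECT_WEIGHTS: Record<string, number> = {
  "Executable Gherkin": 0.16,
  "Step Definitions Quality": 0.14,
  "Test Architecture": 0.14,
  "Selector Strategy": 0.12,
  "Waiting & Flake Resistance": 0.14,
  "Data & Environment": 0.10,
  "CI, Reporting & Artifacts": 0.10,
  "AI-Agent Operability": 0.10,
};

// Average of 0/5/10 criterion scores, rescaled to 0-100.
function aspectScore(criteria: CriterionScore[]): number {
  if (criteria.length === 0) return 0;
  const total = criteria.reduce<number>((sum, c) => sum + c, 0);
  return (100 * total) / (10 * criteria.length);
}

// Weighted sum of aspect scores; missing aspects count as 0.
function overallScore(aspectScores: Record<string, number>): number {
  let score = 0;
  for (const aspect of Object.keys(ASPECT_WEIGHTS)) {
    score += ASPECT_WEIGHTS[aspect] * (aspectScores[aspect] ?? 0);
  }
  return Math.round(score);
}
```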


Step 4: Report

4.1 Terminal Output (Always)

Print ASCII dashboard with scores and issues. See modules/output-formats.md.

4.2 Issues by Severity

Classify using reference/severity.md:

  • 🔴 CRITICAL — blocks reliable execution
  • 🟡 WARNING — hinders speed/maintainability
  • 🔵 INFO — optimizations

Every issue MUST have:

  • Evidence (file path, pattern, or code snippet)
  • Impact (why it matters)
  • Effort estimate (Low/Medium/High)
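A hypothetical issue record and a check for the three mandatory fields could look like this. The field names are illustrative, not a published schema.

```typescript
// Hypothetical issue record; field names are illustrative.
type Severity = "CRITICAL" | "WARNING" | "INFO";
type Effort = "Low" | "Medium" | "High";

interface AuditIssue {
  id: string;         // e.g. "C1"
  severity: Severity;
  title: string;
  evidence: string;   // file path, pattern, or code snippet
  impact: string;     // why it matters
  effort: Effort;
}

// Returns the names of required fields that are missing or blank.
function missingFields(issue: Partial<AuditIssue>): string[] {
  const required: (keyof AuditIssue)[] = ["evidence", "impact", "effort"];
  return required.filter((field) => {
    const value = issue[field];
    return value === undefined || String(value).trim() === "";
  });
}
```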

4.3 Save Reports

Save to .bddready/history/reports/:

  • {REPORT_ID}.json — machine-readable
  • {REPORT_ID}.md — human-readable

Update .bddready/history/index.json for delta tracking.
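The index schema is not specified here. The sketch assumes a hypothetical `reports` array of entries, each holding a report ID, a timestamp, and an overall score. Under that assumption, recording a report and computing the delta against the previous run looks like this.

```typescript
// Hypothetical shape for .bddready/history/index.json; the real schema may differ.
interface HistoryEntry {
  reportId: string;
  createdAt: string;    // ISO-8601 timestamp
  overallScore: number;
}

interface HistoryIndex {
  reports: HistoryEntry[];
}

// Appends a report without mutating the input and returns the score change
// against the previous run (null on the first run).
function recordReport(
  index: HistoryIndex,
  entry: HistoryEntry,
): { index: HistoryIndex; delta: number | null } {
  const previous = index.reports.length > 0 ? index.reports[index.reports.length - 1] : undefined;
  const next: HistoryIndex = { reports: [...index.reports, entry] };
  const delta = previous ? entry.overallScore - previous.overallScore : null;
  return { index: next, delta };
}
```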

4.4 HTML Report (Offer to User)

After showing terminal output, ask:

Would you like me to generate an interactive HTML report?

If yes, run:

```bash
node scripts/render-html.mjs .bddready/history/reports/{REPORT_ID}.json .bddready/history/reports/{REPORT_ID}.html
```

Interactive Fix Mode

After showing issues, offer to fix quick wins immediately.

Trigger Conditions

Offer interactive fixes when:

  • At least 1 CRITICAL issue with Effort: Low
  • Issue has clear, automatable fix pattern
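The two trigger conditions reduce to a filter. In the sketch below, the hypothetical `automatable` flag stands in for "matches one of the fixable patterns"; how that flag is decided is left open.

```typescript
// Hypothetical filter for the interactive-fix trigger conditions.
interface FixCandidate {
  id: string;
  severity: "CRITICAL" | "WARNING" | "INFO";
  effort: "Low" | "Medium" | "High";
  automatable: boolean; // true if the issue matches a known fix pattern
}

// Issues eligible for the quick-fix prompt, in their original order.
function quickFixQueue(issues: FixCandidate[]): FixCandidate[] {
  return issues.filter((i) => i.severity === "CRITICAL" && i.effort === "Low" && i.automatable);
}

function shouldOfferInteractiveFixes(issues: FixCandidate[]): boolean {
  return quickFixQueue(issues).length >= 1;
}
```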

Flow

```
╔══════════════════════════════════════════════════════════════════╗
║                     QUICK FIX AVAILABLE                          ║
╠══════════════════════════════════════════════════════════════════╣
║  [C1] Flake Resistance: Found 7 arbitrary sleeps                 ║
║       Fix: Replace `wait X seconds` with condition waits         ║
║       Effort: Low | Files: 3                                     ║
║                                                                  ║
║  → Fix C1 now? [y/n/skip all]                                    ║
╚══════════════════════════════════════════════════════════════════╝
```

Response Handling

| Response | Action |
|---|---|
| y / yes | Apply fix, show diff, continue to next fixable issue |
| n / no | Skip this issue, continue to next |
| skip all / s | Skip interactive mode, show full report |
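Mapping the prompt answers to actions is a small parser. In this sketch, `parseFixResponse` and the "reprompt" fallback for unrecognized input are assumptions; the skill does not say how to handle other answers.

```typescript
// Hypothetical parser for the quick-fix prompt answers.
type FixAction = "apply" | "skip" | "skipAll" | "reprompt";

function parseFixResponse(input: string): FixAction {
  const answer = input.trim().toLowerCase();
  if (answer === "y" || answer === "yes") return "apply";        // apply, show diff, next issue
  if (answer === "n" || answer === "no") return "skip";          // skip this issue only
  if (answer === "skip all" || answer === "s") return "skipAll"; // leave interactive mode
  return "reprompt";                                             // unrecognized: ask again
}
```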

Fixable Patterns

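| Issue Pattern | Auto-Fix |
|---|---|
| wait X seconds without condition | locator.waitFor({ state: 'visible' }), or expect(locator).toBeEnabled() for an enabled check |
| Hardcoded sleep() / page.waitForTimeout() | locator.waitFor() or page.waitForResponse() |
| CSS class selectors | getByRole() / getByTestId() (suggest, confirm) |
| Missing trace: 'on-first-retry' | → Add to playwright.config (traces are recorded only when retries > 0) |
| Duplicate step definitions | → Consolidate (show which to keep) |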
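Detecting the first two patterns can be approximated line by line. The regexes below are heuristics and an assumption. They flag Gherkin steps such as "wait 3 seconds", Playwright's `page.waitForTimeout(...)`, and custom `sleep(...)` helpers, and they will miss other spellings. `findArbitrarySleeps` is a hypothetical name.

```typescript
// Hypothetical detector for arbitrary sleeps in feature files and step definitions.
interface SleepFinding {
  line: number; // 1-based line number
  text: string;
}

const SLEEP_PATTERNS: RegExp[] = [
  /\bwaits?\s+\d+(\.\d+)?\s+seconds?\b/i, // Gherkin: "And I wait 3 seconds"
  /\.waitForTimeout\s*\(/,                // Playwright: page.waitForTimeout(3000)
  /\bsleep\s*\(/,                         // custom sleep() helpers
];

function findArbitrarySleeps(source: string): SleepFinding[] {
  const findings: SleepFinding[] = [];
  source.split(/\r?\n/).forEach((text, i) => {
    if (SLEEP_PATTERNS.some((re) => re.test(text))) {
      findings.push({ line: i + 1, text: text.trim() });
    }
  });
  return findings;
}
```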

After Each Fix

```
✓ Fixed C1: Replaced 7 sleeps with condition waits
  Modified: features/checkout.feature, features/auth.feature

→ Fix C2 now? [y/n/skip all]
```

Post-Fix Summary

```
╔══════════════════════════════════════════════════════════════════╗
║                     FIX SUMMARY                                  ║
╠══════════════════════════════════════════════════════════════════╣
║  ✓ C1: Fixed (7 sleeps → condition waits)                        ║
║  ✓ C3: Fixed (added trace-on-failure)                            ║
║  ✗ C2: Skipped (requires manual review)                          ║
║                                                                  ║
║  Files modified: 5                                               ║
║  New score estimate: 68 → 74 (+6)                                ║
╚══════════════════════════════════════════════════════════════════╝
```

Step 5: Roadmap (Medium/Large repos only)

Skip for small repos — provide inline recommendations instead.

| Phase | Focus |
|---|---|
| 1: Quick Wins | Remove sleeps, enable trace-on-failure, fix critical selectors |
| 2: Foundation | Thin step defs, proper fixtures, test isolation |
| 3: Advanced | Visual tests, a11y integration, CI optimization |
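The skill does not prescribe how individual issues map to phases. One possible heuristic, sketched here with hypothetical names: effort decides the phase (Low → 1, Medium → 2, High → 3), and severity orders the issues within each phase.

```typescript
// Hypothetical heuristic for bucketing issues into roadmap phases.
type RoadmapSeverity = "CRITICAL" | "WARNING" | "INFO";
type RoadmapEffort = "Low" | "Medium" | "High";

interface RoadmapIssue {
  id: string;
  severity: RoadmapSeverity;
  effort: RoadmapEffort;
}

const PHASE_BY_EFFORT: Record<RoadmapEffort, 1 | 2 | 3> = { Low: 1, Medium: 2, High: 3 };
const SEVERITY_RANK: Record<RoadmapSeverity, number> = { CRITICAL: 0, WARNING: 1, INFO: 2 };

function buildRoadmap(issues: RoadmapIssue[]): Record<1 | 2 | 3, string[]> {
  const phases: Record<1 | 2 | 3, RoadmapIssue[]> = { 1: [], 2: [], 3: [] };
  for (const issue of issues) phases[PHASE_BY_EFFORT[issue.effort]].push(issue);
  // Most severe first within each phase.
  const ids = (list: RoadmapIssue[]): string[] =>
    [...list].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]).map((i) => i.id);
  return { 1: ids(phases[1]), 2: ids(phases[2]), 3: ids(phases[3]) };
}
```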

User Questions

Auto-Inference First

Before asking, try to infer from codebase:

| Question | Auto-Inference |
|---|---|
| Primary goal? | Infer from issues: many sleeps → stability; bad selectors → AI-ready |
| Depth of changes? | Infer from repo size: small → quick wins; large → phased |
| CI constraints? | Read from config: worker count, timeout settings |
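The first two inference rules can be sketched as follows. Several details are assumptions: the issue-count categories, breaking a tie toward stability, falling back to asking when there are no signals, and treating medium repos as phased.

```typescript
// Hypothetical inference for "Primary goal?" and "Depth of changes?".
type Priority = "stability" | "AI-agent readability" | "ask";

interface IssueCounts {
  arbitrarySleeps: number;  // e.g. "wait X seconds", waitForTimeout
  brittleSelectors: number; // e.g. CSS class or XPath locators
}

function inferPriority(counts: IssueCounts): Priority {
  // No signal either way: fall back to asking the user.
  if (counts.arbitrarySleeps === 0 && counts.brittleSelectors === 0) return "ask";
  return counts.arbitrarySleeps >= counts.brittleSelectors ? "stability" : "AI-agent readability";
}

function inferDepth(size: "small" | "medium" | "large"): "quick wins" | "phased" {
  // Mirrors "small → quick wins; large → phased"; medium → phased is an assumption.
  return size === "small" ? "quick wins" : "phased";
}
```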

Minimal Question Set

Ask ONLY what cannot be inferred:

Small repos (1 question):

What is your priority: stability, speed, or AI-agent readability?

Medium repos (2 questions):

  1. What is your priority: stability, speed, or AI-agent readability?
  2. How deep can changes go: quick fixes only, or can we refactor?

Large repos (3 questions):

  1. What is your priority: stability, speed, or AI-agent readability?
  2. How deep can changes go: quick fixes only, medium refactor, or deep restructuring?
  3. Are there CI/environment constraints? (e.g., worker limits, no mocks, staging only)

Dynamic Questions (Only if triggered)

Ask ONLY if specific issues are found:

| Trigger | Question |
|---|---|
| CRITICAL issues found | Which CRITICAL items should be fixed first? (list by ID) |
| Selector/a11y issues | Can we modify application markup (HTML), or tests only? |
| >10 WARNING issues | Which WARNING items are in scope this iteration? |
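Selecting the dynamic questions is a direct translation of the trigger table. The `AuditSummary` shape and the function name are hypothetical.

```typescript
// Hypothetical selector for dynamic questions, following the trigger table.
interface AuditSummary {
  criticalIds: string[];
  warningCount: number;
  hasSelectorOrA11yIssues: boolean;
}

function dynamicQuestions(summary: AuditSummary): string[] {
  const questions: string[] = [];
  if (summary.criticalIds.length > 0) {
    questions.push(`Which CRITICAL items should be fixed first? (${summary.criticalIds.join(", ")})`);
  }
  if (summary.hasSelectorOrA11yIssues) {
    questions.push("Can we modify application markup (HTML), or tests only?");
  }
  if (summary.warningCount > 10) {
    questions.push("Which WARNING items are in scope this iteration?");
  }
  return questions;
}
```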

Semantic/A11y Refactoring Proposal

If Aspect 4 (Selector Strategy) or Aspect 8 (AI-Agent Operability) scores below 60, propose:

```
╔══════════════════════════════════════════════════════════════════╗
║           SEMANTIC/A11Y REFACTORING PROPOSAL                     ║
╠══════════════════════════════════════════════════════════════════╣
║  Your locators would be more stable with semantic HTML.          ║
║                                                                  ║
║  Would you like me to help refactor:                             ║
║  [ ] Component markup (replace div onclick → button, add ARIA)   ║
║  [ ] Test locators (migrate CSS → getByRole)                     ║
╚══════════════════════════════════════════════════════════════════╝
```

Offer this proposal only if the user has access to modify the application source code.
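The proposal condition combines the two aspect thresholds with source access. This sketch assumes aspect scores are on a 0 to 100 scale. It also assumes that source access is known, for example from the markup question in the dynamic questions. All names are hypothetical.

```typescript
// Hypothetical check for whether to show the semantic/a11y refactoring proposal.
interface A11yInputs {
  selectorStrategy: number;   // Aspect 4 score (0-100)
  aiAgentOperability: number; // Aspect 8 score (0-100)
  canModifyAppSource: boolean;
}

function shouldProposeA11yRefactor(inputs: A11yInputs): boolean {
  const lowScore = inputs.selectorStrategy < 60 || inputs.aiAgentOperability < 60;
  return lowScore && inputs.canModifyAppSource;
}
```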


Reference Files

| File | Purpose |
|---|---|
| criteria/aspects.md | Detailed scoring rubrics (0/5/10) |
| reference/severity.md | Issue classification rules |
| reference/bdd-best-practices.md | Best practices guide |
| modules/discovery.md | Discovery details |
| modules/sampling.md | Sampling strategy |
| modules/scoring.md | Score calculation |
| modules/output-formats.md | Output format specs |
| templates/report.html | HTML report template |
| scripts/render-html.mjs | HTML generator script |

Quick Reference: Workflow by Size

Small Repo (≤20 scenarios)

  1. Discover (auto-detect stack, size)
  2. Ask 1 question (priority)
  3. Analyze all scenarios
  4. Score aspects (simplified)
  5. Print terminal report + issues
  6. Interactive fix mode (if there are Low-effort CRITICAL issues)
  7. Offer HTML report
  8. Provide inline recommendations

Medium Repo (21–100 scenarios)

  1. Discover (auto-detect)
  2. Ask 2 questions
  3. Sample 30–50%
  4. Show progress (50+ scenarios)
  5. Full aspect scoring
  6. Terminal + saved reports
  7. Interactive fix mode
  8. Offer HTML report
  9. Phased roadmap (3 phases)

Large Repo (>100 scenarios)

  1. Discover (auto-detect)
  2. Ask 3 questions
  3. Stratified sampling
  4. Show progress (with stage updates)
  5. Full aspect scoring
  6. All report formats
  7. Interactive fix mode
  8. HTML report
  9. Detailed phased roadmap
  10. Propose a11y refactoring if applicable