Penant Sim Calibration Checks

SKILL.md

name: penant-sim-calibration-checks
description: Verify and calibrate PenantSimLite simulation behavior for roster auto-registration, CPU usage policy, growth/decline/non-tender/retirement logic, age distribution realism, and league-wide run environment balance. Use when pitching outcomes inflate, when age makeup drifts away from MLB-like ranges, or when changing gameplay engine and season simulation logic.

Run baseline checks

  • Run scripts/run_balance_checks.sh.
  • Read AUTO_SIM_SUMMARY and keep the full output as the before-state.

Mandatory logic scope

  • Treat these as one bundle when tuning:
    • Growth and decline logic.
    • Non-tender ordering logic.
    • Voluntary/natural retirement logic.
  • Keep "some elite players can remain active around age 40" as a hard requirement.

Validate roster and usage behavior

  • Confirm that the auto roster still enforces 26 players total: 12 pitchers and 14 hitters.
  • Confirm auto roster selection prioritizes high totalValue players when auto-managed.
  • Confirm user usage settings are preserved:
    • Replace only injured/unavailable players.
    • Restore recovered players to preferred usage slots when possible.
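The composition rule and the totalValue priority above can be sketched as two small checks. This is a minimal sketch, not the PenantSimLite implementation: `RosterPlayer`, `rosterViolations`, and `topByTotalValue` are hypothetical names, and only the 26/12/14 split and the `totalValue` field come from this document.

```swift
// Hypothetical model: the real PenantSimLite player type is not shown here.
struct RosterPlayer {
    let name: String
    let isPitcher: Bool
    let totalValue: Double
}

/// Returns human-readable violations of the 26 / 12 / 14 rule.
/// An empty array means the roster composition is valid.
func rosterViolations(_ roster: [RosterPlayer]) -> [String] {
    var violations: [String] = []
    let pitchers = roster.filter { $0.isPitcher }.count
    let hitters = roster.count - pitchers
    if roster.count != 26 { violations.append("total is \(roster.count), expected 26") }
    if pitchers != 12 { violations.append("pitchers is \(pitchers), expected 12") }
    if hitters != 14 { violations.append("hitters is \(hitters), expected 14") }
    return violations
}

/// Picks the `count` highest-totalValue players from a candidate pool.
func topByTotalValue(_ pool: [RosterPlayer], count: Int) -> [RosterPlayer] {
    let ranked = pool.sorted { $0.totalValue > $1.totalValue }
    return Array(ranked.prefix(count))
}
```

Comparing `topByTotalValue` against the players the auto roster actually selected is one way to spot a priority regression; the comparison has to be done per role, since the roster fills pitchers and hitters separately.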

Validate CPU team-management behavior

  • Confirm CPU non-tender ordering:
    • Older players are considered first.
    • Among similar ages, players with lower season playing time are cut first.
  • Confirm CPU game usage policy:
    • Higher-overall hitters are prioritized in lineup selection.
    • Higher-overall pitchers are prioritized for starter/late-inning usage.
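The non-tender ordering above amounts to a two-key sort. The sketch below assumes that "similar ages" means equal ages and that playing time is a single integer; both are simplifications, and `NonTenderCandidate` and `nonTenderOrder` are hypothetical names rather than engine APIs.

```swift
// Hypothetical model for illustration only.
struct NonTenderCandidate {
    let name: String
    let age: Int
    let seasonPlayingTime: Int  // assumed unit, e.g. plate appearances or batters faced
}

/// Orders candidates so that the first element is considered for non-tender first:
/// older players come first, and among equal ages the player with less
/// season playing time comes first.
func nonTenderOrder(_ candidates: [NonTenderCandidate]) -> [NonTenderCandidate] {
    return candidates.sorted { a, b in
        if a.age != b.age { return a.age > b.age }
        return a.seasonPlayingTime < b.seasonPlayingTime
    }
}
```

If the engine groups ages into wider bands (for example 35 and 36 treated as similar), the age comparison would need to compare band indexes instead of raw ages.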

Calibrate league run environment

  • Use references/target-ranges.md as the target band.
  • If ERA is too high:
    • Increase pitcher-side run prevention before nerfing all offense globally.
    • Prioritize command/walk tuning, contact-management effects, and weakest-pitcher usage.
  • If ERA is too low:
    • Revert in small steps; avoid large one-shot buffs to offense.
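The decision rule above can be written as a band check on league ERA. This is a sketch under stated assumptions: the band passed in is a placeholder, since the real values live in references/target-ranges.md and are not reproduced here, and `RunEnvironmentAction` and its cases are hypothetical names. ERA itself is the standard 9 × earned runs / innings pitched.

```swift
// Hypothetical action labels mirroring the two branches in the checklist.
enum RunEnvironmentAction {
    case strengthenPitching    // ERA above the band: tune pitcher-side run prevention first
    case easeBackInSmallSteps  // ERA below the band: revert incrementally
    case inBand
}

/// Standard ERA formula. Returns nil when no innings were pitched.
/// `inningsPitched` is assumed to be a true decimal (e.g. 6.333 for 6 1/3).
func era(earnedRuns: Int, inningsPitched: Double) -> Double? {
    guard inningsPitched > 0 else { return nil }
    return 9.0 * Double(earnedRuns) / inningsPitched
}

/// Maps a league ERA to the calibration direction for a given target band.
func runEnvironmentAction(leagueERA: Double, band: ClosedRange<Double>) -> RunEnvironmentAction {
    if leagueERA > band.upperBound { return .strengthenPitching }
    if leagueERA < band.lowerBound { return .easeBackInSmallSteps }
    return .inBand
}
```

For example, with a placeholder band of `3.8...4.4`, a league ERA of 4.9 maps to `.strengthenPitching`.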

30-year age benchmark check

  • Run a long simulation (AUTO_SIM_YEARS=30) and review:
    • Qualified-player age distribution against MLB benchmark bands.
    • Year-to-year volatility for key indicators (OPS, WHIP, K%).
    • Maximum qualified age; verify that elite players near age 40 appear rarely but recurrently.
    • Regular starters by age bucket (<=24, 25-29, 30-34, 35+), with the average overall rating of each bucket.
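The age buckets and the volatility review above can be computed from plain arrays. The sketch below is illustrative: the function names are hypothetical, and mean absolute year-to-year change is only one possible volatility measure, chosen here as an assumption because the document does not name one.

```swift
/// Maps an age to the four buckets used in the checklist.
func ageBucket(_ age: Int) -> String {
    switch age {
    case ...24: return "<=24"
    case 25...29: return "25-29"
    case 30...34: return "30-34"
    default: return "35+"
    }
}

/// Share of qualified players in each bucket (values sum to 1 for non-empty input).
func ageShares(_ ages: [Int]) -> [String: Double] {
    guard !ages.isEmpty else { return [:] }
    var counts: [String: Int] = [:]
    for age in ages { counts[ageBucket(age), default: 0] += 1 }
    return counts.mapValues { Double($0) / Double(ages.count) }
}

/// Mean absolute change between consecutive seasons of one indicator
/// (for example league OPS, WHIP, or K%).
func meanAbsoluteYearlyChange(_ series: [Double]) -> Double {
    guard series.count > 1 else { return 0 }
    let deltas = zip(series.dropFirst(), series).map { abs($0 - $1) }
    return deltas.reduce(0, +) / Double(deltas.count)
}
```

The maximum qualified age for a season is simply `ages.max()`; tracking it across all 30 seasons shows whether players near age 40 recur or appear only once.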

Title-holder sanity check

  • Record and inspect annual title-holder lines (AVG/HR/OPS/ERA/SO/W/SV).
  • Flag seasons where title lines are obviously unrealistic (extreme outliers).
  • Also inspect strikeout-leader K/9 to avoid inflated usage artifacts.
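For the K/9 inspection, the standard formula is 9 × strikeouts / innings pitched. The sketch below stores innings as outs recorded to avoid the .1/.2 innings notation; `PitchingLine` and the `maxPlausibleK9` threshold are hypothetical, and the threshold value has to be chosen by the reviewer.

```swift
// Hypothetical season line for a strikeout leader.
struct PitchingLine {
    let strikeouts: Int
    let outsRecorded: Int  // innings pitched * 3
}

/// K/9 = 9 * SO / IP, which equals 27 * SO / outs. Returns nil with no outs recorded.
func strikeoutsPerNine(_ line: PitchingLine) -> Double? {
    guard line.outsRecorded > 0 else { return nil }
    return 27.0 * Double(line.strikeouts) / Double(line.outsRecorded)
}

/// Flags a leader whose K/9 exceeds a reviewer-chosen ceiling.
/// A line with no outs recorded is flagged as well, since it cannot be a valid leader.
func isSuspiciousStrikeoutLeader(_ line: PitchingLine, maxPlausibleK9: Double) -> Bool {
    guard let k9 = strikeoutsPerNine(line) else { return true }
    return k9 > maxPlausibleK9
}
```

A high strikeout total with an ordinary K/9 points to inflated innings (a usage artifact), while a very high K/9 points to the outcome model itself.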

Draft and usage diagnostics

  • Check draft pool composition:
    • High school share is not too low.
    • Gap between the top and second-ranked candidates is not extreme.
  • Check starter usage:
    • No starts on two days' rest (中2日) or an equivalent short-rest pattern under normal settings.
    • Validate minimum and median game-gap between starts from logs.
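The minimum and median game gap can be derived from the list of team game numbers in which a pitcher started. This is a sketch assuming that log shape, which the document does not specify; `startGaps` and `median` are hypothetical helpers. If the schedule has one game per day with no off days, a gap of 3 games corresponds to two days of rest.

```swift
/// Gaps, in team games, between consecutive starts by one pitcher.
/// `startGameNumbers` is assumed to be sorted ascending.
func startGaps(_ startGameNumbers: [Int]) -> [Int] {
    return zip(startGameNumbers.dropFirst(), startGameNumbers).map { $0 - $1 }
}

/// Median of integer values; nil for an empty input.
func median(_ values: [Int]) -> Double? {
    guard !values.isEmpty else { return nil }
    let sorted = values.sorted()
    let mid = sorted.count / 2
    if sorted.count % 2 == 0 {
        return Double(sorted[mid - 1] + sorted[mid]) / 2.0
    }
    return Double(sorted[mid])
}
```

For a pitcher who started team games 1, 6, 11, 17, and 22, the gaps are 5, 5, 6, and 5, so the minimum is 5 and the median is 5.0. Off days in the schedule make the game gap understate real rest, so a small gap is a prompt to inspect the log rather than proof of a short-rest start.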

Final validation

  • Run:
    • swift test --filter testAutoSimulationCalibrationProducesStableLeagueRanges
    • swift test --filter testAutoSimulationThirtyYearAgingVolatilityAndTitleHolderReport
    • swift test --filter testGameResultIsRecordedAfterGame
    • swift test --filter testDraftPoolIncludesRotationReadyPitchers
    • swift test --filter testDraftPoolIncludesRookieStarHitters
  • Report:
    • What changed.
    • Which metric moved and by how much (including age buckets).
    • Any remaining gap against target ranges and MLB-like age makeup.
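The "which metric moved and by how much" part of the report can be produced by diffing the before-state against the after-state. The sketch below assumes both states have already been reduced to name/value pairs; how to extract them from AUTO_SIM_SUMMARY is not covered here because its format is not shown in this document, and `metricDeltas` is a hypothetical helper.

```swift
/// Returns (name, delta) pairs for every metric present in both snapshots,
/// sorted by metric name. Delta is after minus before.
func metricDeltas(before: [String: Double],
                  after: [String: Double]) -> [(name: String, delta: Double)] {
    var result: [(name: String, delta: Double)] = []
    for name in before.keys.sorted() {
        guard let oldValue = before[name], let newValue = after[name] else { continue }
        result.append((name: name, delta: newValue - oldValue))
    }
    return result
}
```

Age-bucket shares can be passed through the same function by using the bucket label as the metric name.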