Codex Review Churn Analysis (PR Retrospective)

Overview

Explain why Codex Cloud review had to be re-run. Attribute causes to specific development stages, not just PR outcomes.

•Multiple @codex review cycles on the same PR.
•The same feedback appears across successive Codex reviews.
•Follow-up fix PRs exist shortly after merge.
•You need stage-level causes (design / implementation / testing / review packaging / release).

When NOT to use:

•Select: Find PRs with Codex review churn (comment -> code update -> Codex review).
•Isolate: Filter only Codex Cloud reviews and actionable feedback.
•Evidence: Collect a traceable chain (Codex feedback -> code change or follow-up fix).
•Classify: Assign primary + secondary stage causes.
•Abstract: Roll causes into a stable taxonomy.
•Aggregate: Summarize causes across frontend/backends.
•Prevent: Enforce the PR template risk-layer gate before any future @codex review.

Item	Rule
Codex cycle	Codex review comment -> code update -> new Codex review
Evidence	Codex comment + fix commit OR follow-up fix PR OR regression doc
Stage attribution	design / implementation / testing / review packaging / release
Mixed PR	record both frontend and backend impact
Noise guard	if Codex comments are generic, mark low-signal
Risk-layer gate	if any trigger matches, fill the addendum before @codex review

•Design: missing requirements, unclear acceptance criteria, privacy/exposure gaps, cross-endpoint invariants not specified.
•Implementation: logic errors, incomplete edge cases, inconsistent ordering/aggregation.
•Testing: missing regression/E2E/contract tests, no reproduction script.
•Review Packaging: PR lacks context, spec, evidence, or minimal repro for Codex to review well.
•Release/Integration: environment constraints (gateway, permissions, paths), deploy-time mismatches.

•Spec Gap: requirement or invariant not defined.
•Context Gap: Codex lacked PR context (spec, expected behavior, tests).
•Implementation Drift: code diverged from intent or was inconsistent across modules.
•Test Gap: no automated proof for edge cases or invariants.
•Integration Constraint: environment or platform limitations discovered late.

Baseline candidate list (optional):

bash

node scripts/ops/pr-retro.cjs \
  --since YYYY-MM-DD \
  --min-cycles 3 \
  --limit 5 \
  --out-dir docs/retrospective \
  --max-prs 80

Then isolate Codex Cloud feedback per PR:

bash

gh pr view <num> --json reviews,comments --jq '(.reviews + .comments) | map(select(.author.login == "chatgpt-codex-connector"))'

Before requesting @codex review, check .github/PULL_REQUEST_TEMPLATE.md:

•If any Risk Layer Trigger is checked, you MUST fill the Risk Layer Addendum.
•Provide rules/invariants, boundary matrix (>=3), and evidence (tests or repro).
•Summarize these items in Codex Context so Codex sees the delta clearly.

•Each PR must cite at least 1 evidence item.
•If Codex feedback is generic, mark low-signal and rely on follow-up fixes or PR gate docs.
•Do not infer root cause without a traceable artifact.

Excuse	Reality
"Codex asked again, so it's Codex's fault"	Repeated reviews usually reflect missing context or gaps in our stages.
"Review cycles are enough"	Cycles show churn, not cause. Evidence is required.
"Titles explain the issue"	Titles are symptoms, not root causes.
"We fixed it later, so root cause is obvious"	Fixes show symptom; stage attribution still required.
"Template is optional"	Risk-layer addendum is mandatory when any trigger matches.