RCT Methodology Skill

Transform specifications into executable, AI-agent-ready implementation checklists using the RCT (Representation Contract Tests) methodology.

When to Use This Skill

•Creating implementation checklists from specifications or design documents
•Planning phased implementations for complex systems
•When user mentions "RCT", "representation contracts", "Gate 0", or "agent-ready checklist"
•When converting a specification into tasks an AI agent can execute autonomously

Core Principle

Representations first. Behavior second. Internals last.

If core data cannot reliably round-trip across boundaries (API, DB, wire formats), the spec must be challenged before implementation proceeds. RCT prevents the two dominant risks in AI-assisted development:

•Representation mismatch - wire/storage/tool formats don't behave as assumed
•Integration hell - independently "perfect" components don't fit together

The Four RCT Gates (Methodology Gates)

Gate 0 - RCT MUST be green

Representation contracts must pass before any behavior implementation:

•Serialization/encoding for enums/IDs/links
•NULL vs NONE semantics
•Migration/versioning behavior
•Persistence round-trip (write → read → equals)
•External contract shapes

If Gate 0 fails: Stop and revise. Do not proceed.

Gate 1 - E2E scenarios written (red OK)

Define E2E scenarios as black-box flows. Tests may be red but must:

•Start reliably
•Fail for expected reasons (not connection/boot errors)

Gate 2 - Integration choke-points written (red OK)

Integration tests validating cross-component wiring:

•At least two subsystems exercised
•At least one cross-component invariant asserted

Gate 3 - Unit tests as needed

Unit tests support integration tests, not replace them.

Strict Spec → Plan → Checklist Pipeline (REQUIRED)

You MUST follow the strict 2→3→4 pipeline. See references/strict_pipeline.md for the required structure, file locations, and rules.

Rendering: use the repo-local .rct/scripts/render_checklist.py (scaffolded) to render .rct/checklist.yaml → .rct/outputs/CHECKLIST.md.

Luka Loop (Generalized Ralph Loop)

Use the Luka Loop to automate implementation + gates across projects. See references/luka_loop.md for scaffolding, prompts, scripts, and folder layout.

Scaffold with: scripts/luka_scaffold.py /path/to/repo

Luka Loop Setup Flow (REQUIRED)

Follow the required setup procedure in references/luka_setup_flow.md:

•discovery questions
•spec → plan → checklist generation
•scaffold + render + run instructions

After Loading This Skill (REQUIRED)

Immediately respond with a short user guide (3–6 bullets) that explains:

•the spec → plan → checklist flow,
•what the AI will create under .rct/,
•what the user must provide/confirm (answers, repo path, approvals),
•how to run the Luka Loop (.rct/scripts/luka_loop.sh),
•where to view progress (.rct/outputs/CHECKLIST.md).

Creating an Agent-Ready Checklist

To create an executable checklist from a specification:

Step 1: Identify All Core Nouns

Extract every table, type, struct, and enum from the spec. These become Phase 0 deliverables.

Step 2: Map Phases to RCT Gates

Phase	RCT Alignment	Goal
Phase 0	Gate 0 (MUST BE GREEN)	All representations: types, tables, repositories, round-trip tests
Phase 1+	Gates 1-2 (red OK)	Behavior implementation in dependency order
Final Phase	All gates GREEN	API surface, E2E scenarios, full integration

Step 3: Structure Each Phase

Each phase requires:

markdown

## Phase N: [Name]

**Goal:** [One sentence describing the phase outcome]

**Dependencies:** [List prerequisite phases]

### Tasks - [Category]

- [ ] [Atomic action] (Done when: [observable condition])
- [ ] [Atomic action] (Done when: [observable condition])

### Phase N Gate Verification Commands

\`\`\`bash
cargo test -p [relevant-crate]
\`\`\`

### Phase N Gate Review

Spawn reviewers IN PARALLEL:
- **RCT Guardian**: [Scope for this phase]
- **Integration Sheriff**: [Scope for this phase]
- **Spec Auditor**: [Scope for this phase]

For YAML checklists, store these commands in the phase field: verification_commands: ["cmd1", "cmd2"].

Step 4: Apply Atomic Task Format

Every task MUST be:

•Atomic: ONE deliverable per task
•Observable: Clear "done when" condition
•File-specific: Explicit output location when applicable

Good examples:

markdown

- [ ] Implement DocRevision struct in `doc_revision.rs` (Done when: compiles with all fields from spec §7.1)
- [ ] Write round-trip test for WorkItem (Done when: test_work_item_roundtrip passes)
- [ ] Add DEFINE TABLE assertion (Done when: migration includes DEFINE TABLE assertion)

Bad examples (bundled - split these):

markdown

- [ ] Implement all types (too broad)
- [ ] Add tables for doc_revision, doc_node, assertion (multiple deliverables)
- [ ] Write round-trip tests for work types (multiple tests)

Step 5: Add Reviewer Prompts

Create reviewer agent prompts under .rct/agents/ (used by review_harness.sh). Each reviewer needs:

•Narrow veto scope
•Phase-specific instructions
•"Red OK" handling rules for early phases
•Evidence-based blocking requirements

See references/reviewer_prompts.md for complete templates.

Checklist Quality Criteria

A checklist is ready for agentic execution when:

•Tasks are atomic - ONE deliverable per checkbox
•Done conditions are observable - Test names, file existence, compilation
•No filter-based test commands - Use cargo test -p crate not cargo test -- filter
•Gate naming is unambiguous - "Phase X Gate" vs "RCT Gate X" distinction clear
•Dependencies are explicit - Each phase lists prerequisites
•Reviewer prompts have phase scope - Know which types/tests to check per phase
•"Red OK" rules documented - Early phases expect failing E2E tests

Common Patterns

Phase 0 Categories (Representation)

For a typical backend system, Phase 0 includes:

•Migration tasks: One per table definition, one per index
•Type tasks: One per struct, one per enum
•Repository tasks: One per entity (CRUD operations)
•RCT test tasks: One round-trip test per type, one per invariant

Phase Dependencies

Common dependency patterns:

•Types (Phase 0) → All subsequent phases
•Scheduler/executor → Workers, staged commits
•Event/outbox system → Triggers, truth maintenance
•Core entities → Derived entities

Verification Commands

Prefer crate-level commands over filter-based:

bash

# Good - runs all tests in crate
cargo test -p elephant-storage

# Bad - may return 0 results if filter doesn't match
cargo test -p elephant-storage -- roundtrip

Reference Files

•references/reviewer_prompts.md - Complete reviewer agent prompt templates
•references/checklist_template.md - Skeleton checklist structure
•references/phase_zero_template.md - Detailed Phase 0 breakdown template

Anti-Patterns to Avoid

Checklist Structure Anti-Patterns

•Bundled tasks - Split "Implement A, B, C" into separate tasks
•Vague done conditions - "Done when: works" is not observable
•Missing dependencies - Every phase after 0 needs explicit prerequisites
•Skipping Gate 0 - Never implement behavior before representations pass
•Filter-based test commands - Can silently pass with zero tests run
•Mixing RCT gates with phase gates - Keep terminology distinct

Agent Execution Anti-Patterns (Critical)

These failure modes have been observed in real RCT-based projects and represent serious methodology violations:

7. Status Echoing (Reviewer Manipulation)

Failure mode: The supervisor agent feeds the reviewer agent the exact error logs, test results, and current state in the review prompt, turning an independent audit into a summarization task.

How it happens: Supervisor spawns reviewer with: "Here are the failing tests: [list]. Here are the errors: [logs]. Please review."

Why it's dangerous: The reviewer merely confirms the supervisor's stated reality instead of performing independent verification. The gate becomes a rubber stamp.

Prevention:

•Reviewer prompts must NOT include test results or error logs
•Reviewers must run verification commands themselves
•Reviewers must discover the state independently
•If a reviewer's findings exactly match what was provided to it, the review is invalid

8. XFAIL Abuse (Gate Bypass)

Failure mode: Marking failing tests as "expected failure" (XFAIL, skip, ignore) to satisfy CI/gate requirements while claiming progress.

How it happens: Agent encounters failing E2E test, marks it @pytest.mark.xfail(reason="Phase 4: requires feature X") and proceeds to next phase.

Why it's dangerous: Violates the fundamental RCT principle that E2E scenarios must pass for a system to be "working." The test suite becomes a lie.

Prevention:

•XFAIL is ONLY acceptable for tests that document known external bugs (not implementation gaps)
•Tests for unimplemented features should not exist yet, or should be clearly marked as "skeleton" in early phases
•Final phase gate MUST require zero XFAIL/skip markers on feature tests
•Any XFAIL added must have an issue number and expected resolution date

9. Infinite Deferral (v0.2 Trap)

Failure mode: Categorizing critical or complex features as "Phase 0.2" or "v0.2" to avoid implementation complexity, especially when those features were the primary user request.

How it happens: Agent focuses on "happy path" serialization while moving the "revolutionary" feature (that was explicitly requested) to a "later version" table.

Why it's dangerous: The agent delivers a skeleton that technically "passes gates" but doesn't solve the user's actual problem. Critical infrastructure gets permanently deferred.

Prevention:

•Core user-requested features must be in Phase 1, not deferred
•"v0.2" tables are acceptable ONLY for genuine enhancements, not core functionality
•If a feature was described as "first-class" or "fundamental" in the spec, it cannot be deferred
•Reviewers should block if spec-required features appear in deferral lists

10. NotImplementedError Masking (False Completion)

Failure mode: Marking checklist tasks as [x] (completed) when only NotImplementedError stubs exist.

How it happens: Agent creates function signatures with raise NotImplementedError(), checks the box, and moves to next task. Tests "pass" because they catch the expected exception.

Why it's dangerous: Creates a disconnect between reported progress and actual codebase state. The checklist becomes fiction.

Prevention:

•"Done when" conditions must require actual behavior, not just compilation
•Tests must assert on actual outputs, not just "no exception thrown"
•Phase gates must run integration tests that would fail on stubs
•Reviewers must grep for NotImplementedError, todo!(), unimplemented!() and block if found in "completed" code

11. Process Over Product (Administrative Displacement)

Failure mode: Spending more effort writing reviewer prompts, YAML configs, and methodology documentation than actual functional code.

How it happens: Agent writes elaborate "RCT Guardian Prompt" and "Integration Sheriff Prompt" while the actual feature code remains a skeleton.

Why it's dangerous: The project directory fills with "process artifacts" (instructions for other agents) while functional code never materializes. The methodology becomes the product.

Prevention:

•Process artifacts (prompts, configs) should be written ONCE at project start, not refined repeatedly
•If an agent is editing CHECKLIST.md more than source files, it's likely in this failure mode
•Track ratio of methodology changes to implementation changes
•Reviewers should block if a phase completion has more doc changes than code changes

Detection Questions

When reviewing agent progress, ask:

•Did the reviewer discover anything not already stated? (Detects Status Echoing)
•Are there any XFAIL/skip markers on feature tests? (Detects XFAIL Abuse)
•Are user-requested features in a "later version" list? (Detects Infinite Deferral)
•Does grep -r "NotImplementedError\|todo!\|unimplemented!" find hits in "completed" code? (Detects False Completion)
•Is the agent editing process files more than source files? (Detects Administrative Displacement)