Defining Instrumentation

This skill produces an instrumentation strategy that proves (or falsifies) a success criterion with clear signals, thresholds, and evidence artifacts. It favors deterministic signals, minimal but sufficient coverage, and local-first execution; external systems are used only when explicitly approved.

Use this skill when the user asks for:

•instrumentation plans
•validation strategies
•evidence checklists
•“how do we prove this works”
•observability-driven development guidance

Avoid using this skill when the user only needs code changes or implementation details without any measurement or validation plan.

Trust posture

•ALWAYS: read/list files, dry-run plans, propose checks.
•ASK: writing scripts, running networked tooling, destructive ops.
•NEVER: credential exfiltration, irreversible deletes, external calls without consent.

Inputs required (ask if missing)

•Feature or milestone description (goal + success criteria).
•Primary risks or failure modes.
•Constraints: time budget, local-only execution, tooling limits.
•Existing instrumentation or test coverage (if any).

If any input is missing, ask for it before producing a final plan. If the user wants a fast start, produce a draft plan and explicitly mark assumptions.

Outputs (deliverables)

•Instrumentation Plan (primary artifact)
•Evidence Checklist with commands/methods, expected outputs, and thresholds
•Fallback Plan if primary signals are blocked or flaky

Operating principles

•Triangulate evidence for high stakes: at least two independent signals.
•Prefer deterministic signals: tests, schema checks, invariant probes.
•Make failures actionable: each signal suggests a likely fix.
•Keep it local unless approved: avoid external services by default.
•Instrument before executing: plan signals first, then run.

Degrees of freedom

•High (explore): gather risks, map success criteria, propose signals.
•Medium (shape): select minimal set of signals, define thresholds.
•Low (execute): provide commands to run; do not run unless asked.

Keep phases separate: decide → configure → execute.

Core workflow (use this every time)

1) Clarify scope and success criteria

•Restate the goal and success criteria in your own words.
•Identify the top 1–2 risks or failure modes.
•Confirm constraints and tooling.

Checklist:

•What must be true for success?
•What could go wrong?
•What evidence would convince a skeptical reviewer?

2) Map criteria to signals

For each success criterion, pick at least one signal. For each top risk, add a signal. Select the smallest set that still convinces.

Signal selection rules:

•Each criterion has a primary signal with a pass/fail threshold.
•Each critical risk has a dedicated signal.
•Add a fallback if a primary signal may be flaky or blocked.
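
The selection rules above can be checked mechanically before a plan is finalized. The following is a minimal sketch, not a prescribed implementation: the `Signal` record, its field names, and `selection_gaps` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signal:
    """Hypothetical record for one planned signal."""
    name: str
    covers: str                     # the criterion or risk this signal addresses
    threshold: str                  # explicit pass/fail condition; empty means missing
    primary: bool = True
    may_be_flaky: bool = False
    fallback: Optional[str] = None


def selection_gaps(criteria: List[str], risks: List[str], signals: List[Signal]) -> List[str]:
    """Return a human-readable list of violated selection rules."""
    gaps = []
    # Rule 1: each criterion needs a primary signal with a threshold.
    with_primary = {s.covers for s in signals if s.primary and s.threshold}
    for criterion in criteria:
        if criterion not in with_primary:
            gaps.append(f"criterion without primary signal: {criterion}")
    # Rule 2: each critical risk needs a dedicated signal.
    covered = {s.covers for s in signals}
    for risk in risks:
        if risk not in covered:
            gaps.append(f"risk without dedicated signal: {risk}")
    # Rule 3: a primary signal that may be flaky or blocked needs a fallback.
    for s in signals:
        if s.primary and s.may_be_flaky and not s.fallback:
            gaps.append(f"flaky primary signal without fallback: {s.name}")
    return gaps
```

An empty result means the plan satisfies all three rules; each returned string names the item that still needs a signal or fallback.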

3) Specify evidence artifacts

For each signal, define:

•Command/Method
•Expected output or threshold
•Evidence artifact (e.g., test output, log snippet, snapshot file)
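
As one way to tie the three fields together, the sketch below runs a local command, saves its output as the evidence artifact, and compares it against an expected substring. The function name `run_signal` and its parameters are assumptions made for this example. In line with the trust posture, a helper like this would only be run once the user has asked for execution.

```python
import subprocess
import sys
from pathlib import Path


def run_signal(command, expected, artifact):
    """Run a local command, store its output, and report pass/fail.

    command  -- argument list for the check (the Command/Method)
    expected -- substring that must appear in stdout (the expected output)
    artifact -- file path where the combined output is saved (the evidence)
    """
    result = subprocess.run(command, capture_output=True, text=True)
    Path(artifact).write_text(result.stdout + result.stderr)
    # Pass only if the command exited cleanly and printed the expected text.
    return result.returncode == 0 and expected in result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace with a real check.
    ok = run_signal([sys.executable, "-c", "print('3 passed')"], "passed", "evidence.txt")
    print("PASS" if ok else "FAIL")
```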

4) Assemble the instrumentation plan

Use the template in this file and produce:

•Primary signals (must pass)
•Secondary signals (nice-to-have)
•Fallbacks
•Validation notes

5) Review for sufficiency

•Are the signals reproducible locally?
•Are thresholds explicit?
•Are signals independent when stakes are high?
•Are failure modes actionable?

If not, adjust before finalizing.

Signal catalog (core summary)

Use this list to select signals. Prefer deterministic signals first.

Tests

•Unit tests for logic boundaries
•Integration/contract tests for interfaces
•Property-based tests for invariants
•Regression tests for prior bugs

Runtime probes

•CLI outputs, migrations, schema dumps
•Health checks (startup, endpoint ping)
•Log inspection (error counts, warnings)
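
A log-inspection probe can be as small as counting level tokens and comparing the counts to a threshold. This sketch assumes log lines contain uppercase `ERROR` or `WARNING` tokens; real log formats vary, so the pattern and the `count_levels` name are illustrative only.

```python
import re
from collections import Counter

# Assumed format: the level appears as an uppercase word somewhere in the line.
LEVEL = re.compile(r"\b(ERROR|WARNING)\b")


def count_levels(lines):
    """Count ERROR and WARNING occurrences, one per line at most."""
    counts = Counter()
    for line in lines:
        match = LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A threshold such as `count_levels(lines)["ERROR"] == 0` then serves as the pass/fail condition for the signal.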

Browser and UI

•Playwright flows with assertions
•Screenshot diffs for layout regressions
•Smoke tests for extension/popup flows

Quality evals

•Retrieval relevance evals with gold queries
•Snapshot diffs for before/after ranking
•LLM eval harnesses for semantic behavior
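
A gold-query eval can be expressed as a deterministic check: each gold query must return its expected document within the top N results. The sketch below is a minimal illustration; `golden_query_eval`, the `search` callable, and the shape of the `gold` mapping are assumptions, not part of any existing harness.

```python
def golden_query_eval(search, gold, top_n=5, min_hit_rate=1.0):
    """Check that each gold query returns its expected id in the top N.

    search -- callable taking a query string and returning a ranked list of ids
    gold   -- dict mapping query string to the id expected in the top N
    """
    if not gold:
        raise ValueError("gold set must not be empty")
    misses = [q for q, expected in gold.items() if expected not in search(q)[:top_n]]
    hit_rate = (len(gold) - len(misses)) / len(gold)
    return {"hit_rate": hit_rate, "misses": misses, "passed": hit_rate >= min_hit_rate}
```

The `misses` list doubles as the evidence artifact, since it names exactly which queries regressed.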

Performance and load

•Local load tests or scripted loops
•Latency/error thresholds (p95, error rate)
•CPU/memory sampling if available
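
To make a latency/error threshold explicit, the results of a local scripted loop can be reduced to a p95 value and an error rate, then compared to budgets. This is a sketch under stated assumptions: results are `(latency_ms, ok)` pairs, p95 uses the nearest-rank method, and the function names are hypothetical.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # ceil(0.95 * n) computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def evaluate_load(results, p95_budget_ms, max_error_rate):
    """Summarize (latency_ms, ok) pairs against explicit thresholds."""
    if not results:
        raise ValueError("no results to evaluate")
    latencies = [latency for latency, _ in results]
    error_rate = sum(1 for _, ok in results if not ok) / len(results)
    observed = p95(latencies)
    return {
        "p95_ms": observed,
        "error_rate": error_rate,
        "passed": observed <= p95_budget_ms and error_rate <= max_error_rate,
    }
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so the plan should state which one the threshold assumes.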

Data integrity

•Row counts/checksums pre/post
•Referential integrity checks
•Guardrails for destructive operations
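
A pre/post row count and checksum can be captured with the standard library alone. The sketch below uses an in-memory SQLite database as a stand-in for the real store; `table_fingerprint` and the `users` table are illustrative names, and the checksum is simply a SHA-256 over the ordered rows.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, order_by):
    """Return (row_count, sha256_hex) for a table, in a stable row order.

    Identifiers are interpolated into the SQL, so pass only trusted names.
    """
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    before = table_fingerprint(conn, "users", "id")
    # A change that should not alter data, such as adding an index.
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    after = table_fingerprint(conn, "users", "id")
    print("PASS" if before == after else "FAIL")
```

Equal fingerprints before and after an operation that should not change data are the pass condition; a differing count or hash is the evidence of drift.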

Strategy selection checklist

•Map each success criterion to a signal.
•Add a signal for each top risk.
•Define explicit thresholds (pass/fail, error budget, p95).
•Ensure reproducible and local execution.
•Add a fallback for each critical check.

Evidence template (use in every response)

Instrumentation Plan

Goal:
Primary risks:

Primary signals (must pass):

•Signal:
  Command/Method:
  Expected output/threshold:
  Evidence artifact:

Secondary signals (nice-to-have):

•Signal:
  Command/Method:
  Expected output/threshold:

Fallbacks if blocked:

•...

Validation notes:

•How evidence will be recorded (logs, snapshots, test output).

Examples (condensed)

Example: New tool behavior

•Primary: contract test for payload shape
•Primary: smoke test calling tool with known data
•Secondary: log inspection for enqueue

Example: Search relevance tweak

•Primary: golden query eval with top-N expected
•Primary: snapshot diff for ranking changes
•Secondary: manual spot-check (3 queries)

Example: UI change

•Primary: Playwright flow asserts render + action
•Primary: screenshot diff for regressions
•Secondary: console log scan

Example: Background job scheduling

•Primary: startup health check
•Primary: log probe for schedule interval
•Secondary: data probe for updated rows

Example: SQL guardrails

•Primary: negative tests for blocked statements
•Primary: positive tests for safe queries
•Secondary: timeout guardrail validation

Escalation guidance

If constraints or missing tooling block strong evidence:

•Propose minimal additional tooling.
•Offer a weaker deterministic fallback.
•Call out residual risks explicitly.

Definition of done

•A concrete, runnable instrumentation plan exists.
•Each success criterion is mapped to evidence.
•Risks have explicit checks and thresholds.
•Fallbacks are listed for flaky/blocked signals.

Specialist references (optional)

Use these only if deeper context is needed:

•references/strategist-core.md
•references/instrumentation-catalog.md
•references/examples.md
•templates/instrumentation-plan.md