Session Feedback
Evaluate AI session interaction logs and generate actionable feedback to help users improve their AI-assisted development workflow.
Overview
This skill reads the session reports and log files produced by session-interaction-logger and produces a structured feedback report. The report highlights what the user did well, identifies missed opportunities, and provides concrete suggestions for future sessions.
Read Session Logs → Analyze Patterns → Evaluate Against Rubric → Generate Feedback Report
When to Use This Skill
- After completing a coding session to get retrospective feedback
- When asked to "review session", "evaluate logs", or "give feedback"
- Before starting a new session to review lessons from the last one
- When preparing for an AI-usage evaluation or audit
Input Sources
The skill reads from the following files (produced by session-interaction-logger):
| Source | Location | What It Provides |
|---|---|---|
| Session reports | logs/copilot/session-report*.md | High-level session summaries, timelines, decisions |
| Interactions log | logs/copilot/interactions.log | Full prompt/response detail (JSONL) |
| File changes log | logs/copilot/file-changes.log | What was created/modified and why (JSONL) |
| Commands log | logs/copilot/commands.log | Terminal commands executed (JSONL) |
| Decisions log | logs/copilot/decisions.log | Architectural decisions with rationale (JSONL) |
If structured JSONL logs are not available, the skill falls back to analyzing the Markdown session reports.
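The fallback described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function names `read_jsonl` and `load_session_inputs` are hypothetical, and skipping malformed lines is an assumption rather than documented behavior.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON object per non-empty line of a JSONL file."""
    records = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Assumption: a malformed line is skipped rather than aborting the run.
            continue
    return records


def load_session_inputs(log_dir="logs/copilot"):
    """Return structured records when JSONL logs exist, else the Markdown reports."""
    log_dir = Path(log_dir)
    names = ["interactions", "file-changes", "commands", "decisions"]
    structured = {
        name: read_jsonl(log_dir / f"{name}.log")
        for name in names
        if (log_dir / f"{name}.log").is_file()
    }
    if structured:
        return {"mode": "jsonl", "logs": structured}
    # Fallback: hand back the Markdown session reports as plain text.
    reports = {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(log_dir.glob("session-report*.md"))
    }
    return {"mode": "markdown", "reports": reports}
```

Returning a `mode` field lets later steps decide how much detail they can expect without re-checking the file system.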
Output
A feedback report written to:
logs/feedback/report-{YYYY-MM-DD}.md
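As a small sketch, the output path could be built like this. The helper name `feedback_report_path` is hypothetical, and defaulting to today's date when none is given is an assumption.

```python
from datetime import date
from pathlib import Path


def feedback_report_path(report_date=None, out_dir="logs/feedback"):
    """Build logs/feedback/report-{YYYY-MM-DD}.md for the given date."""
    # Assumption: with no date supplied, today's date is used.
    report_date = report_date or date.today()
    return Path(out_dir) / f"report-{report_date.isoformat()}.md"
```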
Evaluation Rubric
The feedback report evaluates the session across these dimensions:
1. Prompt Quality (Weight: 25%)
| Rating | Criteria |
|---|---|
| Excellent | Prompts are specific, provide context, reference files/docs, state desired outcome |
| Good | Prompts are clear but could include more context or constraints |
| Needs Work | Prompts are vague, ambiguous, or require multiple clarifications |
What to look for:
- Did the user reference design docs, files, or specific requirements?
- Were prompts self-contained, or did they require many follow-up clarifications?
- Did the user specify constraints (language version, patterns, testing expectations)?
2. Planning & Specification (Weight: 20%)
| Rating | Criteria |
|---|---|
| Excellent | User requested a plan before implementation; reviewed and approved it |
| Good | Some planning occurred but implementation started without full review |
| Needs Work | No planning phase; jumped straight to implementation |
What to look for:
- Was a plan/spec document created before coding?
- Did the user review the plan and provide feedback?
- Were steps broken down into manageable pieces?
3. Iterative Refinement (Weight: 20%)
| Rating | Criteria |
|---|---|
| Excellent | User reviewed outputs, caught issues, requested fixes, iterated on quality |
| Good | Some iteration occurred but outputs were mostly accepted as-is |
| Needs Work | Outputs blindly accepted without review or testing |
What to look for:
- Did the user request changes or improvements after initial output?
- Were tests run and failures addressed?
- Did the user verify that behavior matches expectations?
4. Decision Documentation (Weight: 15%)
| Rating | Criteria |
|---|---|
| Excellent | Key decisions documented with rationale and alternatives considered |
| Good | Some decisions noted but missing rationale or alternatives |
| Needs Work | No decision documentation; choices made without explanation |
What to look for:
- Were architectural choices explained?
- Were alternatives considered and trade-offs discussed?
- Can someone reading the logs understand why decisions were made?
5. Testing & Verification (Weight: 20%)
| Rating | Criteria |
|---|---|
| Excellent | Tests written alongside implementation; failures caught and fixed; coverage considered |
| Good | Tests added but not comprehensive; some verification performed |
| Needs Work | No tests or verification of AI-generated code |
What to look for:
- Were tests requested as part of implementation?
- Were build/test commands run to verify correctness?
- Were test failures analyzed and resolved?
Step-by-Step Workflow
Step 1: Gather Session Logs
- List all files under logs/copilot/
- Identify session report files (session-report*.md)
- Check for structured JSONL logs (interactions.log, decisions.log, etc.)
- If a specific date is requested, filter to that session; otherwise analyze the most recent
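The selection logic in this step might look like the sketch below. The function name `select_session_reports` is hypothetical. Two assumptions are made because the text does not settle them: the session date appears in the file name as YYYY-MM-DD, and "most recent" means the latest modification time.

```python
from pathlib import Path


def select_session_reports(log_dir="logs/copilot", session_date=None):
    """Pick the report(s) for a given date, or the single most recent one."""
    reports = sorted(Path(log_dir).glob("session-report*.md"))
    if session_date is not None:
        # Assumption: the date is embedded in the file name as YYYY-MM-DD.
        return [p for p in reports if session_date in p.name]
    if not reports:
        return []
    # Assumption: "most recent" is judged by file modification time.
    return [max(reports, key=lambda p: p.stat().st_mtime)]
```

Sorting by the date in the file name would be a reasonable alternative if every report follows the same naming pattern.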
Step 2: Analyze Session Content
Read each log source and extract:
- Prompts: User requests (count, specificity, context provided)
- Iterations: How many rounds of refinement occurred per task
- Decisions: Architectural choices, rationale, alternatives
- File changes: Volume, organization, whether tests accompanied code
- Commands: Build/test execution, success/failure patterns
- Clarifications: How often the AI needed to ask for more info
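A few of these signals can be extracted mechanically, as in the sketch below. The record schema is an assumption: the fields `role`, `text`, `command`, and `exit_code` are hypothetical, since the logger's JSONL format is not specified here. The file-reference pattern and the "ends with a question mark" test are rough heuristics, not the skill's actual rules.

```python
import re

# Hypothetical heuristic: a token ending in one of these extensions counts as a file reference.
FILE_REF = re.compile(r"\b[\w/-]+\.(?:py|md|js|ts|json|ya?ml|sh)\b")


def summarize_session(interactions, commands):
    """Compute simple counts from parsed interactions.log and commands.log records."""
    prompts = [r.get("text", "") for r in interactions if r.get("role") == "user"]
    replies = [r.get("text", "") for r in interactions if r.get("role") == "assistant"]
    word_total = sum(len(p.split()) for p in prompts)
    return {
        "prompt_count": len(prompts),
        "avg_prompt_words": word_total / len(prompts) if prompts else 0.0,
        "prompts_with_file_refs": sum(1 for p in prompts if FILE_REF.search(p)),
        # Heuristic: an assistant reply ending in "?" is treated as a clarifying question.
        "clarifying_questions": sum(1 for r in replies if r.rstrip().endswith("?")),
        "test_runs": sum(1 for c in commands if "test" in c.get("command", "")),
        "failed_commands": sum(1 for c in commands if c.get("exit_code", 0) != 0),
    }
```

Counts like these only support the qualitative review; the ratings still depend on reading the prompts and responses themselves.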
Step 3: Score Against Rubric
For each rubric dimension:
- Review the relevant evidence from the logs
- Assign a rating: Excellent, Good, or Needs Work
- Provide specific examples from the session to justify the rating
- Calculate a weighted overall score out of 100 using the dimension weights
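The weighted score can be sketched as below. The weights come from the rubric above. The conversion from a rating to points is not defined in this document, so the percentages in `RATING_PERCENT` are assumptions chosen only for illustration, and `score_session` is a hypothetical name.

```python
# Dimension weights as listed in the Evaluation Rubric (they sum to 100).
WEIGHTS = {
    "Prompt Quality": 25,
    "Planning & Specification": 20,
    "Iterative Refinement": 20,
    "Decision Documentation": 15,
    "Testing & Verification": 20,
}

# Assumption: the rubric does not say how ratings map to points; these values are illustrative.
RATING_PERCENT = {"Excellent": 100, "Good": 70, "Needs Work": 40}


def score_session(ratings):
    """Return (points per dimension, overall score out of 100)."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    points = {
        dim: WEIGHTS[dim] * RATING_PERCENT[ratings[dim]] / 100
        for dim in WEIGHTS
    }
    return points, sum(points.values())
```

Under these assumed percentages, a session rated Excellent, Good, Good, Needs Work, Excellent (in rubric order) scores 25 + 14 + 14 + 6 + 20 = 79 out of 100.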
Step 4: Generate Suggestions
Based on the evaluation, produce:
- Strengths: 2-4 things the user did well (with examples)
- Improvement Areas: 2-4 areas to focus on (with examples)
- Actionable Tips: 3-5 concrete things to try in the next session
- Prompt Templates: Example improved prompts based on patterns observed
Step 5: Write Feedback Report
Generate the report at logs/feedback/report-{YYYY-MM-DD}.md using the structure below.
Report Template
The generated feedback report follows this structure:
# Session Feedback Report
**Date**: {date}
**Session(s) Analyzed**: {session IDs or report file names}
**Generated By**: session-feedback skill
---
## Overall Score: {X}/100
| Dimension | Weight | Rating | Score |
|-----------|--------|--------|-------|
| Prompt Quality | 25% | {rating} | {score}/25 |
| Planning & Specification | 20% | {rating} | {score}/20 |
| Iterative Refinement | 20% | {rating} | {score}/20 |
| Decision Documentation | 15% | {rating} | {score}/15 |
| Testing & Verification | 20% | {rating} | {score}/20 |
---
## Strengths
{Bulleted list with specific examples from the session}
## Areas for Improvement
{Bulleted list with specific examples and why they matter}
## Actionable Suggestions for Next Session
{Numbered list of concrete things to try}
## Prompt Improvement Examples
{Before/after prompt examples based on patterns observed}
---
_Generated on {timestamp} by session-feedback skill_
_Source logs: logs/copilot/_
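As an illustration, the score table in the template above could be rendered from per-dimension results like this. The function name `render_score_table` and the tuple layout of `rows` are hypothetical.

```python
def render_score_table(rows, overall):
    """Render the Overall Score heading and table.

    rows: list of (dimension, weight, rating, score) tuples.
    """
    lines = [
        f"## Overall Score: {overall:g}/100",
        "| Dimension | Weight | Rating | Score |",
        "|-----------|--------|--------|-------|",
    ]
    for dimension, weight, rating, score in rows:
        # The ":g" format drops a trailing ".0" so 14.0 prints as 14.
        lines.append(f"| {dimension} | {weight}% | {rating} | {score:g}/{weight} |")
    return "\n".join(lines)
```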
Script Usage
Generate a feedback report from the command line:
.github/skills/session-feedback/scripts/generate-feedback-report.sh [YYYY-MM-DD]
- If a date is provided, the script scans for session-report-{date}.md
- If no date is provided, it analyzes all available session reports
- Output is always written to logs/feedback/report-{date}.md
Best Practices
| Practice | Why |
|---|---|
| Run feedback after every session | Builds a habit of reflection and continuous improvement |
| Review suggestions before the next session | Primes you to apply improvements |
| Compare reports over time | Track your growth in AI collaboration skills |
| Share feedback reports with your team | Helps establish team-wide best practices |
References
- Evaluation rubric details: references/evaluation-rubric.md
- Session interaction logger skill: ../.github/skills/session-interaction-logger/SKILL.md