Rubric Evaluation Skill

Evaluate a Go project against the Invisible coding test rubric (4 categories, 400 total points).

Trigger Conditions

•User runs /rubric-evaluate (full evaluation)
•User runs /rubric-code-quality, /rubric-functionality, /rubric-testing, or /rubric-documentation (single category)
•User runs /rubric-quick-score (automated-only quick score)
•User asks to "evaluate", "score", or "check rubric"

Input Contract

•Required: Access to the project root directory
•Optional: Specific category to evaluate (defaults to all 4)
•Optional: INSTRUCTIONS.md for requirements tracking
•Optional: Previous scorecard for comparison

Output Contract

•Structured scorecard with all 13 sub-criteria scored
•Per-sub-criterion evidence (specific files, line numbers, examples)
•Total score out of 400 with percentage
•Prioritized improvement list ordered by (points gained / effort)
•Comparison against previous score if available

Tool Permissions

•Read: All source files, test files, README.md, INSTRUCTIONS.md, coverage reports, lint output
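•Execute: go test -coverprofile, go tool cover, golangci-lint run, gocyclo, go vet, and standard shell utilities (ls, find, grep, awk, wc)
•Write: Scorecard output to stdout or .cursor/evaluations/, plus coverage.out in the project root (generated in Step 3)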
•Search: File patterns, directory structures, code patterns

Execution Steps

Step 0: Setup

•Identify the project root (the directory containing go.mod) and run all commands below from it
•Check for INSTRUCTIONS.md -- if present, parse requirements into a checklist
•Check for existing coverage reports (coverage.out)
•Check for existing lint configuration (.golangci.yml)
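The first two Step 0 items can be scripted. The sketch below is a minimal bash version under stated assumptions: the function names are hypothetical, and extract_requirements assumes INSTRUCTIONS.md lists requirements as markdown bullets or numbered items (prose requirements still need a manual read).

```bash
#!/usr/bin/env bash
# Hypothetical Step 0 helpers (names are illustrative, not an existing tool).

# find_project_root walks up from START_DIR (default: current directory)
# until it finds go.mod, prints that directory, and returns 1 if none exists.
find_project_root() {
  local dir
  dir=$(cd "${1:-$PWD}" && pwd) || return 1
  while :; do
    if [ -f "$dir/go.mod" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}

# extract_requirements prints one "- [ ] item" checklist line per markdown
# bullet or numbered item in FILE (default: INSTRUCTIONS.md). Existing
# "[x]" / "[ ]" boxes are normalized to unchecked. Assumes list formatting.
extract_requirements() {
  local file=${1:-INSTRUCTIONS.md}
  [ -f "$file" ] || return 1
  grep -E '^[[:space:]]*([-*+]|[0-9]+[.)])[[:space:]]+' "$file" \
    | sed -E 's/^[[:space:]]*([-*+]|[0-9]+[.)])[[:space:]]+(\[[ xX]\][[:space:]]+)?/- [ ] /'
}
```

Typical use: `cd "$(find_project_root)" && extract_requirements > /tmp/requirements.md`, then tick each item as implementation evidence is found in Step 2a.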

Step 1: Code Quality Assessment (100 points)

1a. Organization (25 points)

Run these checks:

bash

# Check for layered architecture
ls -d cmd/ internal/ 2>/dev/null
ls -d internal/handler/ internal/service/ internal/repository/ internal/models/ internal/middleware/ internal/config/ 2>/dev/null
# Count distinct packages (go list prints one line per package and skips vendor/)
go list ./... | wc -l

Scoring:

•25: cmd/, internal/ with handler, service, repository, models, middleware, config subdirectories
•20: Layered but missing 1-2 expected directories
•15: Some separation but mixed concerns (e.g., handlers and services in same package)
•10: Minimal organization, most code in 2-3 files
•5: Single file or flat structure
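The directory part of these bands can be checked mechanically. A minimal sketch, assuming the exact directory names from the 25-point description (projects that use e.g. handlers/ or domain/ need a manual look); both function names are hypothetical. It only yields a ceiling, because "mixed concerns" (15) and the lower bands require reading the code.

```bash
# missing_layers prints each expected directory that is absent under ROOT
# (default: .), one per line, using the layout from the 25-point band.
missing_layers() {
  local root=${1:-.} dir
  for dir in cmd internal internal/handler internal/service \
             internal/repository internal/models internal/middleware internal/config; do
    [ -d "$root/$dir" ] || printf '%s\n' "$dir"
  done
}

# organization_ceiling maps the number of missing directories to the highest
# band the layout alone can justify; 15 and below need manual review.
organization_ceiling() {
  local missing=$1
  if   [ "$missing" -eq 0 ]; then echo 25
  elif [ "$missing" -le 2 ]; then echo 20
  else echo 15
  fi
}
```

Usage: `organization_ceiling "$(missing_layers . | wc -l)"`.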

1b. Naming (25 points)

Run these checks:

bash

# Count revive naming findings (var-naming, receiver-naming, etc.)
# (golangci-lint v1 flags; v2 replaces --disable-all with --default=none)
golangci-lint run --disable-all --enable=revive ./... 2>&1 | grep -c "naming"
# Count exported functions/methods containing an abbreviation as a whole camelCase segment
grep -rnE "^func (\([^)]*\) )?[A-Z]" --include="*.go" --exclude="*_test.go" . | grep -E "(Mgr|Svc|Repo|Cfg|Ctx|Req|Resp|Msg|Err|Val|Num|Str|Int|Buf|Arr|Lst)([A-Z0-9_(]|$)" | wc -l
# Check for descriptive test names
grep -rn "func Test" --include="*_test.go" . | head -20

Scoring:

•25: All names descriptive, consistent Go conventions, no abbreviations, test names follow TestX_Y_Z pattern
•20: 1-2 unclear or abbreviated names
•15: Several naming inconsistencies
•10: Many abbreviations or unclear names
•5: Naming is poor throughout
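To list the offending names for the Evidence column instead of only counting them, a filter like the following may help. It is a sketch with a hypothetical function name; it reads Go source on stdin and uses the same abbreviation list, flagging an abbreviation only when it ends a camelCase segment, so `UserRepo` is reported but `UserRepository` is not.

```bash
# flag_abbreviated_names reads Go source on stdin and prints exported
# function/method names containing a rubric abbreviation as a full segment.
flag_abbreviated_names() {
  grep -E '^func (\([^)]*\) )?[A-Z][A-Za-z0-9_]*\(' \
    | sed -E 's/^func (\([^)]*\) )?([A-Za-z0-9_]+)\(.*/\2/' \
    | grep -E '(Mgr|Svc|Repo|Cfg|Ctx|Req|Resp|Msg|Err|Val|Num|Str|Int|Buf|Arr|Lst)([A-Z0-9_]|$)'
}
```

Usage: `grep -rh "^func " --include="*.go" --exclude="*_test.go" . | flag_abbreviated_names`. Generic functions (`func Name[T any](...)`) are not matched by this sketch.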

1c. Readability (25 points)

Run these checks:

bash

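# Functions with cyclomatic complexity > 10 and > 15
gocyclo -over 10 . 2>/dev/null | wc -l
gocyclo -over 15 . 2>/dev/null | wc -l
# Functions longer than 40 lines (approximate: relies on gofmt putting a top-level func's closing brace in column 0)
find . -name "*.go" -not -path "./vendor/*" -exec awk '/^func /{s=FNR} /^}/ && s {if (FNR-s+1 > 40) n++; s=0} END {print n+0}' {} + | awk '{t += $1} END {print t+0}'
# Total function count (denominator for the checks above)
grep -rn "^func " --include="*.go" . | wc -l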

Scoring:

•25: All functions < 40 lines, no cyclomatic complexity > 10, clear control flow
•20: 1-2 functions exceed limits, but well-commented
•15: Several long/complex functions, moderate comments
•10: Many long functions, insufficient comments
•5: Code is difficult to follow
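The counts above can be turned into a starting band. This is a sketch, not part of the rubric: the function name is hypothetical, "several" is assumed to mean 3-5 violations and "many" 6 or more, and a function that is both too long and too complex counts twice. The 5-point band and the comment-quality qualifiers still need a human judgment.

```bash
# readability_ceiling suggests a band from two counts:
#   $1 = functions with cyclomatic complexity > 10 (gocyclo -over 10)
#   $2 = functions longer than 40 lines
# Assumed cut-offs: 3-5 violations = "several", 6+ = "many".
readability_ceiling() {
  local violations=$(( $1 + $2 ))
  if   [ "$violations" -eq 0 ]; then echo 25
  elif [ "$violations" -le 2 ]; then echo 20
  elif [ "$violations" -le 5 ]; then echo 15
  else echo 10
  fi
}
```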

1d. Best Practices (25 points)

Run these checks:

bash

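# Full lint run, captured once (exit code 0 means no issues)
LINT_OUT=$(golangci-lint run ./... 2>&1); echo "golangci-lint exit code: $?"
# Count issues (one "file.go:line:" header per issue; source snippet lines are not counted)
printf '%s\n' "$LINT_OUT" | grep -cE '^[^[:space:]]+\.go:[0-9]+:'
# Check for float64 with money
grep -rn "float64" --include="*.go" . | grep -i "amount\|balance\|price\|money\|total" | wc -l
# Error wrapping: fmt.Errorf calls with %w, then without %w (the latter is fine when there is no underlying error)
grep -rn "fmt.Errorf" --include="*.go" . | grep -c "%w"
grep -rn "fmt.Errorf" --include="*.go" . | grep -vc "%w"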

Scoring:

•25: 0 lint issues, idiomatic Go, no anti-patterns, decimal for money, proper error wrapping
•20: 1-5 lint issues, minor style inconsistencies
•15: 6-15 lint issues, some anti-patterns
•10: 16-30 lint issues, multiple anti-patterns
•5: 31+ lint issues or critical anti-patterns (float64 for money)
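The issue-count bands translate directly into a lookup. In this sketch the function name is hypothetical, and the second argument is assumed to be the float64-money count after false positives (e.g. a `totalLatency float64`) have been removed by hand; any remaining hit is treated as the critical anti-pattern.

```bash
# best_practices_ceiling maps lint issues ($1) and confirmed float64-money
# matches ($2, default 0) to the Best Practices band.
best_practices_ceiling() {
  local issues=$1 float_money=${2:-0}
  if   [ "$float_money" -gt 0 ]; then echo 5
  elif [ "$issues" -eq 0 ];  then echo 25
  elif [ "$issues" -le 5 ];  then echo 20
  elif [ "$issues" -le 15 ]; then echo 15
  elif [ "$issues" -le 30 ]; then echo 10
  else echo 5
  fi
}
```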

Step 2: Functionality Assessment (100 points)

2a. Requirements Completion (40 points)

•Parse INSTRUCTIONS.md for explicit requirements
•For each requirement, search the codebase for implementation evidence
•Score = (implemented_count / total_requirements) * 40

bash

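# Count route registrations on a variable named router (Gin/Echo style)
grep -rn "router\.\(GET\|POST\|PUT\|DELETE\|PATCH\)" --include="*.go" . | grep -v "_test.go" | wc -l
# Or on any receiver, e.g. Gin route groups (v1.GET) or r.GET
grep -rn "\.GET\|\.POST\|\.PUT\|\.DELETE\|\.PATCH" --include="*.go" . | grep -v "_test.go" | wc -l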
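The requirements formula above can be computed as follows. This sketch (hypothetical function name) rounds to the nearest whole point and prints 0 when no requirements were parsed, so that case gets flagged rather than causing a division by zero.

```bash
# requirements_score computes (implemented / total) * 40, rounded to the
# nearest point. $1 = implemented count, $2 = total requirements.
requirements_score() {
  awk -v impl="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print 0; exit }
    printf "%d\n", impl * 40 / total + 0.5
  }'
}
```

For example, 7 of 10 requirements gives 28/40.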

2b. Edge Cases (30 points)

Check for these patterns:

bash

# Input validation patterns
grep -rn "validate\|Validate\|binding:" --include="*.go" | grep -v "_test.go" | wc -l
# Error return patterns
grep -rn "return.*err\|return.*Error\|return.*error" --include="*.go" | grep -v "_test.go" | wc -l
# Nil checks
grep -rn "== nil\|!= nil" --include="*.go" | grep -v "_test.go" | wc -l
# Idempotency key handling
grep -rn -i "idempoten" --include="*.go" | wc -l
# Negative amount checks
grep -rn "LessThan\|GreaterThan\|IsNegative\|IsZero\|Sign()" --include="*.go" | wc -l

Scoring:

•30: Validation on all endpoints, custom error types, boundary checks, idempotency, negative amount guards
•20: Most inputs validated, main error paths handled, some boundary checks
•10: Minimal validation, basic error handling
•0: No validation or error handling

2c. Performance (30 points)

Check for:

bash

# Database indexing
grep -rn "Index\|index\|INDEX\|uniqueIndex" --include="*.go" | grep -v "_test.go" | wc -l
# Pagination
grep -rn -i "limit\|offset\|page\|per_page\|cursor" --include="*.go" | grep -v "_test.go" | wc -l
# Connection pooling
grep -rn "SetMaxOpenConns\|SetMaxIdleConns\|pool" --include="*.go" | wc -l
# N+1 potential (loops with DB calls)
grep -rn "for.*range" --include="*.go" -A5 | grep -c "Find\|First\|Where\|Query"
# SELECT * usage (should be avoided; case-insensitive to catch "select *")
grep -rni 'select \*' --include="*.go" . | wc -l

Scoring:

•30: Indexed columns, pagination, connection pooling, no N+1 patterns, no SELECT *
•20: Most performance concerns addressed, 1-2 minor issues
•10: Some attention to performance
•0: Obvious performance problems

Step 3: Testing Assessment (100 points)

3a. Coverage (40 points)

Run:

bash

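# Generate coverage (output includes one "coverage: XX.X% of statements" line per package)
go test -coverprofile=coverage.out ./... 2>&1
# Total percentage (last line of the per-function report: "total: (statements) XX.X%")
go tool cover -func=coverage.out | tail -1
# Per-package coverage only
go test -cover ./... 2>&1 | grep "coverage:"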

Scoring:

•40: 80%+ total coverage
•30: 60-79% coverage
•20: 40-59% coverage
•10: 20-39% coverage
•5: < 20% coverage
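The total line from `go tool cover -func` can be parsed and banded without manual reading. A minimal sketch with hypothetical function names; using `>=` thresholds closes the gaps between the listed ranges (e.g. 79.5% lands in the 30-point band).

```bash
# total_coverage reads `go tool cover -func=coverage.out` output on stdin and
# prints the percentage from the "total:" line without the % sign.
total_coverage() {
  awk '$1 == "total:" { sub(/%$/, "", $NF); print $NF }'
}

# coverage_score maps a percentage (decimals allowed) to the Coverage band.
coverage_score() {
  awk -v pct="$1" 'BEGIN {
    if (pct >= 80) print 40
    else if (pct >= 60) print 30
    else if (pct >= 40) print 20
    else if (pct >= 20) print 10
    else print 5
  }'
}
```

Usage: `coverage_score "$(go tool cover -func=coverage.out | total_coverage)"`.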

3b. Quality (30 points)

Check:

bash

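# Table-driven tests (slice-of-struct literals; tt is usually the loop variable, not the table)
grep -rnE "(tests|testCases|cases) :?= \[\]struct *\{" --include="*_test.go" . | wc -l
# Subtests
grep -rn "t\.Run(" --include="*_test.go" . | wc -l
# Negative test cases (test, table-case, or subtest names describing failures)
grep -rniE "(func Test|name:|t\.Run\().*(error|fail|invalid|not ?found|bad ?request|unauthorized)" --include="*_test.go" . | wc -l
# Assertion count (testify calls plus t.Error/t.Fatal variants)
grep -rnE "(assert|require)\.|t\.(Error|Errorf|Fatal|Fatalf)\(" --include="*_test.go" . | wc -l
# Test helpers: lowercase funcs taking *testing.T/TB (excludes TestXxx), and t.Helper() calls
grep -rnE "^func [a-z][A-Za-z0-9_]*\(.*\*testing\.(T|TB)[,)]" --include="*_test.go" . | wc -l
grep -rn "t\.Helper()" --include="*_test.go" . | wc -l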

Scoring:

•30: Table-driven tests throughout, meaningful assertions, negative + edge cases, subtests, helpers
•20: Some table-driven tests, good assertions, main paths tested
•10: Basic assertions, mostly happy path
•0: Trivial or no meaningful tests

3c. Organization (30 points)

Check:

bash

# Test file count
find . -name "*_test.go" -not -path "./vendor/*" | wc -l
# Test directories
find . -type d \( -name "tests" -o -name "testdata" -o -name "fixtures" \) -not -path "./vendor/*" | wc -l
# Test helper files
find . -name "*helper*" -o -name "*fixture*" -o -name "*factory*" | grep -v vendor | wc -l
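# Integration vs unit test separation (file suffix or build tag)
find . -name "*_integration_test.go" -o -name "*_e2e_test.go" | wc -l
grep -rlE "^//go:build .*(integration|e2e)" --include="*_test.go" . | wc -l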

Scoring:

•30: Separate test dirs, helper/fixture packages, unit + integration + e2e, clear naming convention
•20: Tests alongside code, some helpers, good naming
•10: Tests exist but disorganized, no helpers
•0: No test organization at all

Step 4: Documentation Assessment (100 points)

4a. README Quality (40 points)

Check:

bash

# README exists
test -f README.md && echo "exists" || echo "missing"
# Required sections (matched against headings, not body text)
for section in "Description" "Prerequisites" "Setup" "Install" "Build" "Run" "Test" "API" "Usage" "Example"; do
  grep -qiE "^#+ .*$section" README.md 2>/dev/null && echo "FOUND: $section" || echo "MISSING: $section"
done
# Section count (## and deeper headings; prints 0 when there are none)
test -f README.md && grep -c "^##" README.md
# Word count
wc -w README.md 2>/dev/null

Scoring:

•40: 8+ sections, clear setup/build/run/test instructions, API documentation, examples, 500+ words
•30: 5-7 sections, most instructions present, some examples
•20: 3-4 sections, basic instructions
•10: Minimal README (just title and description)
•0: No README
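The numeric parts of these bands (heading count and the 500-word minimum for the top band) can be checked automatically. In this sketch the function name is hypothetical, sections are counted as lines starting with "##", and a README with fewer than 3 such headings is treated as minimal. Whether the instructions and examples are actually clear still needs a human read, so the result is a ceiling.

```bash
# readme_ceiling prints the highest README band supported by section and
# word counts for FILE (default: README.md); 0 if the file is missing.
readme_ceiling() {
  local file=${1:-README.md} sections words
  [ -f "$file" ] || { echo 0; return; }
  sections=$(grep -c '^##' "$file")
  words=$(wc -w < "$file" | tr -d ' ')
  if   [ "$sections" -ge 8 ] && [ "$words" -ge 500 ]; then echo 40
  elif [ "$sections" -ge 5 ]; then echo 30
  elif [ "$sections" -ge 3 ]; then echo 20
  else echo 10
  fi
}
```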

4b. Code Documentation (30 points)

Check:

bash

# Count exported functions and methods
grep -rnE "^func (\([^)]*\) )?[A-Z]" --include="*.go" --exclude="*_test.go" . | wc -l
# Count documented ones (grep -B1 prints the line above as "file.go-N-"; count those that are // comments)
grep -rn -B1 -E "^func (\([^)]*\) )?[A-Z]" --include="*.go" --exclude="*_test.go" . | grep -cE '\.go-[0-9]+-[[:space:]]*//'
# Exported types
grep -rn "^type [A-Z]" --include="*.go" --exclude="*_test.go" . | wc -l
# Documented exported types
grep -rn -B1 "^type [A-Z]" --include="*.go" --exclude="*_test.go" . | grep -cE '\.go-[0-9]+-[[:space:]]*//'

Scoring:

•30: 90%+ exported items documented with godoc, complex logic commented, API docs generated
•20: 60-89% documented
•10: 30-59% documented
•0: < 30% documented
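Combining the four counts above (documented functions plus documented types over all exported functions plus types) gives the percentage these bands use. A sketch with a hypothetical function name; it prints 0 when no exported items exist, which should be reviewed manually rather than taken at face value.

```bash
# doc_score maps documented ($1) over total exported items ($2) to the
# Code Documentation band. Integer math first avoids float rounding at 90%.
doc_score() {
  awk -v documented="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print 0; exit }
    pct = documented * 100 / total
    if (pct >= 90) print 30
    else if (pct >= 60) print 20
    else if (pct >= 30) print 10
    else print 0
  }'
}
```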

4c. Design Decisions (30 points)

Check:

bash

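# ADR files (adr/ or decisions/ directories, ADR-*/ADR_* files, DECISIONS*); avoids matching names like address.go
find . -not -path "./vendor/*" \( -ipath "*/adr/*" -o -ipath "*/decisions/*" -o -iname "adr[-_0-9]*" -o -iname "DECISIONS*" \) | wc -l
# Design section heading in README
grep -qiE "^#+ .*(design|architecture|decision|trade.?off|limitation)" README.md 2>/dev/null && echo "found" || echo "missing"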
# Inline design comments
grep -rn "// Design:\|// Architecture:\|// Trade-off:\|// Why:" --include="*.go" | wc -l

Scoring:

•30: Dedicated design decisions section/doc, architecture explanation, tech choice rationale, trade-offs, limitations, "what I'd improve" section
•20: Some design explanation in README or comments
•10: Minimal design rationale
•0: No design documentation

Step 5: Produce Scorecard

Compile all scores into the output format:

markdown

## Rubric Scorecard -- [Date]

### Code Quality: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| Organization | XX/25 | [directories found] |
| Naming | XX/25 | [naming issues found] |
| Readability | XX/25 | [complexity metrics] |
| Best Practices | XX/25 | [lint issue count] |

### Functionality: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| Requirements | XX/40 | [X/Y implemented] |
| Edge Cases | XX/30 | [validation patterns found] |
| Performance | XX/30 | [performance patterns found] |

### Testing: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| Coverage | XX/40 | [XX% coverage] |
| Quality | XX/30 | [table-driven count, assertion count] |
| Organization | XX/30 | [test file count, helper count] |

### Documentation: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| README Quality | XX/40 | [section count, word count] |
| Code Docs | XX/30 | [XX% documented] |
| Design Decisions | XX/30 | [design docs found] |

---

### TOTAL: XXX/400 (XX%)

### Grade
- 360-400: Exceptional (90-100%)
- 320-359: Strong (80-89%)
- 280-319: Good (70-79%)
- 240-279: Adequate (60-69%)
- 200-239: Needs Improvement (50-59%)
- < 200: Below Expectations (< 50%)

### Top Priority Improvements
| # | Action | Estimated Points | Effort | Category |
|---|---|---|---|---|
| 1 | [action] | +XX | Low/Med/High | [category] |
| 2 | [action] | +XX | Low/Med/High | [category] |
| 3 | [action] | +XX | Low/Med/High | [category] |
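To keep the TOTAL line and the grade consistent with the sub-scores, the arithmetic can be done by a small function. This sketch is hypothetical: it expects the 13 scores in scorecard order, rejects any other count or any score outside its maximum (25 x 4, then 40/30/30 three times, summing to 400), and truncates the percentage to a whole number.

```bash
# scorecard_total sums 13 sub-criterion scores given in scorecard order and
# prints "TOTAL: X/400 (Y%) Grade"; returns 1 on a wrong count or range.
scorecard_total() {
  if [ "$#" -ne 13 ]; then
    echo "expected 13 sub-criterion scores, got $#" >&2
    return 1
  fi
  local -a max=(25 25 25 25 40 30 30 40 30 30 40 30 30)
  local i=0 total=0 s grade
  for s in "$@"; do
    if [ "$s" -lt 0 ] || [ "$s" -gt "${max[$i]}" ]; then
      echo "score $s out of range for sub-criterion $((i + 1)) (max ${max[$i]})" >&2
      return 1
    fi
    total=$(( total + s )); i=$(( i + 1 ))
  done
  if   [ "$total" -ge 360 ]; then grade="Exceptional"
  elif [ "$total" -ge 320 ]; then grade="Strong"
  elif [ "$total" -ge 280 ]; then grade="Good"
  elif [ "$total" -ge 240 ]; then grade="Adequate"
  elif [ "$total" -ge 200 ]; then grade="Needs Improvement"
  else grade="Below Expectations"
  fi
  printf 'TOTAL: %d/400 (%d%%) %s\n' "$total" $(( total * 100 / 400 )) "$grade"
}
```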

Success Criteria

•All 13 sub-criteria scored with numeric value and evidence
•Total score calculated correctly
•At least 3 prioritized improvements identified
•Improvements ordered by points-per-effort ratio
•If INSTRUCTIONS.md exists, requirements tracked individually
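Ordering by points-per-effort needs a numeric effort value. In the sketch below, mapping Low/Med/High to 1/2/3 effort units is an assumption, not part of the rubric, and the function name and the "points|effort|action|category" input format are hypothetical.

```bash
# rank_improvements reads "points|effort|action|category" lines on stdin and
# prints them sorted by points per effort unit, highest first.
# Assumed effort units: Low=1, Med=2, High=3.
rank_improvements() {
  awk -F'|' '
    BEGIN { cost["Low"] = 1; cost["Med"] = 2; cost["High"] = 3 }
    {
      if (!($2 in cost)) { print "unknown effort: " $2 > "/dev/stderr"; next }
      # Prefix an integer sort key (ratio x 100) to avoid locale decimal issues.
      printf "%d|%s\n", $1 * 100 / cost[$2], $0
    }' | sort -t'|' -k1,1 -rn | cut -d'|' -f2-
}
```

For example, +8 at Low effort (ratio 8) ranks above +12 at Med (6) and +10 at High (about 3.3).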

Escalation Rules

•Score < 200/400: Escalate urgently -- flag it at the top of the scorecard and recommend focusing on the highest points-per-effort items first
•Any 40-point sub-criterion scoring 0: Critical flag
•Test suite fails to compile: Block further evaluation, fix tests first
•No README: Immediate action item (40 easy points at stake)
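These rules can be checked against a finished scorecard with a helper like the one below. It is a sketch with a hypothetical function name and argument order; the compile status is assumed to come from something like the exit code of `go test -run '^$' ./...`, which builds the test binaries without running any test.

```bash
# escalation_flags prints one line per triggered escalation rule.
# Args: total, requirements (/40), coverage (/40), README quality (/40),
#       test compile status (0 = compiled), README present (1 = yes).
escalation_flags() {
  local total=$1 req=$2 cov=$3 readme=$4 compiled=$5 has_readme=$6
  local pair name score
  [ "$total" -lt 200 ] && echo "URGENT: total $total/400 is below 200"
  for pair in "Requirements:$req" "Coverage:$cov" "README Quality:$readme"; do
    name=${pair%%:*}; score=${pair##*:}
    [ "$score" -eq 0 ] && echo "CRITICAL: $name scored 0/40"
  done
  [ "$compiled" -ne 0 ] && echo "BLOCKER: test suite does not compile"
  [ "$has_readme" -eq 1 ] || echo "ACTION: add a README (40 points at stake)"
  return 0
}
```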