Rubric Evaluation Skill
Evaluate a Go project against the Invisible coding test rubric (4 categories, 400 total points).
Trigger Conditions
- User runs `/rubric-evaluate` (full evaluation)
- User runs `/rubric-code-quality`, `/rubric-functionality`, `/rubric-testing`, or `/rubric-documentation` (single category)
- User runs `/rubric-quick-score` (automated-only quick score)
- User asks to "evaluate", "score", or "check rubric"
Input Contract
- Required: Access to the project root directory
- Optional: Specific category to evaluate (defaults to all 4)
- Optional: `INSTRUCTIONS.md` for requirements tracking
- Optional: Previous scorecard for comparison
Output Contract
- Structured scorecard with all 13 sub-criteria scored
- Per-sub-criterion evidence (specific files, line numbers, examples)
- Total score out of 400 with percentage
- Prioritized improvement list ordered by (points gained / effort)
- Comparison against previous score if available
Tool Permissions
- Read: All source files, test files, README.md, INSTRUCTIONS.md, coverage reports, lint output
- Execute: `go test -coverprofile`, `go tool cover`, `golangci-lint run`, `gocyclo`, `go vet`, `wc`, `grep`, `find`
- Write: Scorecard output to stdout or `.cursor/evaluations/`
- Search: File patterns, directory structures, code patterns
Execution Steps
Step 0: Setup
- Identify the project root (look for `go.mod`)
- Check for `INSTRUCTIONS.md`; if present, parse requirements into a checklist
- Check for existing coverage reports (`coverage.out`)
- Check for existing lint configuration (`.golangci.yml`)
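The root lookup and file checks in Step 0 can be scripted along these lines. This is a sketch under assumptions: the helper names `find_project_root` and `report_setup_inputs` are hypothetical, and only the three optional inputs listed above are checked.

```bash
# Hypothetical helper: print the nearest directory containing go.mod,
# starting at $1 (default: the current directory) and walking upward.
# Returns 1 if no go.mod is found before reaching the filesystem root.
find_project_root() {
  local dir
  dir=$(CDPATH= cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -f "$dir/go.mod" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}

# Hypothetical helper: report which optional Step 0 inputs exist under a root.
report_setup_inputs() {
  local root=$1 f
  for f in INSTRUCTIONS.md coverage.out .golangci.yml; do
    if [ -f "$root/$f" ]; then
      echo "present: $f"
    else
      echo "absent: $f"
    fi
  done
}
```

Typical use: `root=$(find_project_root) && report_setup_inputs "$root"`.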
Step 1: Code Quality Assessment (100 points)
1a. Organization (25 points)
Run these checks:
```bash
# Check for layered architecture
ls -d cmd/ internal/ 2>/dev/null
ls -d internal/handler/ internal/service/ internal/repository/ internal/models/ internal/middleware/ internal/config/ 2>/dev/null
# Count distinct packages (one per directory containing .go files)
find . -name "*.go" -not -path "./vendor/*" -exec dirname {} \; | sort -u | wc -l
```
Scoring:
- 25: `cmd/`, `internal/` with handler, service, repository, models, middleware, config subdirectories
- 20: Layered but missing 1-2 expected directories
- 15: Some separation but mixed concerns (e.g., handlers and services in same package)
- 10: Minimal organization, most code in 2-3 files
- 5: Single file or flat structure
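For the two structural bands, a rough presence count is usually enough evidence. This sketch assumes the singular directory names from the 25-point band (`internal/handler`, not `internal/handlers`); `count_layers` and `organization_band` are hypothetical helpers, and the 15/10/5 bands still need a manual read for mixed concerns.

```bash
# Hypothetical helper: count how many of the eight directories named in the
# 25-point band exist under a project root (missing ones go to stderr).
count_layers() {
  local root=${1:-.} present=0 d
  for d in cmd internal internal/handler internal/service internal/repository \
           internal/models internal/middleware internal/config; do
    if [ -d "$root/$d" ]; then
      present=$((present + 1))
    else
      echo "missing: $d" >&2
    fi
  done
  echo "$present"
}

# Hypothetical helper: map the count to the two purely structural bands.
# Anything below 6 needs a manual judgment (15, 10, or 5 points).
organization_band() {
  if [ "$1" -eq 8 ]; then
    echo 25
  elif [ "$1" -ge 6 ]; then
    echo 20
  else
    echo manual
  fi
}
```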
1b. Naming (25 points)
Run these checks:
```bash
# Naming-convention findings from revive (e.g., var-naming, receiver-naming)
golangci-lint run --enable=revive --disable-all ./... 2>&1 | grep -c "naming"
# Check for abbreviations in exported function and method names
# (abbreviation followed by an uppercase letter or the end of the name)
grep -rhoE "^func (\([^)]*\) )?[A-Z][A-Za-z0-9]*" --include="*.go" | grep -E "(Mgr|Svc|Repo|Cfg|Ctx|Req|Resp|Msg|Err|Val|Num|Str|Int|Buf|Arr|Lst)([A-Z]|$)" | wc -l
# Check for descriptive test names
grep -rn "func Test" --include="*_test.go" | head -20
```
Scoring:
- 25: All names descriptive, consistent Go conventions, no abbreviations, test names follow the `TestX_Y_Z` pattern
- 20: 1-2 unclear or abbreviated names
- 15: Several naming inconsistencies
- 10: Many abbreviations or unclear names
- 5: Naming is poor throughout
1c. Readability (25 points)
Run these checks:
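```bash
# Cyclomatic complexity
gocyclo -over 10 . 2>/dev/null | wc -l
gocyclo -over 15 . 2>/dev/null | wc -l
# Function length: print non-test functions longer than 40 lines
# (assumes gofmt layout: "func" and the closing "}" both start in column 1)
find . -name "*.go" -not -name "*_test.go" -not -path "./vendor/*" -exec awk '
  FNR == 1 { start = 0 }
  /^func / { start = ($0 ~ /}[[:space:]]*$/) ? 0 : FNR }
  /^}/ { if (start && (FNR - start + 1) > 40) print FILENAME ":" start ": " (FNR - start + 1) " lines"; start = 0 }
' {} +
```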
Scoring:
- 25: All functions < 40 lines, no cyclomatic complexity > 10, clear control flow
- 20: 1-2 functions exceed limits, but well-commented
- 15: Several long/complex functions, moderate comments
- 10: Many long functions, insufficient comments
- 5: Code is difficult to follow
1d. Best Practices (25 points)
Run these checks:
```bash
# Full lint check
golangci-lint run ./... 2>&1 | tail -1
# Count issues
golangci-lint run ./... 2>&1 | grep -c "^.*\.go:"
# Check for float64 with money
grep -rn "float64" --include="*.go" | grep -i "amount\|balance\|price\|money\|total" | wc -l
# Check for proper error wrapping: all fmt.Errorf calls vs. those using %w
# (calls that wrap no underlying error legitimately omit %w)
grep -rn "fmt.Errorf" --include="*.go" | wc -l
grep -rn "fmt.Errorf" --include="*.go" | grep -c "%w"
```
Scoring:
- 25: 0 lint issues, idiomatic Go, no anti-patterns, decimal type for money, proper error wrapping
- 20: 1-5 lint issues, minor style inconsistencies
- 15: 6-15 lint issues, some anti-patterns
- 10: 16-30 lint issues, multiple anti-patterns
- 5: 31+ lint issues or critical anti-patterns (e.g., float64 for money)
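The lint-count part of these bands reduces to a lookup. A minimal sketch with a hypothetical `lint_band` helper; treat its result as an upper bound, since critical anti-patterns such as float64 for money can still force the 5-point band.

```bash
# Hypothetical helper: map a golangci-lint issue count to the 1d band.
lint_band() {
  local issues=$1
  if [ "$issues" -eq 0 ]; then echo 25
  elif [ "$issues" -le 5 ]; then echo 20
  elif [ "$issues" -le 15 ]; then echo 15
  elif [ "$issues" -le 30 ]; then echo 10
  else echo 5
  fi
}
```

Typical use: `lint_band "$(golangci-lint run ./... 2>&1 | grep -c '^.*\.go:')"`.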
Step 2: Functionality Assessment (100 points)
2a. Requirements Completion (40 points)
- Parse `INSTRUCTIONS.md` for explicit requirements
- For each requirement, search the codebase for implementation evidence
- Score = (implemented_count / total_requirements) * 40
```bash
# Count API endpoints defined
grep -rn "router\.\(GET\|POST\|PUT\|DELETE\|PATCH\)" --include="*.go" | wc -l
# Or for Gin
grep -rn "\.GET\|\.POST\|\.PUT\|\.DELETE\|\.PATCH" --include="*.go" | grep -v "_test.go" | wc -l
```
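The requirements formula leaves rounding unspecified. This sketch assumes rounding to the nearest whole point; `requirements_score` is a hypothetical helper that takes the implemented and total counts from the `INSTRUCTIONS.md` checklist.

```bash
# Hypothetical helper: Score = (implemented / total) * 40, rounded to the
# nearest whole point (the rounding rule is an assumption).
requirements_score() {
  local implemented=$1 total=$2
  if [ "$total" -le 0 ]; then
    echo "no requirements parsed" >&2
    return 1
  fi
  awk -v i="$implemented" -v t="$total" 'BEGIN { print int((i / t) * 40 + 0.5) }'
}
```

For example, 7 of 10 requirements gives 28 points.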
2b. Edge Cases (30 points)
Check for these patterns:
```bash
# Input validation patterns
grep -rn "validate\|Validate\|binding:" --include="*.go" | grep -v "_test.go" | wc -l
# Error return patterns
grep -rn "return.*err\|return.*Error\|return.*error" --include="*.go" | grep -v "_test.go" | wc -l
# Nil checks
grep -rn "== nil\|!= nil" --include="*.go" | grep -v "_test.go" | wc -l
# Idempotency key handling
grep -rn -i "idempoten" --include="*.go" | wc -l
# Negative amount checks
grep -rn "LessThan\|GreaterThan\|IsNegative\|IsZero\|Sign()" --include="*.go" | wc -l
```
Scoring:
- 30: Validation on all endpoints, custom error types, boundary checks, idempotency, negative amount guards
- 20: Most inputs validated, main error paths handled, some boundary checks
- 10: Minimal validation, basic error handling
- 0: No validation or error handling
2c. Performance (30 points)
Check for:
```bash
# Database indexing
grep -rn "Index\|index\|INDEX\|uniqueIndex" --include="*.go" | grep -v "_test.go" | wc -l
# Pagination
grep -rn -i "limit\|offset\|page\|per_page\|cursor" --include="*.go" | grep -v "_test.go" | wc -l
# Connection pooling
grep -rn "SetMaxOpenConns\|SetMaxIdleConns\|pool" --include="*.go" | wc -l
# N+1 potential (loops with DB calls)
grep -rn "for.*range" --include="*.go" -A5 | grep -c "Find\|First\|Where\|Query"
# SELECT * usage (should be avoided)
grep -rn 'SELECT \*' --include="*.go" | wc -l
```
Scoring:
- 30: Indexed columns, pagination, connection pooling, no N+1 patterns, no SELECT *
- 20: Most performance concerns addressed, 1-2 minor issues
- 10: Some attention to performance
- 0: Obvious performance problems
Step 3: Testing Assessment (100 points)
3a. Coverage (40 points)
Run:
```bash
# Generate coverage
go test -coverprofile=coverage.out ./... 2>&1
# Extract total percentage
go tool cover -func=coverage.out | tail -1
# Per-package coverage (go test prints "coverage: NN.N% of statements" per package)
go test -cover ./... 2>&1 | grep "coverage:"
```
Scoring:
- 40: 80%+ total coverage
- 30: 60-79% coverage
- 20: 40-59% coverage
- 10: 20-39% coverage
- 5: < 20% coverage
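Turning the `go tool cover -func` total into a band can be sketched as follows. `coverage_pct` and `coverage_band` are hypothetical helpers; the parser assumes the last field on the `total:` line is a percentage such as `82.4%`.

```bash
# Hypothetical helper: print the percentage from the "total:" line of a
# `go tool cover -func` report read on stdin (e.g. "82.4" for "82.4%").
coverage_pct() {
  awk '$1 == "total:" { sub(/%$/, "", $NF); print $NF }'
}

# Hypothetical helper: map a coverage percentage to the 3a band.
coverage_band() {
  awk -v p="$1" 'BEGIN {
    p = p + 0
    if (p >= 80) print 40
    else if (p >= 60) print 30
    else if (p >= 40) print 20
    else if (p >= 20) print 10
    else print 5
  }'
}
```

Typical use: `coverage_band "$(go tool cover -func=coverage.out | coverage_pct)"`.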
3b. Quality (30 points)
Check:
```bash
# Table-driven tests
grep -rn "tests := \[\]struct\|testCases := \[\]struct\|tt := \[\]struct" --include="*_test.go" | wc -l
# Subtests
grep -rn "t\.Run(" --include="*_test.go" | wc -l
# Negative test cases
grep -rn "Error\|Fail\|Invalid\|NotFound\|BadRequest\|Unauthorized" --include="*_test.go" | wc -l
# Assertion count
grep -rn "assert\.\|require\.\|if.*!=" --include="*_test.go" | wc -l
# Test helper functions (conventionally marked with t.Helper(), also in testutil packages)
grep -rn "\.Helper()" --include="*.go" | wc -l
```
Scoring:
- 30: Table-driven tests throughout, meaningful assertions, negative + edge cases, subtests, helpers
- 20: Some table-driven tests, good assertions, main paths tested
- 10: Basic assertions, mostly happy path
- 0: Trivial or no meaningful tests
3c. Organization (30 points)
Check:
```bash
# Test file count
find . -name "*_test.go" -not -path "./vendor/*" | wc -l
# Test directories (parentheses make -type d apply to every name)
find . -type d \( -name "tests" -o -name "testdata" -o -name "fixtures" \) -not -path "./vendor/*" | wc -l
# Test helper files
find . -name "*helper*" -o -name "*fixture*" -o -name "*factory*" | grep -v vendor | wc -l
# Integration vs unit test separation (file suffixes or build tags)
find . -name "*_integration_test.go" -o -name "*_e2e_test.go" | wc -l
grep -rln "^//go:build.*\(integration\|e2e\)" --include="*_test.go" | wc -l
```
Scoring:
- 30: Separate test dirs, helper/fixture packages, unit + integration + e2e, clear naming convention
- 20: Tests alongside code, some helpers, good naming
- 10: Tests exist but disorganized, no helpers
- 0: No test organization at all
Step 4: Documentation Assessment (100 points)
4a. README Quality (40 points)
Check:
```bash
# README exists
test -f README.md && echo "exists" || echo "missing"
# Required sections (matched against Markdown headings only)
for section in "Description" "Prerequisites" "Setup" "Install" "Build" "Run" "Test" "API" "Usage" "Example"; do
  grep -qiE "^#+ .*$section" README.md 2>/dev/null && echo "FOUND: $section" || echo "MISSING: $section"
done
# Section count (grep -c already prints 0 when there are no matches)
grep -c "^##" README.md 2>/dev/null
# Word count
wc -w README.md 2>/dev/null
```
Scoring:
- 40: 8+ sections, clear setup/build/run/test instructions, API documentation, examples, 500+ words
- 30: 5-7 sections, most instructions present, some examples
- 20: 3-4 sections, basic instructions
- 10: Minimal README (just title and description)
- 0: No README
4b. Code Documentation (30 points)
Check:
```bash
# Count exported functions and methods
grep -rnE "^func (\([^)]*\) )?[A-Z]" --include="*.go" | grep -v "_test.go" | wc -l
# Count documented ones: grep -B1 prints the line above each match as "file.go-N-..."
grep -rnE -B1 "^func (\([^)]*\) )?[A-Z]" --include="*.go" | grep -v "_test.go" | grep -cE '\.go-[0-9]+-//'
# Exported types
grep -rn "^type [A-Z]" --include="*.go" | grep -v "_test.go" | wc -l
# Documented exported types
grep -rn -B1 "^type [A-Z]" --include="*.go" | grep -v "_test.go" | grep -cE '\.go-[0-9]+-//'
```
Scoring:
- 30: 90%+ exported items documented with godoc, complex logic commented, API docs generated
- 20: 60-89% documented
- 10: 30-59% documented
- 0: < 30% documented
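The counts from the documentation checks combine into a percentage roughly as follows. In this sketch (hypothetical `doc_band` helper), documented functions and types are summed against all exported functions and types, and the percentage is truncated to an integer, so 89.9% falls in the 20-point band. The 30-point band also asks for commented complex logic and generated API docs, which a count cannot see.

```bash
# Hypothetical helper: documented / total exported items -> 4b band.
doc_band() {
  local documented=$1 total=$2 pct
  if [ "$total" -eq 0 ]; then
    echo "no exported items found" >&2
    return 1
  fi
  pct=$((documented * 100 / total))   # integer percent, truncated
  if [ "$pct" -ge 90 ]; then echo 30
  elif [ "$pct" -ge 60 ]; then echo 20
  elif [ "$pct" -ge 30 ]; then echo 10
  else echo 0
  fi
}
```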
4c. Design Decisions (30 points)
Check:
```bash
# ADR files
find . -name "ADR*" -o -name "adr*" -o -name "DECISIONS*" | wc -l
# Design section in README
grep -qi "design\|architecture\|decisions\|trade.off\|limitations" README.md 2>/dev/null && echo "found" || echo "missing"
# Inline design comments
grep -rn "// Design:\|// Architecture:\|// Trade-off:\|// Why:" --include="*.go" | wc -l
```
Scoring:
- 30: Dedicated design decisions section/doc, architecture explanation, tech choice rationale, trade-offs, limitations, "what I'd improve" section
- 20: Some design explanation in README or comments
- 10: Minimal design rationale
- 0: No design documentation
Step 5: Produce Scorecard
Compile all scores into the output format:
```markdown
## Rubric Scorecard -- [Date]

### Code Quality: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| Organization | XX/25 | [directories found] |
| Naming | XX/25 | [naming issues found] |
| Readability | XX/25 | [complexity metrics] |
| Best Practices | XX/25 | [lint issue count] |

### Functionality: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| Requirements | XX/40 | [X/Y implemented] |
| Edge Cases | XX/30 | [validation patterns found] |
| Performance | XX/30 | [performance patterns found] |

### Testing: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| Coverage | XX/40 | [XX% coverage] |
| Quality | XX/30 | [table-driven count, assertion count] |
| Organization | XX/30 | [test file count, helper count] |

### Documentation: XX/100
| Sub-Criteria | Score | Evidence |
|---|---|---|
| README Quality | XX/40 | [section count, word count] |
| Code Docs | XX/30 | [XX% documented] |
| Design Decisions | XX/30 | [design docs found] |

---

### TOTAL: XXX/400 (XX%)

### Grade
- 360-400: Exceptional (90-100%)
- 320-359: Strong (80-89%)
- 280-319: Good (70-79%)
- 240-279: Adequate (60-69%)
- 200-239: Needs Improvement (50-59%)
- < 200: Below Expectations (< 50%)

### Top Priority Improvements
| # | Action | Estimated Points | Effort | Category |
|---|---|---|---|---|
| 1 | [action] | +XX | Low/Med/High | [category] |
| 2 | [action] | +XX | Low/Med/High | [category] |
| 3 | [action] | +XX | Low/Med/High | [category] |
```
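Totals and grades are easy to get wrong by hand. A sketch of the arithmetic, assuming the 13 scores are passed in scorecard order (Organization through Design Decisions); `total_and_grade` is a hypothetical helper that also rejects any score outside 0 and its sub-criterion maximum. The percentage is truncated to an integer.

```bash
# Hypothetical helper: sum the 13 sub-criterion scores (in scorecard order),
# reject out-of-range values, and print the TOTAL line with its grade.
total_and_grade() {
  local maxes="25 25 25 25 40 30 30 40 30 30 40 30 30" total=0 s m grade
  if [ "$#" -ne 13 ]; then
    echo "expected 13 scores, got $#" >&2
    return 1
  fi
  for s in "$@"; do
    m=${maxes%% *}      # maximum for this sub-criterion
    maxes=${maxes#* }   # advance to the next maximum
    if [ "$s" -lt 0 ] || [ "$s" -gt "$m" ]; then
      echo "score $s is outside 0-$m" >&2
      return 1
    fi
    total=$((total + s))
  done
  if [ "$total" -ge 360 ]; then grade="Exceptional"
  elif [ "$total" -ge 320 ]; then grade="Strong"
  elif [ "$total" -ge 280 ]; then grade="Good"
  elif [ "$total" -ge 240 ]; then grade="Adequate"
  elif [ "$total" -ge 200 ]; then grade="Needs Improvement"
  else grade="Below Expectations"
  fi
  echo "TOTAL: $total/400 ($((total * 100 / 400))%) Grade: $grade"
}
```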
Success Criteria
- All 13 sub-criteria scored with numeric value and evidence
- Total score calculated correctly
- At least 3 prioritized improvements identified
- Improvements ordered by points-per-effort ratio
- If `INSTRUCTIONS.md` exists, requirements tracked individually
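Ordering by points per effort needs a numeric effort scale, which the rubric does not define. This sketch assumes Low=1, Med=2, High=3 and reads tab-separated improvement rows; `prioritize` is a hypothetical helper.

```bash
# Hypothetical helper: read "action<TAB>points<TAB>effort<TAB>category" rows on
# stdin and print them highest points-per-effort first. The effort weights
# (Low=1, Med=2, High=3) are an assumption; rows with other labels are skipped.
prioritize() {
  awk -F '\t' 'BEGIN { w["Low"] = 1; w["Med"] = 2; w["High"] = 3 }
    ($3 in w) { printf "%.4f\t%s\n", $2 / w[$3], $0 }' |
    sort -t "$(printf '\t')" -k1,1 -rn |
    cut -f 2-
}
```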
Escalation Rules
- Score < 200/400: Escalate urgently; recommend focusing on highest-value items
- Any 40-point sub-criterion scoring 0: Critical flag
- Test suite fails to compile: Block further evaluation, fix tests first
- No README: Immediate action item (40 easy points at stake)
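The score-based escalation rules can be checked mechanically once the scorecard is filled in. A sketch with a hypothetical `escalation_flags` helper; the compile-failure rule is left out because it depends on running `go test` in the project.

```bash
# Hypothetical helper: print one line per escalation rule that fires.
# Arguments: total, Requirements (2a), Coverage (3a), README Quality (4a),
# and "yes"/"no" for whether README.md exists.
escalation_flags() {
  local total=$1 req=$2 cov=$3 readme=$4 has_readme=$5
  if [ "$total" -lt 200 ]; then
    echo "URGENT: $total/400 is below 200; focus on the highest-value items"
  fi
  if [ "$req" -eq 0 ]; then echo "CRITICAL: Requirements scored 0/40"; fi
  if [ "$cov" -eq 0 ]; then echo "CRITICAL: Coverage scored 0/40"; fi
  if [ "$readme" -eq 0 ]; then echo "CRITICAL: README Quality scored 0/40"; fi
  if [ "$has_readme" = "no" ]; then
    echo "ACTION: add a README (40 points at stake)"
  fi
}
```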