Code Quality Grader Skill

This skill implements a simplified version of the grader concept from Anthropic's article "Demystifying Evals for AI Agents".

Grader Types

1. Code-Based Graders (Deterministic)

Lint Check

bash

# TypeScript/JavaScript
npm run lint 2>&1 | tail -20

# Python
ruff check . 2>&1 | tail -20
# or
pylint src/ --score=y 2>&1 | tail -10

Scoring:

•0 errors = ✅ PASS
•1-5 errors = ⚠️ WARNING
•More than 5 errors = ❌ FAIL
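
The lint thresholds can be expressed as a small function. This is a minimal sketch: the name `lint_status` is hypothetical, and it assumes that exactly 5 errors still counts as a WARNING.

```python
def lint_status(error_count: int) -> str:
    """Map a lint error count to a status label."""
    if error_count < 0:
        raise ValueError("error_count must be non-negative")
    if error_count == 0:
        return "PASS"
    # Assumption: the boundary value 5 belongs to the WARNING band.
    if error_count <= 5:
        return "WARNING"
    return "FAIL"
```

For example, `lint_status(3)` returns `"WARNING"` and `lint_status(6)` returns `"FAIL"`.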

Type Check

bash

# TypeScript
npx tsc --noEmit 2>&1 | tail -20

# Python
mypy src/ 2>&1 | tail -20

Scoring:

•0 errors = ✅ PASS
•Any error = ❌ FAIL
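
As a rough sketch, the pass/fail rule could be derived from mypy's summary line, which survives the `tail -20` because it is printed last. The function name `type_check_status` is hypothetical, and treating unrecognized output as a failure is an assumption rather than something this skill specifies.

```python
import re


def type_check_status(mypy_output: str) -> str:
    """Return "PASS" when mypy reports no errors, otherwise "FAIL"."""
    # mypy ends a failing run with e.g. "Found 2 errors in 1 file (checked 3 source files)".
    match = re.search(r"Found (\d+) errors? in", mypy_output)
    if match:
        return "FAIL" if int(match.group(1)) > 0 else "PASS"
    # A clean run ends with "Success: no issues found in N source files".
    if "Success: no issues found" in mypy_output:
        return "PASS"
    # Assumption: unrecognized output (crash, truncated log) fails closed.
    return "FAIL"
```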

Test Check

bash

# Jest
npm test 2>&1 | grep -E "(Test Suites|Tests):"

# Pytest
pytest --tb=no -q 2>&1 | tail -5

Scoring:

•All pass = ✅ PASS
•Any fail = ❌ FAIL
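
One way to turn the pytest output into a status is to parse the final summary line (for example `1 failed, 44 passed in 1.20s`). The sketch below uses a hypothetical function name, `pytest_status`, and assumes that a run with no passing tests at all should not count as a pass; the skill itself does not define that case.

```python
import re


def pytest_status(output: str) -> str:
    """Return "PASS" only when the pytest summary shows passes and no failures or errors."""
    lines = [line for line in output.splitlines() if line.strip()]
    summary = lines[-1] if lines else ""
    counts = {"passed": 0, "failed": 0, "error": 0}
    # "error" also matches the plural "errors" that pytest prints.
    for number, outcome in re.findall(r"(\d+) (passed|failed|error)", summary):
        counts[outcome] += int(number)
    if counts["failed"] or counts["error"]:
        return "FAIL"
    # Assumption: zero passing tests (e.g. "no tests ran") is treated as FAIL.
    return "PASS" if counts["passed"] else "FAIL"
```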

Security Check

bash

# npm audit
npm audit --audit-level=moderate 2>&1 | tail -10

# Python safety
safety check 2>&1 | tail -10

# Secret detection
grep -rnE "password\s*=\s*['\"][^'\"]*['\"]" src/ 2>/dev/null | head -5
grep -rnE "api[_-]?key\s*=\s*['\"][^'\"]*['\"]" src/ 2>/dev/null | head -5

Scoring:

•0 vulnerabilities = ✅ PASS
•Moderate = ⚠️ WARNING
•High/Critical = ❌ FAIL
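
The severity rule can be sketched against the per-severity counts that `npm audit --json` reports under `metadata.vulnerabilities`. The function names are hypothetical. The scoring above does not say how `low` or `info` findings are graded, so ignoring them here is an assumption.

```python
import json


def security_status(counts: dict) -> str:
    """Map per-severity vulnerability counts to a status label."""
    if counts.get("high", 0) or counts.get("critical", 0):
        return "FAIL"
    if counts.get("moderate", 0):
        return "WARNING"
    # Assumption: "low" and "info" findings do not change the status.
    return "PASS"


def security_status_from_audit_json(text: str) -> str:
    """Grade the JSON text produced by `npm audit --json`."""
    report = json.loads(text)
    return security_status(report["metadata"]["vulnerabilities"])
```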

2. Model-Based Graders (LLM-as-Judge)

Code Readability

code

Evaluate this code (score 1-5):

CRITERIA:
1. Readability - Are variable and function names descriptive?
2. Structure - Are the single responsibility and DRY principles applied?
3. Comments - Is complex logic explained?
4. Error handling - Are edge cases considered?
5. Testability - Can it be mocked and isolated?

SCORING:
5 = Excellent
4 = Good
3 = Acceptable
2 = Needs improvement
1 = Serious problems
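
Once the judge has returned one score per criterion, the results need to be validated and summarized. The skill does not say how the five scores are combined, so the sketch below assumes a plain average rounded half-up to the nearest label; `summarize_scores` and the criterion keys are hypothetical names.

```python
LABELS = {
    5: "Excellent",
    4: "Good",
    3: "Acceptable",
    2: "Needs improvement",
    1: "Serious problems",
}


def summarize_scores(scores: dict[str, int]) -> tuple[float, str]:
    """Validate 1-5 criterion scores and return (average, label)."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    for name, value in scores.items():
        if value not in LABELS:
            raise ValueError(f"{name}: score must be an integer from 1 to 5")
    average = sum(scores.values()) / len(scores)
    # Assumption: average the criteria, then round half-up to pick a label.
    return average, LABELS[int(average + 0.5)]
```

Rejecting out-of-range values early keeps a malformed judge reply from silently skewing the report.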

API Design

code

Evaluate this API endpoint:

CRITERIA:
1. Does it follow RESTful conventions?
2. Is the request/response structure consistent?
3. Are error responses standardized?
4. Has versioning been considered?
5. Is the documentation sufficient?

Grading Report Format

markdown

# Code Quality Report

**Date:** YYYY-MM-DD HH:MM
**Scope:** [file/module/PR]

## Deterministic Checks

| Check | Status | Details |
|-------|--------|---------|
| Lint | ✅ PASS | 0 errors |
| Types | ⚠️ WARN | 2 warnings |
| Tests | ✅ PASS | 45/45 passed |
| Security | ❌ FAIL | 1 high vulnerability |

## Model-Based Assessment

| Criteria | Score | Notes |
|----------|-------|-------|
| Readability | 4/5 | Good, some function names are abbreviated |
| Structure | 5/5 | SOLID principles applied |
| Error Handling | 3/5 | Some edge cases missing |

## Overall Score: 78/100

## Critical Issues (Must Fix)
1. [Security] npm audit: lodash vulnerability
2. [Tests] Missing edge case tests for auth module

## Recommendations (Should Fix)
1. Rename `fn` to `formatName`
2. Add try-catch to async operations

## Positive Highlights
- Clean separation of concerns
- Good test coverage (85%)
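
The "Deterministic Checks" table in this template can be generated from grader results. This is a minimal sketch; `render_checks_table` and the `(name, status, details)` tuple shape are illustrative assumptions, not part of the skill.

```python
ICONS = {"PASS": "✅", "WARNING": "⚠️", "FAIL": "❌"}


def render_checks_table(results: list[tuple[str, str, str]]) -> str:
    """Render (check name, status, details) tuples as a markdown table."""
    lines = [
        "| Check | Status | Details |",
        "|-------|--------|---------|",
    ]
    for name, status, details in results:
        # An unknown status raises KeyError instead of producing a bad row.
        lines.append(f"| {name} | {ICONS[status]} {status} | {details} |")
    return "\n".join(lines)
```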

Quick Usage

Full Check (all graders)

code

/grade full

Quick Check (deterministic only)

code

/grade quick

Specific Check

code

/grade lint
/grade tests
/grade security
/grade readability
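
The mapping from a `/grade` command to the graders it runs might look like the sketch below. The function `graders_for` is hypothetical, and so are the `types` and `api-design` grader names, which do not appear in the command list above.

```python
# Assumption: "types" and "api-design" are illustrative names for the
# Type Check and API Design graders.
DETERMINISTIC = ["lint", "types", "tests", "security"]
MODEL_BASED = ["readability", "api-design"]


def graders_for(command: str) -> list[str]:
    """Return the grader names selected by a '/grade <mode>' command."""
    parts = command.split()
    if len(parts) != 2 or parts[0] != "/grade":
        raise ValueError(f"expected '/grade <mode>', got {command!r}")
    mode = parts[1]
    if mode == "full":
        return DETERMINISTIC + MODEL_BASED
    if mode == "quick":
        return list(DETERMINISTIC)
    if mode in DETERMINISTIC or mode in MODEL_BASED:
        return [mode]
    raise ValueError(f"unknown grader: {mode!r}")
```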

Integration with qa-engineer

The qa-engineer agent should use this skill in the following situations:

•Before PR review: Quick check
•When a feature is completed: Full check
•Before a release: Full check + security focus

Scoring Thresholds

Overall Score	Status	Action
90-100	✅ Excellent	Ready to merge
80-89	✅ Good	Minor fixes optional
70-79	⚠️ Acceptable	Should fix recommendations
60-69	⚠️ Needs Work	Must fix critical issues
<60	❌ Poor	Major refactoring needed
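
The threshold table translates directly into a lookup. A minimal sketch, with `overall_status` as a hypothetical name; how the overall score itself is computed is not defined here, so the function only classifies a score that already exists.

```python
def overall_status(score: float) -> tuple[str, str]:
    """Map an overall score (0-100) to its (status, action) pair."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return ("Excellent", "Ready to merge")
    if score >= 80:
        return ("Good", "Minor fixes optional")
    if score >= 70:
        return ("Acceptable", "Should fix recommendations")
    if score >= 60:
        return ("Needs Work", "Must fix critical issues")
    return ("Poor", "Major refactoring needed")
```

For the sample report score of 78, this returns `("Acceptable", "Should fix recommendations")`.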

Automatic Triggering

This skill can be invoked automatically in the following situations:

•Before git commit (pre-commit hook)
•When a PR is created
•When a feature is completed (before setting passes: true)
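
For the pre-commit case, Git runs the executable at `.git/hooks/pre-commit` and aborts the commit when it exits with a non-zero status. The sketch below shows one way a hook script could run the quick checks; the function names and the shape of the `commands` mapping are assumptions, and which commands to include depends on the project.

```python
import subprocess
import sys


def failed_checks(commands: dict[str, list[str]]) -> list[str]:
    """Run each command and return the names of those that exit non-zero."""
    failed = []
    for name, argv in commands.items():
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


def hook_exit_code(commands: dict[str, list[str]]) -> int:
    """Return 0 when every check passes, 1 otherwise."""
    failed = failed_checks(commands)
    for name in failed:
        print(f"FAIL: {name}", file=sys.stderr)
    return 1 if failed else 0


def main(commands: dict[str, list[str]]) -> None:
    """Entry point for a hook script: exit with the aggregated status."""
    sys.exit(hook_exit_code(commands))
```

A hook script could call, for example, `main({"lint": ["ruff", "check", "."], "types": ["mypy", "src/"]})`, reusing the commands from the deterministic graders.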