Code Validation
Validates code changes through automated scanning and LLM-guided heuristics to detect:
- Test disabling patterns (skip, only, todo)
- Secret exposure (hardcoded credentials, API keys)
- Path portability issues (user-specific paths)
- Dangerous security flags
- Large deletions
- Dependency/import changes
- Broad exception handling
When to Use
Run code-validation as part of the QA validation protocol:
- After the Action Agent completes implementation
- Before approving changes for merge
- When reviewing diffs for red flags
- During the final validation phase
Validation Workflow
1. Automated Scanning (Scripts)
Run scripts first for deterministic, fast checks:
For Git Diffs (comparing branches):
# Compare feature branch against main
python scripts/diff_analyzer.py --base main --format json

# Compare specific commit range
python scripts/diff_analyzer.py --range HEAD~5..HEAD --format json

# Save report to file
python scripts/diff_analyzer.py --base main --output validation-report.json
For Static File Analysis (staged changes or specific files):
# Scan specific files
python scripts/static_analyzer.py src/app.py src/utils.py --format json

# Scan entire directory
python scripts/static_analyzer.py ./src --format json

# Exclude patterns
python scripts/static_analyzer.py ./src --exclude node_modules .git --format json

# Save report
python scripts/static_analyzer.py ./src --output scan-report.json
2. Interpret Scan Results
Parse JSON output and evaluate findings:
Finding Structure:
{
"category": "test_disabling|secret_exposure|path_portability|security_flags|large_deletion|dependency_change",
"severity": "CRITICAL|HIGH|MEDIUM|LOW",
"file": "path/to/file.py",
"line": 42,
"pattern": "regex pattern matched",
"context": "actual line content",
"message": "human-readable description"
}
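If you post-process findings in Python, wrapping each JSON object in a typed record makes file:line references and severity checks explicit. The following is a minimal sketch that assumes the field names shown above. The `Finding` class and `parse_finding` helper are hypothetical and are not part of the scripts.

```python
from dataclasses import dataclass

# Severity levels as documented in the finding structure.
SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


@dataclass(frozen=True)
class Finding:
    """One scanner finding (hypothetical wrapper, not part of the scripts)."""
    category: str
    severity: str
    file: str
    line: int
    pattern: str
    context: str
    message: str

    @property
    def location(self) -> str:
        # file:line reference, as required in validation reports
        return f"{self.file}:{self.line}"


def parse_finding(raw: dict) -> Finding:
    """Build a Finding from one JSON object, rejecting unknown severities."""
    if raw.get("severity") not in SEVERITIES:
        raise ValueError(f"unknown severity: {raw.get('severity')!r}")
    return Finding(
        category=raw["category"],
        severity=raw["severity"],
        file=raw["file"],
        line=int(raw["line"]),
        pattern=raw.get("pattern", ""),
        context=raw.get("context", ""),
        message=raw.get("message", ""),
    )
```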
Severity Guidelines:
- CRITICAL: Secrets, user-specific paths in docs - BLOCK merge immediately
- HIGH: Test disabling, security flags, user-specific paths in code - Require Action Agent fixes
- MEDIUM: Large deletions, dependency changes, broad exceptions - Review and justify
- LOW: Minor issues - Optional fixes
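For triage, it helps to order findings so that the most urgent ones appear first and each one carries its required action. The following is a minimal sketch that assumes findings are dicts with the documented `severity`, `file`, and `line` keys. The `triage` helper and the action strings are illustrative paraphrases of the guidelines above.

```python
# Rank order from the severity guidelines: lower number = more urgent.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

# Required action per severity, paraphrasing the guidelines.
SEVERITY_ACTION = {
    "CRITICAL": "BLOCK merge immediately",
    "HIGH": "Require Action Agent fixes",
    "MEDIUM": "Review and justify",
    "LOW": "Optional fix",
}


def triage(findings: list[dict]) -> list[tuple[str, str, str]]:
    """Return (severity, file:line, action) tuples, most urgent first."""
    ordered = sorted(
        findings,
        key=lambda f: (SEVERITY_RANK[f["severity"]], f["file"], f["line"]),
    )
    return [
        (f["severity"], f"{f['file']}:{f['line']}", SEVERITY_ACTION[f["severity"]])
        for f in ordered
    ]
```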
3. LLM Heuristic Review (Context-Dependent)
After automated scanning, apply contextual judgment to the following areas:
Test Assertion Weakening
Scripts cannot detect semantic changes, so review the following manually:
- Reduced assertion count without a clear reason
- Specific assertions replaced with generic checks
- Removed edge-case validations
- A switch from validating behavior to only verifying mocks
Red Flags:
// Before
expect(response.data).toMatchObject({
id: expect.any(String),
status: 'active',
count: expect.any(Number)
});
// After - WEAKENED
expect(response.data).toBeDefined(); // ❌ Lost specificity
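A crude numeric signal can help you decide which test files to read first, although it cannot replace the manual review. The following is a minimal sketch that assumes Jest-style `expect(...)` assertions. It compares the before and after versions of a test file and flags a file when the number of assertions drops or when the number of existence-only matchers grows. The matcher list and the helper names are hypothetical.

```python
import re

# Matchers that only check existence/truthiness (illustrative, not exhaustive).
GENERIC_MATCHERS = ("toBeDefined", "toBeTruthy", "not.toBeNull", "not.toBeUndefined")

# Matches expect( calls but not helpers such as expect.any(String).
EXPECT_RE = re.compile(r"\bexpect\s*\(")


def assertion_stats(source: str) -> tuple[int, int]:
    """Return (total expect() calls, uses of generic matchers)."""
    total = len(EXPECT_RE.findall(source))
    generic = sum(source.count(f".{m}(") for m in GENERIC_MATCHERS)
    return total, generic


def looks_weakened(before: str, after: str) -> bool:
    """Flag for manual review if assertions dropped or became more generic."""
    b_total, b_generic = assertion_stats(before)
    a_total, a_generic = assertion_stats(after)
    return a_total < b_total or a_generic > b_generic
```

A `True` result only means that the file deserves a closer look. A test can still be weakened without changing either count.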
Broad Try/Catch Appropriateness
Evaluate if exception handling is justified:
- Top-level error boundaries: Often acceptable
- Business logic: Usually inappropriate
- Missing error logging/reporting: Red flag
- Swallowing errors without assertions in tests: Red flag
When Acceptable:
// Top-level boundary
app.use((err, req, res, next) => {
logger.error(err);
res.status(500).json({ error: 'Internal error' });
});
Red Flag:
// Business logic swallowing errors
try {
await processPayment(data);
} catch (e) {
// ❌ Silent failure, no logging
}
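For Python sources, the clearest red flag (a broad handler that silently swallows the error) can be found with the standard `ast` module instead of regexes. The following is a minimal sketch. It reports only handlers that catch everything (a bare `except:`, `except Exception`, or `except BaseException`) and whose body does nothing. Whether a handler that logs and continues is acceptable still calls for the judgment described above. The function names are hypothetical.

```python
import ast


def _is_broad(handler: ast.ExceptHandler) -> bool:
    # Bare `except:` or `except Exception` / `except BaseException`.
    if handler.type is None:
        return True
    return isinstance(handler.type, ast.Name) and handler.type.id in (
        "Exception",
        "BaseException",
    )


def _is_silent(handler: ast.ExceptHandler) -> bool:
    # Body does nothing: only `pass` or bare constant expressions such as `...`.
    return all(
        isinstance(stmt, ast.Pass)
        or (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant))
        for stmt in handler.body
    )


def silent_broad_excepts(source: str) -> list[int]:
    """Return sorted line numbers of broad except clauses that swallow errors."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and _is_broad(node) and _is_silent(node)
    )
```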
Scope Creep vs Legitimate Refactoring
Assess if changes align with issue scope:
- Issue describes feature X, but changes include unrelated Y
- "While I was here" refactoring without issue reference
- Architecture changes not mentioned in acceptance criteria
Legitimate:
- Refactoring directly related to implementation
- Fixing bugs discovered during implementation (document in scratch notes)
- Updating tests to match new implementation
Scope Creep:
- Reformatting unrelated files
- Adding features not in issue
- Changing patterns/conventions beyond issue scope
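A quick first pass for scope creep is to compare the changed files against the areas that the issue is expected to touch. The following is a minimal sketch that assumes the reviewer derives a list of glob patterns from the issue. The file list could come from `git diff --name-only main...HEAD`. The `out_of_scope` helper is hypothetical. Anything it returns still needs a human decision, because legitimate related refactoring may live elsewhere.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], in_scope_globs: list[str]) -> list[str]:
    """Return changed files that match none of the globs expected for the issue.

    Note: fnmatch's "*" also matches "/", so "src/payments/*" covers subdirectories.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, glob) for glob in in_scope_globs)
    ]
```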
Architecture Alignment
Verify changes match current production architecture:
- Check against ADRs referenced in .project-context.md
- Verify the stack matches the documented tech stack
- Confirm patterns follow project standards
- Ensure deprecated approaches aren't reintroduced
4. Generate Validation Report
Combine automated findings with heuristic review:
## Code Validation Results for [ISSUE-ID]

### Automated Scan Summary
- Files Changed: X
- Total Findings: Y
- CRITICAL: Z findings
- HIGH: A findings
- MEDIUM: B findings

### Critical Findings (BLOCK)
[List CRITICAL severity findings with file:line references]

### High Priority Findings (FIX REQUIRED)
[List HIGH severity findings]

### Heuristic Review
- Test Assertion Quality: [PASS/FAIL with specifics]
- Exception Handling: [PASS/WARN/FAIL with examples]
- Scope Alignment: [PASS/WARN/FAIL with details]
- Architecture Compliance: [PASS/FAIL with ADR references]

### Recommendation
[APPROVED | CHANGES REQUIRED | BLOCKED]

### Action Items
[Specific fixes needed with file:line references]
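The automated part of this template can be filled in directly from the script's JSON output, so only the heuristic sections need to be written by hand. The following is a minimal sketch that assumes the JSON keys described under Script Output Format. The `render_summary` helper and the "- None" placeholder are hypothetical choices.

```python
def render_summary(report: dict, issue_id: str) -> str:
    """Render the header, Automated Scan Summary, and Critical Findings sections."""
    counts = report.get("findings_by_severity", {})
    lines = [
        f"## Code Validation Results for {issue_id}",
        "",
        "### Automated Scan Summary",
        f"- Files Changed: {report.get('files_changed', 0)}",
        f"- Total Findings: {report.get('total_findings', 0)}",
    ]
    for level in ("CRITICAL", "HIGH", "MEDIUM"):
        lines.append(f"- {level}: {counts.get(level, 0)} findings")
    lines += ["", "### Critical Findings (BLOCK)"]
    critical = [f for f in report.get("findings", []) if f["severity"] == "CRITICAL"]
    for f in critical:
        # Every finding is listed with a file:line reference.
        lines.append(f"- {f['file']}:{f['line']} - {f['message']}")
    if not critical:
        lines.append("- None")
    return "\n".join(lines)
```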
Script Output Format
Both scripts output JSON with this structure:
{
"commit_range": "main..HEAD",
"files_scanned": 42,
"files_changed": 15,
"total_findings": 8,
"findings_by_severity": {
"CRITICAL": 1,
"HIGH": 3,
"MEDIUM": 4,
"LOW": 0
},
"findings": [
{
"category": "secret_exposure",
"severity": "CRITICAL",
"file": "src/config.py",
"line": 12,
"pattern": "...",
"context": "API_KEY = 'sk_live_abc123...'",
"message": "Potential hardcoded secret"
}
],
"summary": {
"test_disabling": 2,
"secret_exposure": 1,
"path_portability": 3,
"security_flags": 1,
"dependency_changes": 1,
"large_deletions": 0
}
}
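Before relying on the counters in a report, it can be worth checking that they agree with the `findings` array, because truncated or hand-edited output would otherwise go unnoticed. The following is a minimal sketch of such a consistency check. The `check_report` helper is hypothetical. Note that the abbreviated example above would fail this check, because it lists only one of its eight findings.

```python
import json
from collections import Counter


def check_report(text: str) -> list[str]:
    """Return inconsistencies between a report's counters and its findings."""
    report = json.loads(text)
    findings = report.get("findings", [])
    problems = []
    if report.get("total_findings") != len(findings):
        problems.append(
            f"total_findings={report.get('total_findings')} but {len(findings)} findings listed"
        )
    actual = Counter(f["severity"] for f in findings)
    for level, expected in report.get("findings_by_severity", {}).items():
        if actual.get(level, 0) != expected:
            problems.append(f"{level}: counter says {expected}, found {actual.get(level, 0)}")
    return problems
```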
Red Flag Categories
Test Disabling (HIGH)
Patterns indicating tests were disabled rather than fixed:
- .skip(), .only(), .todo()
- xit(), xdescribe(), fit(), fdescribe()
- @pytest.mark.skip, pytest.skip(), @unittest.skip
Action: Require the Action Agent to fix the tests or justify the change with a comment
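The markers above can be matched line by line. The following is a minimal sketch that uses illustrative regexes, not the rules actually used in `diff_analyzer.py` or `static_analyzer.py`. The lookbehind keeps calls such as `model.fit(...)` from being mistaken for Jasmine's `fit(...)`. The pytest and unittest patterns also match the conditional variants (`skipif`, `skipIf`, `skipUnless`).

```python
import re

# Illustrative patterns for the test-disabling markers (not the scripts' actual rules).
TEST_DISABLING = re.compile(
    r"\.(?:skip|only|todo)\s*\("                          # describe.skip(, it.only(, test.todo(
    r"|(?<![\w.])(?:xit|xdescribe|fit|fdescribe)\s*\("    # x-/f-prefixed Jasmine/Jest calls
    r"|@pytest\.(?:mark\.)?skip|@unittest\.skip"          # Python skip decorators
    r"|\bpytest\.skip\s*\("                               # imperative pytest.skip()
)


def find_test_disabling(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line containing a marker."""
    return [
        (n, line.strip())
        for n, line in enumerate(source.splitlines(), start=1)
        if TEST_DISABLING.search(line)
    ]
```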
Secret Exposure (CRITICAL)
Hardcoded credentials or API keys:
- API keys, tokens, passwords
- AWS credentials
- GitHub tokens
- Stripe keys
Action: BLOCK merge; require credentials to be loaded from environment variables
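Well-known credential formats have recognizable prefixes, which is what makes pattern-based detection feasible. The following is a minimal sketch with a few illustrative signatures: AWS access key IDs that start with `AKIA`, GitHub tokens with `ghp_`-style prefixes, Stripe live keys, and a generic quoted assignment. These patterns are approximations, not the scripts' rules. Dedicated secret scanners use far larger rule sets and entropy checks.

```python
import re

# Illustrative signatures only; dedicated scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\b[sr]k_live_[A-Za-z0-9]{10,}\b"),
    # Quoted value of 8+ chars assigned to a secret-sounding name.
    "generic_assignment": re.compile(
        r"(?i)(?:api_?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_secrets(line: str) -> list[str]:
    """Return the names of every secret pattern that matches the line."""
    return [name for name, rx in SECRET_PATTERNS.items() if rx.search(line)]
```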
Path Portability (CRITICAL in docs, HIGH in code)
User-specific paths that won't work for other developers:
- /Users/username/
- /home/username/
- C:\Users\username\
- ~/Desktop, ~/Documents
Action: BLOCK if in documentation; in all cases, require repo-relative paths
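The severity split (CRITICAL in docs, HIGH in code) depends on both the matched text and the kind of file it appears in. The following is a minimal sketch that assumes files ending in `.md`, `.rst`, or `.txt` count as documentation. That extension list and the `portability_severity` helper are hypothetical. Expect some false positives, such as CI paths like `/home/runner/`.

```python
import re
from typing import Optional

# Illustrative patterns for the user-specific paths listed above.
USER_PATH = re.compile(
    r"/Users/[^/\s]+/"                 # macOS home directories
    r"|/home/[^/\s]+/"                 # Linux home directories
    r"|[A-Za-z]:\\Users\\[^\\\s]+\\"   # Windows profile directories
    r"|~/(?:Desktop|Documents)\b"      # personal folders via tilde
)


def portability_severity(path: str, line: str) -> Optional[str]:
    """Return CRITICAL for docs, HIGH for code, or None if no user path is found."""
    if not USER_PATH.search(line):
        return None
    is_doc = path.lower().endswith((".md", ".rst", ".txt"))  # assumed doc extensions
    return "CRITICAL" if is_doc else "HIGH"
```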
Security Flags (HIGH)
Commands that weaken security:
- --no-verify, --insecure, -k
- chmod 777
- StrictHostKeyChecking no
- --allow-root
Action: Require justification comment or removal
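These flags can be matched in shell scripts, CI configs, and documentation snippets. The following is a minimal sketch with illustrative patterns. It makes one assumption: it treats `-k` as a red flag only on a `curl` command line (where it means `--insecure`), because a bare `-k` is common elsewhere, for example in `sort -k`. The pattern names and the `flagged` helper are hypothetical.

```python
import re

# Illustrative patterns for security-weakening flags (not the scripts' actual rules).
SECURITY_FLAGS = {
    "no_verify": re.compile(r"--no-verify\b"),
    "insecure": re.compile(r"--insecure\b"),
    "curl_k": re.compile(r"\bcurl\b.*\s-k\b"),  # assumed: only curl's -k is flagged
    "chmod_777": re.compile(r"\bchmod\s+(?:-R\s+)?777\b"),
    "no_host_key_check": re.compile(r"StrictHostKeyChecking[\s=]+no\b", re.IGNORECASE),
    "allow_root": re.compile(r"--allow-root\b"),
}


def flagged(command: str) -> list[str]:
    """Return the names of all security-weakening flags found in a command line."""
    return [name for name, rx in SECURITY_FLAGS.items() if rx.search(command)]
```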
Large Deletions (MEDIUM)
Files with >100 lines removed:
- May indicate legitimate refactoring
- Could hide removed validation logic
- Might remove important error handling
Action: Manual review to verify deletions are intentional
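Git can report per-file deletion counts directly with `git diff --numstat main...HEAD`, which prints one `added<TAB>deleted<TAB>path` row per file and `-` for binary files. The following is a minimal sketch that applies the 100-line threshold to that output. The `large_deletions` helper is hypothetical.

```python
def large_deletions(numstat: str, threshold: int = 100) -> list[tuple[str, int]]:
    """Return (path, deleted lines) for files with more than `threshold` deletions."""
    flagged = []
    for row in numstat.splitlines():
        parts = row.split("\t", 2)
        if len(parts) < 3 or parts[1] == "-":  # "-" marks binary files
            continue
        deleted = int(parts[1])
        if deleted > threshold:
            flagged.append((parts[2], deleted))
    return flagged
```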
Dependency Changes (MEDIUM)
New imports or package additions:
- Track new dependencies for security review
- Verify each is necessary for the issue scope
- Check for unnecessary additions
Action: Verify changes in package.json/requirements.txt and run a security audit (e.g., npm audit or pip-audit)
Integration with QA Protocol
Run code-validation during the QA workflow's Step 3: Change Review (Diff), in this order:
- Switch to the feature branch
- Run the code-validation scripts
- Interpret automated findings
- Apply LLM heuristics
- Continue with Claude Code Review (MCP)
- Proceed with remaining QA steps
If CRITICAL or multiple HIGH findings:
- BLOCK validation
- Report to Traycer with specific file:line references
- Delegate to Action Agent for fixes
- Re-run validation after fixes
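The blocking rule for automated findings can be expressed as a small predicate. The following is a minimal sketch. It blocks on any CRITICAL finding or on more than one HIGH finding. It maps a single HIGH finding to CHANGES REQUIRED, following the severity guidelines. It treats MEDIUM and LOW findings as non-blocking; this is an assumption, and those findings still need the review and justification described earlier. The heuristic review can override an APPROVED result. The `qa_gate` helper is hypothetical.

```python
from collections import Counter


def qa_gate(severities: list[str]) -> str:
    """Map automated finding severities to a recommendation."""
    counts = Counter(severities)
    if counts["CRITICAL"] > 0 or counts["HIGH"] > 1:
        return "BLOCKED"
    if counts["HIGH"] == 1:
        return "CHANGES REQUIRED"
    # MEDIUM/LOW: non-blocking here, but still reviewed and justified manually.
    return "APPROVED"
```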
Resources
- Scripts:
  - scripts/diff_analyzer.py - Analyzes git diffs for red flags
  - scripts/static_analyzer.py - Scans files without git context
Notes
- Scripts are deterministic and fast; use them first
- LLM heuristics handle context-dependent evaluation
- Always provide file:line references in reports
- CRITICAL findings must block merge
- Document justified exceptions in code comments