Consistency Checker
Automatically detect and fix consistency issues across your codebase to maintain code quality and prevent common errors.
Overview
This skill provides automated tools to:
- Scan the codebase for consistency issues
- Report detailed findings with severity levels
- Fix issues automatically where safe
- Validate pipeline data flows
Quick Start
1. Check for Issues
Run the consistency checker on your project:
python scripts/check_consistency.py /path/to/project
This generates a detailed report and saves it as consistency_report.json.
2. Review the Report
The report categorizes issues by:
- 🔴 High severity: Security or breaking issues (e.g., .env not in gitignore)
- 🟡 Medium severity: Quality issues (e.g., inconsistent naming)
- 🟢 Low severity: Nice-to-have improvements
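To get a quick overview before reading the full report, the issues can be tallied per severity level. The following is a minimal sketch, assuming the report layout implied by the examples later in this document (an "issues" mapping from category name to a list of issue dicts with a "severity" key). The function name count_by_severity and the sample data are hypothetical.

```python
from collections import Counter


def count_by_severity(report: dict) -> Counter:
    """Count issues per severity level across all categories."""
    counts = Counter()
    for issues in report.get("issues", {}).values():
        for issue in issues:
            counts[issue.get("severity", "unknown")] += 1
    return counts


# Hypothetical report in the assumed layout (not real tool output)
sample_report = {
    "issues": {
        "env": [
            {"severity": "high", "message": ".env not in .gitignore", "files": [".gitignore"]},
        ],
        "naming": [
            {"severity": "medium", "message": "Inconsistent naming", "files": ["src/MyModule.py"]},
            {"severity": "low", "message": "Space in name", "files": ["docs/my notes.md"]},
        ],
    }
}

counts = count_by_severity(sample_report)
for level in ("high", "medium", "low"):
    print(f"{level}: {counts[level]}")
```

In practice you would load consistency_report.json with json.load and pass the result in place of sample_report.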
3. Apply Fixes
Fix issues automatically:
# Interactive mode (prompts for confirmation)
python scripts/fix_consistency.py /path/to/project

# Auto mode (applies all safe fixes)
python scripts/fix_consistency.py /path/to/project --auto
4. Check Pipeline Data Flow
For projects with data pipelines:
python scripts/check_pipeline.py /path/to/project
What Gets Checked
Requirements Files
- Issue: Multiple requirements.txt files across directories
- Check: Scans for all *requirements*.txt and *requirement.txt files
- Fix: Consolidates into a single root-level requirements.txt
- Backups: Saves originals to .consistency_backup/
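The scan-and-consolidate step can be approximated as below. This is an illustrative sketch, not the actual logic of fix_consistency.py. The helper names are hypothetical, and it only removes exact duplicate lines, so conflicting version pins (for example two different versions of the same package) would still need manual review.

```python
from pathlib import Path


def find_requirements_files(root: Path) -> list[Path]:
    """Return files matching *requirements*.txt or *requirement.txt under root."""
    found = set(root.rglob("*requirements*.txt")) | set(root.rglob("*requirement.txt"))
    # Skip copies that already live inside the backup directory
    return sorted(p for p in found if ".consistency_backup" not in p.parts)


def merge_requirements(files: list[Path]) -> list[str]:
    """Merge requirement lines, dropping blanks, comments, and exact duplicates."""
    merged: list[str] = []
    seen: set[str] = set()
    for path in files:
        for line in path.read_text().splitlines():
            entry = line.strip()
            if entry and not entry.startswith("#") and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged
```

Calling merge_requirements(find_requirements_files(Path("/path/to/project"))) returns the combined lines in first-seen order, which could then be written to the root-level requirements.txt.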
Environment Configuration
- Issue: .env files not properly managed
- Checks:
  - .env is in .gitignore
  - .env.example exists as template
  - Environment variables are documented
- Fix:
  - Adds .env to .gitignore
  - Creates .env.example from existing .env
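Creating .env.example from an existing .env amounts to keeping the variable names and stripping the values. A minimal sketch of that idea follows. The function name make_env_example is hypothetical and the real fixer may handle more cases (quoting, multi-line values).

```python
def make_env_example(env_text: str) -> str:
    """Keep variable names and comments from .env content, blank out the values."""
    out = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # keep blank lines and comments as documentation
        elif "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            out.append(f"{key}=")  # drop the (possibly secret) value
    return "\n".join(out) + "\n"


print(make_env_example("# Database\nDB_URL=postgres://user:pw@host/db\nAPI_KEY=abc123\n"))
```

The output keeps the "# Database" comment and emits DB_URL= and API_KEY= with empty values, so the template is safe to commit.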
Git Configuration
- Issue: Missing essential patterns in .gitignore
- Checks: Presence of:
  - .env (security)
  - __pycache__/ (Python cache)
  - *.pyc (compiled Python)
  - node_modules/ (if using Node.js)
- Fix: Creates or updates .gitignore with essential patterns
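The .gitignore check and fix can be sketched as a comparison of required patterns against the lines already present. This is an assumption about how such a check might work, not the tool's code. The names below are hypothetical, and matching is exact, so a line like __pycache__ (without the trailing slash) would be reported as missing.

```python
ESSENTIAL_PATTERNS = [".env", "__pycache__/", "*.pyc"]


def missing_patterns(gitignore_text: str, required=ESSENTIAL_PATTERNS) -> list[str]:
    """Return required patterns that do not appear as a line in the .gitignore text."""
    present = {line.strip() for line in gitignore_text.splitlines()}
    return [pattern for pattern in required if pattern not in present]


def updated_gitignore(gitignore_text: str, required=ESSENTIAL_PATTERNS) -> str:
    """Append any missing patterns, leaving existing content untouched."""
    missing = missing_patterns(gitignore_text, required)
    if not missing:
        return gitignore_text
    body = gitignore_text
    if body and not body.endswith("\n"):
        body += "\n"  # make sure appended patterns start on their own line
    return body + "\n".join(missing) + "\n"


print(missing_patterns("*.pyc\nbuild/\n"))
```

For a Node.js project, pass a list that also contains node_modules/ as the required argument.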
Naming Conventions
- Issue: Inconsistent file/folder naming
- Checks:
  - Python files use snake_case
  - Directories use lowercase with underscores
  - No spaces in names
- Reports: Files that don't follow conventions (doesn't auto-fix names to prevent breaking imports)
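The naming rules above can be expressed as a single regular expression applied to each path component. The sketch below is one possible reading of those rules, with hypothetical names. It expects relative paths and skips hidden directories such as .github, which is an assumption rather than documented behavior.

```python
import re
from pathlib import Path

# Lowercase letters, digits, and underscores; must not start with a digit
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def naming_violations(paths: list[str]) -> list[str]:
    """Return the paths whose directory names or Python file names break the rules."""
    bad = []
    for raw in paths:
        path = Path(raw)
        # Directory components, ignoring hidden ones like .github
        names = [part for part in path.parts[:-1] if not part.startswith(".")]
        if path.suffix == ".py":
            names.append(path.stem)  # Python file name without the extension
        elif " " in path.name:
            names.append(path.name)  # other files are only flagged for spaces
        if any(not SNAKE_CASE.match(name) for name in names):
            bad.append(raw)
    return bad


print(naming_violations(["src/data_loader.py", "src/DataLoader.py", "My Module/utils.py"]))
```

Only src/DataLoader.py and My Module/utils.py are reported here; dunder files such as __init__.py pass the pattern.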
Import Statements
- Issue: Broken or problematic imports
- Checks:
  - Relative imports that might break
  - Missing __init__.py files
  - Import statement patterns
- Reports: Files with potential import issues for manual review
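As an illustration of the missing __init__.py check, the sketch below lists directories that contain Python files but no __init__.py. The function name is hypothetical. Note that Python also supports namespace packages without __init__.py, so a hit is a prompt for manual review rather than a definite error.

```python
from pathlib import Path


def dirs_missing_init(root: Path) -> list[Path]:
    """List directories under root that contain .py files but no __init__.py."""
    missing = []
    for directory in sorted({p.parent for p in root.rglob("*.py")}):
        if directory == root:
            continue  # the project root itself is usually not a package
        if not (directory / "__init__.py").exists():
            missing.append(directory.relative_to(root))
    return missing
```

Calling dirs_missing_init(Path("/path/to/project")) returns paths relative to the project root.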
Pipeline Data Flow
- Issue: Output of one stage doesn't match input of next
- Checks:
  - Type hints on functions
  - Return types vs parameter types
  - Data structure compatibility
- Reports: Mismatches with suggestions for fixes
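Comparing a stage's return type with the next stage's parameter type can be sketched with the standard typing and inspect modules. This is a simplified illustration, not the logic of check_pipeline.py. The function check_handoff and the three stage functions are hypothetical, and the comparison is strict equality, so compatible subclasses or generic types would still be flagged.

```python
import inspect
from typing import Optional, get_type_hints


def check_handoff(producer, consumer) -> Optional[str]:
    """Return a mismatch message, or None if the annotations line up."""
    out_type = get_type_hints(producer).get("return")
    params = list(inspect.signature(consumer).parameters)
    if not params:
        return f"{consumer.__name__} takes no input"
    in_type = get_type_hints(consumer).get(params[0])
    if out_type is None or in_type is None:
        return f"missing type hints on {producer.__name__} -> {consumer.__name__}"
    if out_type != in_type:
        return f"{producer.__name__} returns {out_type}, but {consumer.__name__} expects {in_type}"
    return None


# Hypothetical stages used only to exercise the check
def load() -> list:
    return [{"id": 1}]


def transform(rows: dict) -> dict:
    return rows


def summarize(rows: list) -> int:
    return len(rows)


print(check_handoff(load, summarize))  # annotations match
print(check_handoff(load, transform))  # list vs dict mismatch
```

The first call prints None because both annotations are list; the second prints a message describing the list/dict mismatch.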
See references/common-issues.md for detailed solutions.
Workflow
Standard Audit Workflow
1. Initial Scan

       python scripts/check_consistency.py /path/to/project

   Review the generated consistency_report.json to understand the issues.

2. Critical Fixes First

   Handle high-severity issues:
   - Ensure .env is in .gitignore
   - Consolidate requirements files
   - Fix broken imports

3. Automated Fixes

       python scripts/fix_consistency.py /path/to/project --auto

4. Manual Review

   Review any issues that require manual intervention:
   - Naming convention violations
   - Import structure refactoring
   - Pipeline type mismatches

5. Validation

   Re-run the checker to confirm issues are resolved:

       python scripts/check_consistency.py /path/to/project
Pipeline-Specific Workflow
For projects with data processing pipelines:
1. Identify Stages

   The checker automatically detects files matching:
   - **/pipeline*.py
   - **/stage*.py
   - **/*_stage.py
   - **/process*.py

2. Analyze Data Flow

       python scripts/check_pipeline.py /path/to/project

3. Review Mismatches

   Check the report for:
   - Type incompatibilities between stages
   - Suggested transformations
   - Missing type hints

4. Fix Type Issues

   Add explicit type hints:

       from typing import Dict

       import pandas as pd

       def stage1() -> pd.DataFrame:
           """Stage 1 output."""
           return data  # placeholder for the DataFrame this stage builds

       def stage2(input_data: pd.DataFrame) -> Dict:
           """Stage 2 expects DataFrame input."""
           return processed  # placeholder for the dict this stage builds

5. Add Adapters (if needed)

   Create adapter functions for type conversions:

       from typing import Dict, List

       import pandas as pd

       def adapt_stage1_to_stage2(output: List[Dict]) -> pd.DataFrame:
           """Convert stage1 output to stage2 input format."""
           return pd.DataFrame(output)
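To preview which files the stage-detection patterns from the first step would pick up, the globs can be reproduced with pathlib. This is a sketch under the assumption that detection is purely name-based. The names STAGE_PATTERNS and find_stage_files are hypothetical.

```python
from pathlib import Path

# File-name parts of the documented patterns; rglob adds the leading **/
STAGE_PATTERNS = ["pipeline*.py", "stage*.py", "*_stage.py", "process*.py"]


def find_stage_files(root: Path) -> list[Path]:
    """Return files under root whose names match any stage pattern."""
    found = set()
    for pattern in STAGE_PATTERNS:
        found.update(root.rglob(pattern))
    return sorted(p.relative_to(root) for p in found)
```

An empty result from find_stage_files(Path("/path/to/project")) suggests that no file names match, which is worth checking before running the pipeline analysis.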
Advanced Usage
Custom Checks
Extend the checker for project-specific rules:
from scripts.check_consistency import ConsistencyChecker


class MyChecker(ConsistencyChecker):
    def check_custom_rule(self):
        """Add your custom consistency check."""
        issue_found = False  # Replace with your own detection logic
        if issue_found:
            # setdefault creates the "custom" category on first use
            self.issues.setdefault("custom", []).append({
                "severity": "medium",
                "message": "Custom issue found",
                "files": ["path/to/file"],
            })


checker = MyChecker("/path/to/project")
report = checker.check_all()
Filtering Reports
Focus on specific categories:
import json

with open('consistency_report.json') as f:
    report = json.load(f)

# Show only high-severity issues
high_severity = [
    issue
    for issues in report['issues'].values()
    for issue in issues
    if issue.get('severity') == 'high'
]
CI/CD Integration
Add to your CI pipeline:
# .github/workflows/consistency-check.yml
name: Consistency Check
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check Consistency
        run: |
          python scripts/check_consistency.py .
          if grep -q '"severity": "high"' consistency_report.json; then
            echo "High severity issues found!"
            exit 1
          fi
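The grep check in the workflow depends on the exact spacing of the JSON output, so a report written with different separators would slip through. A more robust gate can parse the report instead. The following is a sketch with hypothetical function names, assuming the same report layout used in the Filtering Reports example.

```python
import json


def high_severity_issues(report: dict) -> list[dict]:
    """Collect every issue marked with severity 'high'."""
    return [
        issue
        for issues in report.get("issues", {}).values()
        for issue in issues
        if issue.get("severity") == "high"
    ]


def gate(report_path: str = "consistency_report.json") -> int:
    """Return a process exit code: 1 if any high-severity issue exists, else 0."""
    with open(report_path) as f:
        report = json.load(f)
    blocking = high_severity_issues(report)
    for issue in blocking:
        print(f"HIGH: {issue.get('message', '(no message)')}")
    return 1 if blocking else 0
```

Saved in a small script that ends with sys.exit(gate()), this could replace the grep block in the run step.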
Output Files
All scripts generate reports saved in the project root:
- consistency_report.json: Full detailed report from the checker
- pipeline_report.json: Pipeline-specific analysis
- .consistency_backup/: Backup of files before automated changes
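If you write your own fix scripts, the same backup convention can be followed by copying a file into .consistency_backup/ before modifying it. This is a sketch of one way to do that, not the tool's implementation. The function name backup_file is hypothetical, and mirroring the relative path inside the backup directory is an assumption.

```python
import shutil
from pathlib import Path


def backup_file(project_root: Path, target: Path) -> Path:
    """Copy target into .consistency_backup/, preserving its path relative to the root."""
    relative = target.resolve().relative_to(project_root.resolve())
    destination = project_root / ".consistency_backup" / relative
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(target, destination)  # copy2 also preserves file timestamps
    return destination
```

For example, backing up api/requirements.txt would produce .consistency_backup/api/requirements.txt under the project root.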
Best Practices
- Run Before Commits: Check consistency before committing
- Review Auto-Fixes: Always review changes made in --auto mode
- Keep Backups: The tool creates backups, but verify critical files
- Iterative Fixing: Fix high-severity issues first, then medium, then low
- Document Decisions: If you intentionally violate a rule, document why
Limitations
- Naming: Won't auto-rename files (risks breaking imports)
- Imports: Can't fix all import issues automatically (complex dependencies)
- Pipelines: Requires type hints for accurate analysis
- README: Only checks existence, not content quality
For manual intervention guidance, see references/common-issues.md.
Troubleshooting
"No pipeline stages detected"
- Ensure files follow naming patterns: pipeline*.py, *_stage.py, etc.
- Pipeline files should contain processing functions
"Import analysis failed"
- Check for syntax errors in Python files
- Ensure files are valid Python (.py extension)
"Permission denied" during fixes
- Run with appropriate permissions
- Check file/directory ownership
False positives
- Review the report context
- Add exceptions in a custom checker if needed
Reference Materials
- common-issues.md: Detailed guide for each issue type with multiple solution approaches