Bug Investigation Skill
When to Use
Use this skill when:
- Debugging extraction failures
- Investigating classification errors
- Analyzing search performance issues
- Troubleshooting database problems
Debugging Workflow
1. Reproduce the Issue
- Identify the failing PDF or operation
- Reproduce the failure in isolation, outside the full batch run
- Collect the exact error messages and tracebacks
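Reproducing in isolation can be as simple as running one operation on one file and recording what happened. A minimal sketch; `run_isolated` is a hypothetical helper, and `operation` stands for whichever project function is failing:

```python
import traceback
from typing import Any, Callable


def run_isolated(operation: Callable[[str], Any], path: str) -> dict:
    """Run one operation on one file and capture the outcome for a bug report."""
    try:
        result = operation(path)
    except Exception as exc:  # broad on purpose: every failure should be recorded
        return {
            "path": path,
            "ok": False,
            "result": None,
            "error": f"{type(exc).__name__}: {exc}",
            "traceback": traceback.format_exc(),
        }
    return {"path": path, "ok": True, "result": result, "error": None, "traceback": None}
```

The returned dictionary keeps the path, the error text, and the traceback together, so it can be pasted into an issue as is.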
2. Check Extraction Metrics
Use metrics.py to check:
- Extraction success rate
- Methods used (pdfplumber vs OCR)
- Error patterns
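The interface of metrics.py is not shown here, so the following is only a sketch of the kind of summary these three checks call for. It assumes each processed file yields a record with a `method` and an `error` field; both field names and the function `summarize_extraction` are hypothetical:

```python
from collections import Counter


def summarize_extraction(records: list[dict]) -> dict:
    """Summarize success rate, methods used, and the most common errors."""
    total = len(records)
    ok = [r for r in records if r.get("error") is None]
    failed = [r for r in records if r.get("error") is not None]
    return {
        "total": total,
        "success_rate": len(ok) / total if total else 0.0,
        "methods": dict(Counter(r["method"] for r in ok)),
        # The three most frequent error messages, most common first
        "top_errors": Counter(r["error"] for r in failed).most_common(3),
    }
```

A sudden rise in the OCR share or one error message dominating `top_errors` is usually the quickest pointer to where to look next.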
3. Review Logs
- Error messages in console
- Database error logs
- Processing statistics
Common Bug Patterns
PDF Extraction Failures
Symptoms:
- "Sem texto extraível" ("No extractable text")
- Empty content in database
- OCR not triggered when needed
Investigation:
- Check whether the PDF is scanned (pages are images with no text layer)
- Verify OCR is installed and working
- Test extraction manually
- Check file permissions
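Most of these checks can be scripted. The sketch below makes two assumptions: that the OCR engine is Tesseract, found as a `tesseract` binary on the PATH, and that pdfplumber is the text extractor. The function names and the `min_chars` threshold are hypothetical:

```python
import os
import shutil


def diagnose_pdf_environment(path: str, ocr_binary: str = "tesseract") -> dict:
    """Check extraction preconditions that do not require parsing the PDF."""
    exists = os.path.isfile(path)
    report = {
        "exists": exists,
        "readable": exists and os.access(path, os.R_OK),
        "size_bytes": os.path.getsize(path) if exists else 0,
        "has_pdf_header": False,
        # Assumption: the OCR engine is an executable on PATH
        "ocr_binary_found": shutil.which(ocr_binary) is not None,
    }
    if report["readable"]:
        with open(path, "rb") as fh:
            report["has_pdf_header"] = fh.read(5) == b"%PDF-"
    return report


def looks_scanned(path: str, min_chars: int = 20) -> bool:
    """Return True when the pages carry images but almost no text layer."""
    import pdfplumber  # third-party; imported lazily so the checks above work without it

    with pdfplumber.open(path) as pdf:
        text = "".join((page.extract_text() or "") for page in pdf.pages)
        has_images = any(page.images for page in pdf.pages)
    return has_images and len(text.strip()) < min_chars
```

If `looks_scanned` returns True while `ocr_binary_found` is False, the missing OCR installation is the likely cause of the empty content.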
Classification Errors
Symptoms:
- Documents classified as "outros" ("others")
- Incorrect contract number extraction
- Missing document numbers
Investigation:
- Check the filename against the expected pattern
- Test the regex patterns against the failing filename
- Verify the classification logic
- Compare expected vs actual output
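To see which pattern fails on a given filename, it helps to run every pattern against it and print what each one captured. The patterns below are hypothetical placeholders for the project's real ones:

```python
import re

# Hypothetical patterns: replace with the ones used by the classification code.
CONTRACT_PATTERNS = {
    "underscore": re.compile(r"contrato_(\d+)_(\d{4})", re.IGNORECASE),
    "spaced": re.compile(r"contrato\s+(\d+)[-\s]+(\d{4})", re.IGNORECASE),
    "abbrev": re.compile(r"\bCT[-_ ]?(\d+)[-_ ](\d{4})", re.IGNORECASE),
}


def probe_patterns(filename: str) -> dict:
    """Report, per pattern, the captured groups or None when it does not match."""
    report = {}
    for name, pattern in CONTRACT_PATTERNS.items():
        match = pattern.search(filename)
        report[name] = match.groups() if match else None
    return report
```

One thing to watch for with patterns like `abbrev`: `\b` treats `_` as a word character, so a name such as `doc_CT-45 2021.pdf` does not match even though `CT-45 2021.pdf` does.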
Database Issues
Symptoms:
- Duplicate key errors
- FTS5 index not updating
- Missing data in results
Investigation:
- Check that each filepath is unique
- Verify the triggers that keep the FTS5 index in sync are present and firing
- Test queries directly against the database
- Check the database schema
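The schema, trigger, and uniqueness checks can be run in one pass with the standard `sqlite3` module. A sketch, assuming the main table is called `documents` and has a `filepath` column; both names are guesses and should be adjusted to the real schema:

```python
import sqlite3


def inspect_database(conn: sqlite3.Connection, table: str = "documents") -> dict:
    """Collect schema, trigger, and duplicate-path facts for one table."""
    # `table` is interpolated into SQL, so pass only trusted, known table names.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    triggers = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'trigger' AND tbl_name = ?",
            (table,),
        )
    ]
    duplicates = []
    if "filepath" in columns:
        duplicates = conn.execute(
            f"SELECT filepath, COUNT(*) FROM {table} "
            "GROUP BY filepath HAVING COUNT(*) > 1"
        ).fetchall()
    return {"columns": columns, "triggers": triggers, "duplicates": duplicates}
```

An empty `triggers` list would explain an FTS5 index that never updates, and any entry in `duplicates` points at the source of duplicate key errors.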
Search Problems
Symptoms:
- No results found
- Incorrect results
- Slow queries
Investigation:
- Verify FTS5 index exists
- Test query syntax
- Check content was indexed
- Review filter logic
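The first three checks can be combined into one small report. This sketch assumes the FTS5 table name is known (for example a hypothetical `docs_fts`) and that the Python build's SQLite has FTS5 enabled:

```python
import sqlite3


def fts_report(conn: sqlite3.Connection, fts_table: str, query: str) -> dict:
    """Check that an FTS5 table exists, holds rows, and answers a MATCH query."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (fts_table,),
    ).fetchone()
    if row is None or "fts5" not in (row[0] or "").lower():
        return {"exists": False, "indexed_rows": 0, "matches": 0, "error": None}
    # `fts_table` is interpolated into SQL, so pass only trusted table names.
    indexed = conn.execute(f"SELECT COUNT(*) FROM {fts_table}").fetchone()[0]
    try:
        matches = conn.execute(
            f"SELECT COUNT(*) FROM {fts_table} WHERE {fts_table} MATCH ?", (query,)
        ).fetchone()[0]
        error = None
    except sqlite3.OperationalError as exc:  # raised for malformed FTS5 queries
        matches, error = 0, str(exc)
    return {"exists": True, "indexed_rows": indexed, "matches": matches, "error": error}
```

Reading the result: `exists` False means the index was never created, `indexed_rows` of 0 means content was not indexed, and a non-empty `error` means the query string itself is invalid FTS5 syntax.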
Logging and Error Handling
Error Logging Pattern
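```python
# Runs inside the loop over files; `errors` is a counter initialized before the loop.
try:
    text = extract_text_from_pdf(full_path)
except Exception as e:
    # Log the error with context ("Erro ao processar" = "Error processing"),
    # then continue with the next file instead of aborting the run
    print(f" ❌ Erro ao processar {file}: {e}")
    errors += 1
```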
Debugging Checklist
- Error message is clear and helpful
- Context information is logged
- An error doesn't crash the entire process
- Metrics track failures
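One way to satisfy all four checklist items at once is to route failures through the standard `logging` module and count them per run. This is a sketch, not the project's implementation; `process_files` and the `extract` callable are hypothetical stand-ins:

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("pdf_processing")


def process_files(paths: Iterable[str], extract: Callable[[str], str]) -> dict:
    """Process every path, logging failures with context instead of aborting."""
    stats = {"processed": 0, "errors": 0, "failed_paths": []}
    for path in paths:
        try:
            text = extract(path)
            if not text.strip():
                raise ValueError("no extractable text")
            stats["processed"] += 1
        except Exception:
            # logger.exception records the message and the full traceback
            logger.exception("Failed to process %s", path)
            stats["errors"] += 1
            stats["failed_paths"].append(path)
    return stats
```

Keeping `failed_paths` in the returned statistics gives a ready-made list of files to reproduce individually.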
Test Verification Steps
For Extraction Bugs
- Test with a sample PDF
- Verify which extraction method was used
- Check the length of the extracted text
- Validate the OCR output if OCR was used
For Classification Bugs
- Test the classification function directly
- Verify the regex matches
- Check the fallback logic
- Compare the result with the expected one
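These steps fit naturally into a table of (filename, expected category) cases. In the sketch below, `classify` and its rules are hypothetical stand-ins for the project's real function; only the "outros" fallback comes from the symptoms described earlier:

```python
import re

# Hypothetical rules; substitute the project's real classification function.
RULES = [
    ("contrato", re.compile(r"contrato", re.IGNORECASE)),
    ("aditivo", re.compile(r"aditivo", re.IGNORECASE)),
]


def classify(filename: str) -> str:
    """Return the first matching category, or the "outros" fallback."""
    for label, pattern in RULES:
        if pattern.search(filename):
            return label
    return "outros"


def mismatches(cases: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (filename, expected, actual) for every case that disagrees."""
    failures = []
    for filename, expected in cases:
        actual = classify(filename)
        if actual != expected:
            failures.append((filename, expected, actual))
    return failures
```

An empty list from `mismatches` means every case passed; each remaining tuple shows exactly which filename fell through to the fallback or hit the wrong rule.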
Bug Fix Examples
Fix: OCR Not Triggering
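Root Cause: Text validation runs before the OCR check, so a scanned PDF with no text layer is rejected as empty before OCR gets a chance to run
Fix: Run the OCR fallback first, and report a validation failure only if OCR also returns no usable text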
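The corrected ordering can be sketched as follows. The two callables stand for the project's own text extractor and OCR routine, and the `MIN_CHARS` threshold is an assumption, not a value taken from the code base:

```python
from typing import Callable, Optional

MIN_CHARS = 20  # assumed threshold for "has usable text"


def extract_with_fallback(
    path: str,
    extract_text: Callable[[str], Optional[str]],
    run_ocr: Callable[[str], Optional[str]],
) -> tuple[str, str]:
    """Return (text, method); OCR runs before the empty-text check can fail."""
    text = (extract_text(path) or "").strip()
    method = "pdfplumber"
    if len(text) < MIN_CHARS:
        # Try OCR first; the file counts as a failure only if OCR is also short.
        text = (run_ocr(path) or "").strip()
        method = "ocr"
    if len(text) < MIN_CHARS:
        raise ValueError(f"Sem texto extraível: {path}")
    return text, method
```

Returning the method alongside the text also makes it possible to verify which extraction path a given PDF took.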
Fix: Classification Fails
Root Cause: The regex does not match every filename pattern in use
Fix: Broaden the regex or add alternative patterns for the unmatched filenames