Bug Investigation Skill

When to Use

Use this skill when:

•Debugging extraction failures
•Investigating classification errors
•Analyzing search performance issues
•Troubleshooting database problems

Debugging Workflow

1. Reproduce the Issue

•Identify the failing PDF or operation
•Reproduce it in isolation, with a single file or a single query
•Collect the exact error messages

2. Check Extraction Metrics

Use metrics.py to check:

•Extraction success rate
•Methods used (pdfplumber vs OCR)
•Error patterns
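
The three checks above can be sketched as a small summary function. The interface of metrics.py is not shown here, so the record layout (dicts with "method", "ok", and "error" keys) and the function name are assumptions made for illustration only.

```python
from collections import Counter


def summarize_extractions(records):
    """Summarize extraction outcomes from a list of per-file records.

    Each record is a hypothetical dict with the keys
    "method" ("pdfplumber" or "ocr"), "ok" (bool) and "error" (str or None).
    """
    total = len(records)
    succeeded = sum(1 for r in records if r["ok"])
    return {
        # Fraction of files that produced usable text
        "success_rate": succeeded / total if total else 0.0,
        # How often each extraction method was used
        "methods": Counter(r["method"] for r in records),
        # Failure messages grouped by text, most frequent first via most_common()
        "error_patterns": Counter(r["error"] for r in records if not r["ok"]),
    }


records = [
    {"method": "pdfplumber", "ok": True, "error": None},
    {"method": "ocr", "ok": True, "error": None},
    {"method": "pdfplumber", "ok": False, "error": "Sem texto extraível"},
    {"method": "pdfplumber", "ok": True, "error": None},
]
summary = summarize_extractions(records)
print(summary["success_rate"])
print(summary["methods"].most_common())
print(summary["error_patterns"].most_common(3))
```

A low success rate combined with almost no OCR usage points toward the OCR fallback not firing, which narrows the investigation before any logs are read.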

3. Review Logs

•Error messages in console
•Database error logs
•Processing statistics

Common Bug Patterns

PDF Extraction Failures

Symptoms:

•"Sem texto extraível" ("No extractable text")
•Empty content in database
•OCR not triggered when needed

Investigation:

•Check if PDF is scanned (images)
•Verify OCR is installed and working
•Test extraction manually
•Check file permissions
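
Two of these checks (file permissions and OCR availability) can be run before touching the extraction code at all. The sketch below uses only the standard library. It assumes the OCR engine is a command-line binary named tesseract; that name is a guess, so pass the real one if the project uses something else.

```python
import os
import shutil


def preflight(pdf_path, ocr_binary="tesseract"):
    """Return basic environment checks for one PDF.

    ocr_binary is an assumed executable name, not taken from the project.
    """
    return {
        # The path points at a regular file
        "exists": os.path.isfile(pdf_path),
        # The current user may read the file
        "readable": os.access(pdf_path, os.R_OK),
        # The OCR executable can be found on PATH
        "ocr_installed": shutil.which(ocr_binary) is not None,
    }


print(preflight("sample.pdf"))
```

If all three values are true and extraction still returns nothing, the PDF is likely scanned and the problem sits in the OCR fallback rather than in the environment.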

Classification Errors

Symptoms:

•Documents classified as "outros" ("other", the fallback category)
•Incorrect contract number extraction
•Missing document numbers

Investigation:

•Check filename pattern
•Test regex patterns
•Verify classification logic
•Review expected vs actual output
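
A small table of filenames with their expected labels makes the last step mechanical. The patterns and the labels "contrato" and "aditivo" below are hypothetical stand-ins; only the "outros" fallback comes from this document. Replace them with the project's real classification function.

```python
import re

# Hypothetical patterns, for illustration only
PATTERNS = {
    "contrato": re.compile(r"contrato[_\s-]*(\d+)[-_/](\d{4})", re.IGNORECASE),
    "aditivo": re.compile(r"aditivo[_\s-]*(\d+)", re.IGNORECASE),
}


def classify(filename):
    """Return (label, match) for the first matching pattern, else ("outros", None)."""
    for label, pattern in PATTERNS.items():
        match = pattern.search(filename)
        if match:
            return label, match
    return "outros", None


def report_mismatches(cases):
    """Return (filename, expected, actual) for every case that disagrees."""
    mismatches = []
    for filename, expected in cases:
        actual, _ = classify(filename)
        if actual != expected:
            mismatches.append((filename, expected, actual))
    return mismatches


cases = [
    ("Contrato_012-2023.pdf", "contrato"),
    ("contrato 12 2023.pdf", "contrato"),  # space before the year is not covered
    ("Aditivo-03.pdf", "aditivo"),
]
mismatches = report_mismatches(cases)
for filename, expected, actual in mismatches:
    print(f"{filename}: expected {expected}, got {actual}")
```

Each mismatch names a concrete filename variant the regex misses, which is usually enough to decide between widening one pattern and adding a new one.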

Database Issues

Symptoms:

•Duplicate key errors
•FTS5 index not updating
•Missing data in results

Investigation:

•Check filepath uniqueness
•Verify triggers are working
•Test queries directly
•Check database schema
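
FTS5 is a SQLite extension, so these checks can be run directly with the sqlite3 module. The sketch below uses an in-memory database; the table names documents and documents_index, the trigger name, and the column layout are assumptions. In the real schema the trigger target would be the FTS5 table; a plain table stands in for it here.

```python
import sqlite3


def existing_filepaths(conn, filepaths, table="documents"):
    """Return the incoming paths that are already stored (insert would fail)."""
    sql = f"SELECT 1 FROM {table} WHERE filepath = ?"
    return [p for p in filepaths if conn.execute(sql, (p,)).fetchone()]


def list_triggers(conn):
    """Return (trigger_name, table_name) for every trigger in the schema."""
    return conn.execute(
        "SELECT name, tbl_name FROM sqlite_master "
        "WHERE type = 'trigger' ORDER BY name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents "
    "(id INTEGER PRIMARY KEY, filepath TEXT UNIQUE, content TEXT)"
)
conn.execute("CREATE TABLE documents_index (doc_id INTEGER, content TEXT)")
conn.execute(
    "CREATE TRIGGER documents_ai AFTER INSERT ON documents "
    "BEGIN INSERT INTO documents_index (doc_id, content) "
    "VALUES (new.id, new.content); END"
)
conn.execute(
    "INSERT INTO documents (filepath, content) VALUES (?, ?)", ("a.pdf", "text")
)

# Trigger check: one inserted document should yield one index row
synced = conn.execute("SELECT COUNT(*) FROM documents_index").fetchone()[0]
print(list_triggers(conn))
print(synced)
print(existing_filepaths(conn, ["a.pdf", "new.pdf"]))
```

An empty trigger list, or an index row count that lags behind the document count, explains an index that is not updating. A non-empty result from existing_filepaths explains a duplicate key error before the insert is attempted.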

Search Problems

Symptoms:

•No results found
•Incorrect results
•Performance issues

Investigation:

•Verify FTS5 index exists
•Test query syntax
•Check content was indexed
•Review filter logic
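
The first three checks can be scripted against the database itself. This is a minimal sketch that assumes the Python build ships SQLite with FTS5 enabled and uses a made-up table name, docs_fts. Search terms are wrapped in double quotes so that punctuation inside a term is treated as part of a phrase rather than as query syntax.

```python
import re
import sqlite3


def fts5_tables(conn):
    """Return the names of virtual tables declared with USING fts5."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [
        name
        for name, sql in rows
        if sql and re.search(r"using\s+fts5", sql, re.IGNORECASE)
    ]


def quote_fts_term(term):
    """Wrap a term as an FTS5 string, doubling any embedded double quotes."""
    return '"' + term.replace('"', '""') + '"'


def search(conn, fts_table, term):
    """Return the rowids whose indexed content matches the quoted term."""
    sql = f"SELECT rowid FROM {fts_table} WHERE {fts_table} MATCH ?"
    return [row[0] for row in conn.execute(sql, (quote_fts_term(term),))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(content)")  # hypothetical name
conn.executemany(
    "INSERT INTO docs_fts (content) VALUES (?)",
    [("contrato 012-2023 assinado",), ("relatorio mensal",)],
)

indexed = conn.execute("SELECT COUNT(*) FROM docs_fts").fetchone()[0]
print(fts5_tables(conn))
print(indexed)
print(search(conn, "docs_fts", "012-2023"))
```

If the index exists and the row count is right but a query still returns nothing, the remaining suspects are the query syntax and any filters applied after the MATCH.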

Logging and Error Handling

Error Logging Pattern

python

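# Assumes `file`, `full_path`, and an `errors` counter are defined by the
# surrounding per-file loop.
try:
    text = extract_text_from_pdf(full_path)
except Exception as e:
    # Log the error with context: the file name and the exception message
    # ("Erro ao processar" means "Error processing")
    print(f"   ❌ Erro ao processar {file}: {e}")
    # Count the failure and carry on, so one bad file does not stop the run
    errors += 1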
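
A variant of the same pattern using the standard logging module, which also records the traceback. This is a sketch, not the project's code: the logger name, the process_all function, and the fake extractor are all made up for the example.

```python
import logging

logger = logging.getLogger("pdf_processing")  # hypothetical logger name


def process_all(paths, extract):
    """Run extract(path) on each path; log failures and keep going."""
    texts, failures = {}, []
    for path in paths:
        try:
            texts[path] = extract(path)
        except Exception as exc:
            # logger.exception logs at ERROR level and appends the traceback
            logger.exception("Failed to process %s", path)
            # Keep structured details so metrics can count failures by type
            failures.append((path, type(exc).__name__, str(exc)))
    return texts, failures


def fake_extract(path):
    """Stand-in extractor that fails for one file."""
    if path.endswith("bad.pdf"):
        raise ValueError("Sem texto extraível")
    return "some text"


texts, failures = process_all(["a.pdf", "bad.pdf", "c.pdf"], fake_extract)
print(sorted(texts))
print(failures)
```

Returning the failures as data, rather than only printing them, lets the caller feed them into whatever metrics the project tracks.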

Debugging Checklist

• Error message is clear and helpful
• Context information is logged (file name, operation)
• A single error doesn't crash the entire process
• Metrics track failures

Test Verification Steps

For Extraction Bugs

•Test with sample PDF
•Verify extraction method used
•Check text length
•Validate OCR if used

For Classification Bugs

•Test classification function directly
•Verify regex matches
•Check fallback logic
•Compare with expected result

Bug Fix Examples

Fix: OCR Not Triggering

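Root Cause: The OCR check runs after text validation, so a PDF with no extractable text fails validation before OCR is ever attempted

Fix: Move the OCR check ahead of the point where validation reports a failure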
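
The corrected ordering can be sketched as follows. The function name, the length threshold, and the injected extract_text and run_ocr callables are assumptions; the project's real extraction functions are not shown in this document.

```python
MIN_TEXT_LENGTH = 20  # hypothetical threshold for "enough text"


def extract_with_fallback(pdf_path, extract_text, run_ocr,
                          min_length=MIN_TEXT_LENGTH):
    """Return (text, method); OCR is tried before failure is declared."""
    text = (extract_text(pdf_path) or "").strip()
    method = "pdfplumber"
    if len(text) < min_length:
        # OCR check comes first: too little text triggers the fallback
        text = (run_ocr(pdf_path) or "").strip()
        method = "ocr"
    if len(text) < min_length:
        # Validation fails only after both methods have been tried
        raise ValueError(f"Sem texto extraível: {pdf_path}")
    return text, method


scanned_text = "texto reconhecido por OCR no documento"
result = extract_with_fallback(
    "scan.pdf",
    extract_text=lambda path: "",          # scanned PDF: no text layer
    run_ocr=lambda path: scanned_text,
)
print(result)
```

Passing the two extractors in as arguments keeps the ordering logic testable without real PDFs or an OCR installation.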

Fix: Classification Fails

Root Cause: The regex doesn't match every filename variant that occurs in practice

Fix: Broaden the regex or add alternative patterns for the unmatched variants