Test Generation Skill

When to Use

Use this skill when:

•Creating unit tests for functions (indexer, search, organize)
•Writing integration tests for full workflows
•Testing PDF extraction (pdfplumber and OCR)
•Testing database operations (SQLite/FTS5)
•Validating document classification logic

Testing Framework

Currently no test framework configured, but recommended approach:

•Framework: pytest (recommended for Python projects)
•Test Location: tests/ directory
•Naming: test_*.py files
•Coverage: Aim for >80% coverage

Test Organization

code

tests/
├── test_indexer.py      # Tests for indexer.py
├── test_search.py       # Tests for search.py
├── test_organize.py     # Tests for organize.py
├── test_extraction.py   # Tests for PDF extraction
└── fixtures/            # Test fixtures (sample PDFs)

Key Functions to Test

From `indexer.py`:

•create_db() - Database creation and schema
•extract_text_from_pdf() - PDF text extraction
•extract_text_from_pdf_ocr() - OCR fallback
•classify_document() - Document classification logic
•extract_metadata() - Metadata extraction
•index_files() - Full indexing workflow

From `search.py`:

•search() - Full-text search with filters
•list_categories() - Category listing
•list_contracts() - Contract listing

From `organize.py`:

•organize_documents() - Document organization
•generate_summary_report() - Report generation

Testing Patterns

Unit Test Example

python

import pytest
from contrato.indexer import classify_document

def test_classify_document_contrato():
    """Test classification of main contract."""
    filename = "TNE_JU_COM_0002-13 - SIEMENS - Contrato.pdf"
    result = classify_document(filename, "/path/to/file")
    
    assert result["category"] == "contrato"
    assert result["document_type"] == "contrato_principal"
    assert result["contract_number"] == "TNE_JU_COM_0002-13"

def test_classify_document_aditivo():
    """Test classification of addendums."""
    filename = "TNE_JU_COM_0002-13 - SIEMENS - 1ºAditivo.pdf"
    result = classify_document(filename, "/path/to/file")
    
    assert result["category"] == "aditivo"
    assert result["document_number"] == "1"

def test_classify_document_anexo():
    """Test classification of attachments."""
    filename = "TNE_JU_COM_0002-13 (Anexo I).pdf"
    result = classify_document(filename, "/path/to/file")
    
    assert result["category"] == "anexo"
    assert result["document_number"] == "I"

Integration Test Example

python

def test_full_indexing_workflow(tmp_path):
    """Test complete indexing workflow."""
    # Setup: Create test PDF in tmp_path
    # Execute: Run index_files()
    # Assert: Verify documents in database
    # Assert: Verify FTS index created

Mocking Strategies

For external dependencies:

•PDFs: Use sample PDF files in fixtures/
•OCR: Mock pytesseract when testing extraction
•Database: Use in-memory SQLite for tests
•File System: Use tmp_path from pytest

Test Coverage Goals

•
Critical Functions: 100% coverage
- •classify_document() - Core classification logic
- •extract_text_from_pdf() - Extraction methods
- •Database operations
•
Overall Coverage: >80%
•
Edge Cases: Test error handling, empty files, corrupted PDFs

Test Data

Create test fixtures:

•Sample PDFs (with text)
•Sample scanned PDFs (for OCR testing)
•Sample contract filenames
•Database fixtures with known data

Best Practices

•Isolation: Each test should be independent
•Fixtures: Reuse test PDFs and data
•Clear Names: Test names should describe what they test
•Portuguese: Test descriptions can be in Portuguese
•Assertions: Use descriptive assertion messages

Common Test Scenarios

PDF Extraction Tests

•Extract text from normal PDFs
•Fallback to OCR for scanned PDFs
•Handle corrupted PDFs gracefully
•Extract metadata correctly

Classification Tests

•Classify contracts correctly
•Extract contract numbers
•Identify addendums by number
•Identify attachments by number

Database Tests

•Verify schema creation
•Test FTS5 index updates
•Validate triggers work
•Test query performance

Error Handling Tests

•Missing PDF files
•Corrupted PDFs
•Database connection errors
•OCR failures

Checklist

When writing tests:

• Test file created in tests/ directory
• Test functions named test_*
• Edge cases covered
• Error cases tested
• Fixtures created for reusable data
• Assertions are clear and descriptive
• Integration tests for workflows
• Mock external dependencies appropriately

Test Generation Skill

When to Use

Testing Framework

Test Organization

Key Functions to Test

From indexer.py:

From search.py:

From organize.py:

Testing Patterns

Unit Test Example

Integration Test Example

Mocking Strategies

Test Coverage Goals

Test Data

Best Practices

Common Test Scenarios

PDF Extraction Tests

Classification Tests

Database Tests

Error Handling Tests

Checklist

From `indexer.py`:

From `search.py`:

From `organize.py`: