
processing-invoices

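Extracts, validates, and structures data from PDF invoices using automated validation and error-correction loops. Suited to processing invoice PDFs, extracting billing data, batch processing invoices, or any scenario where accuracy is critical.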

SKILL.md
--- frontmatter
name: processing-invoices
description: Extracts, validates, and structures data from PDF invoices with automated validation and error correction loops. Use when processing invoice PDFs, extracting billing data, batch processing invoices, or when accuracy is critical.

Invoice Processing

Workflow

code
Invoice Processing:
- [ ] Step 1: Log start time
- [ ] Step 2: Extract PDF text
- [ ] Step 3: Parse invoice fields
- [ ] Step 4: Validate (run validate_invoice.py)
- [ ] Step 5: Fix errors and re-validate if needed
- [ ] Step 6: Save final output AND eval log

Step 1: Log start time

Record the start time for eval tracking:

python
from datetime import datetime
start_time = datetime.now().isoformat()

Step 2: Extract text

python
from pypdf import PdfReader

reader = PdfReader("invoice.pdf")
text = ""
for page in reader.pages:
    page_text = page.extract_text()
    if page_text:
        text += page_text + "\n"

Step 3: Parse fields

Extract from text:

  • vendor: Company name (usually at document top, in larger font)
  • invoice_number: Look for "Invoice #", "Invoice No.", "INV-", "#"
  • date: Invoice/billing date -> convert to YYYY-MM-DD
  • total: Final amount ("Total:", "Amount Due:", "Balance:")
  • currency: Default USD if not specified
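The field hints above can be sketched as a small regex-based parser. This is only a starting point under stated assumptions: the function names, the regular expressions, the list of date formats (with month-first ordering for slash dates), and the short list of currency codes are all hypothetical and not part of this skill's scripts. Font size is not available in extracted text, so the sketch falls back to the first non-empty line for the vendor.

```python
import re
from datetime import datetime

# Assumed date layouts; "%m/%d/%Y" presumes US month-first ordering.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%b %d, %Y", "%d %B %Y")


def normalize_date(raw):
    """Return raw as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def parse_invoice_fields(text):
    """Best-effort extraction of the five fields from plain invoice text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Font size is lost in plain text, so use the first non-empty line.
    vendor = lines[0] if lines else None

    # "Invoice #..." or "Invoice No. ..."; otherwise any bare "INV-..." token.
    number = re.search(
        r"Invoice\s*(?:#|No\b\.?)\s*:?\s*([A-Za-z0-9-]+)", text, re.I
    )
    if number is None:
        number = re.search(r"\b(INV-[A-Za-z0-9-]+)", text)

    date = re.search(r"\bDate\s*:\s*(.+)", text, re.I)

    # \b keeps "Subtotal:" from matching the "Total" label.
    total = re.search(
        r"\b(?:Total|Amount Due|Balance)\s*:\s*[^\d\n]*([\d,]+(?:\.\d+)?)",
        text,
        re.I,
    )

    # Only explicit ISO codes are detected here; everything else defaults to USD.
    currency = re.search(r"\b(USD|EUR|GBP|JPY|CAD|AUD)\b", text)

    return {
        "vendor": vendor,
        "invoice_number": number.group(1) if number else None,
        "date": normalize_date(date.group(1)) if date else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
        "currency": currency.group(1) if currency else "USD",
    }
```

Any field that comes back as None is a signal to re-read the extracted text by hand rather than guess a value.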

Step 4: Validate

Run: python scripts/validate_invoice.py output.json

Step 5: Fix and re-validate

If validation fails:

  1. Read the specific error message
  2. Re-examine PDF text for that field
  3. Update JSON with corrected value
  4. Run validation again
  5. Repeat until validation passes

Common issues: See TROUBLESHOOTING.md
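The fix-and-re-validate loop can be sketched as a small driver. It assumes, without confirmation from this document, that scripts/validate_invoice.py exits with status 0 on success and prints its error message to stdout or stderr on failure. The function names, the `fix` callback, and the `max_attempts` cap are hypothetical additions; the cap is only a safeguard against looping forever on an invoice that cannot be repaired.

```python
import subprocess
import sys


def run_validation(json_path, script="scripts/validate_invoice.py"):
    """Run the validator and return (passed, message).

    Assumption: exit status 0 means the JSON passed validation.
    """
    result = subprocess.run(
        [sys.executable, script, json_path],
        capture_output=True,
        text=True,
    )
    message = (result.stdout + result.stderr).strip()
    return result.returncode == 0, message


def validate_with_retries(json_path, fix, max_attempts=3, validate=run_validation):
    """Validate json_path, calling fix(message) after each failed attempt.

    Returns the attempt number that passed; raises if none did.
    """
    message = ""
    for attempt in range(1, max_attempts + 1):
        passed, message = validate(json_path)
        if passed:
            return attempt
        if attempt < max_attempts:
            # fix() should re-examine the PDF text and rewrite the JSON file.
            fix(message)
    raise RuntimeError(f"validation still failing after {max_attempts} attempts: {message}")
```

The `validate` parameter is injectable so the loop can be exercised without the real script.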

Step 6: Save results

Save two files:

  1. Output file (requested by user; the full schema is under Output format):
json
{
  "vendor": "...",
  "invoice_number": "...",
  "date": "YYYY-MM-DD",
  "total": 0.00
}
  2. Eval log (always append to eval_results/all_evals.jsonl):
bash
python scripts/collect_eval.py "<task_id>" "<original_task_prompt>" "<output_file>" "<notes>"

Example:

bash
python scripts/collect_eval.py "invoice-basic" "Extract invoice data from invoice.pdf" "output.json" "validation passed on first attempt"

eval_results/all_evals.jsonl holds one JSON object per line. If the file already exists, append to it rather than overwriting it.
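For reference, appending to a one-object-per-line log can be sketched as below. This is not the implementation of scripts/collect_eval.py, whose record schema is not shown here; the helper names and any record keys passed in are hypothetical.

```python
import json
from pathlib import Path


def append_eval_record(log_path, record):
    """Append record as a single JSON line, keeping earlier lines intact."""
    path = Path(log_path)
    # Mode "a" creates the file if missing and never truncates existing entries.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_eval_records(log_path):
    """Load every non-empty line of the log as one JSON object."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each record sits on its own line, a run that fails partway through cannot corrupt the entries written before it.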

Output format

json
{
  "vendor": "Company Name",
  "invoice_number": "INV-2025-001",
  "date": "2025-01-15",
  "total": 1250.00,
  "currency": "USD",
  "line_items": []
}

Validation rules

See VALIDATION.md for complete rules.