
processing-invoices

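Extracts and validates structured data from PDF invoices through automated validation. Use it when processing invoice PDFs, extracting billing information, or whenever validation is required.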

SKILL.md
---
name: processing-invoices
description: Extracts and validates structured data from PDF invoices with automated validation. Use when processing invoice PDFs, extracting billing information, or when validation is required.
---

Invoice Processing

Workflow

```
Invoice Processing:
- [ ] Step 1: Log start time
- [ ] Step 2: Extract PDF text
- [ ] Step 3: Parse invoice fields
- [ ] Step 4: Run validation: python scripts/validate_invoice.py output.json
- [ ] Step 5: Fix errors if validation fails
- [ ] Step 6: Save final output AND eval log
```

Step 1: Log start time

Record the start time for eval tracking:

```python
from datetime import datetime
start_time = datetime.now().isoformat()
```
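Step 1 records `start_time`, but the later steps never say how it is used. If you want the run duration for the eval notes in Step 6 (an assumption; `collect_eval.py` is not documented to require it), a minimal sketch with a hypothetical `elapsed_seconds` helper:

```python
from datetime import datetime
from typing import Optional

def elapsed_seconds(start_iso: str, end: Optional[datetime] = None) -> float:
    """Seconds from an ISO-8601 timestamp (as produced by isoformat()) to `end`, or to now."""
    start = datetime.fromisoformat(start_iso)
    return ((end or datetime.now()) - start).total_seconds()

start_time = datetime.now().isoformat()
# ... Steps 2-5 run here ...
duration = elapsed_seconds(start_time)  # e.g. include f"took {duration:.1f}s" in the notes
```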

Step 2: Extract text

```python
from pypdf import PdfReader

reader = PdfReader("invoice.pdf")
text = ""
for page in reader.pages:
    text += page.extract_text() + "\n"
```
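`pypdf` only reads a PDF's text layer and does no OCR, so a scanned invoice yields empty text and every later step fails silently. A minimal sketch of an early guard, assuming you would rather stop than parse nothing; `extract_invoice_text` is a hypothetical helper that accepts any iterable of objects with an `extract_text()` method, such as `PdfReader(...).pages`:

```python
def extract_invoice_text(pages) -> str:
    """Join the text of all pages with newlines; raise if no page yields any text."""
    # `or ""` guards against a page object returning None instead of a string.
    text = "\n".join((page.extract_text() or "") for page in pages)
    if not text.strip():
        raise ValueError("no extractable text; the PDF may be scanned and need OCR")
    return text
```

Usage: `text = extract_invoice_text(PdfReader("invoice.pdf").pages)`.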

Step 3: Parse fields

  • vendor: Company name (top of invoice)
  • invoice_number: Follows a label such as "Invoice #" or a prefix such as "INV-"
  • date: Any format -> convert to YYYY-MM-DD
  • total: Final amount due (positive number)
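The field hints above can be turned into a first-pass parser. This is a minimal sketch assuming English-language invoices; the regular expressions, the list of date formats, and the names `parse_invoice` and `normalize_date` are illustrative assumptions, and real invoices will need more patterns. Any field it cannot find comes back as `None`, which Step 4 should then flag.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed date layouts; add the formats your invoices actually use.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%B %d, %Y", "%b %d, %Y")

def normalize_date(raw: str) -> Optional[str]:
    """Return `raw` as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

def parse_invoice(text: str) -> dict:
    """First-pass field extraction from invoice text; any field may be None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # vendor: assume the company name is the first non-empty line ("top of invoice").
    vendor = lines[0] if lines else None

    # invoice_number: the token after "Invoice #", or a bare "INV-..." token.
    m = re.search(r"Invoice\s*#\s*([A-Za-z0-9-]+)|\b(INV-[A-Za-z0-9-]+)", text, re.IGNORECASE)
    invoice_number = (m.group(1) or m.group(2)) if m else None

    # date: text after the first "Date" label, normalized to YYYY-MM-DD.
    m = re.search(r"\bDate\b\s*:?\s*(.+)", text, re.IGNORECASE)
    date = normalize_date(m.group(1)) if m else None

    # total: the last "Total" / "Amount Due" figure (\bTotal skips "Subtotal").
    amounts = re.findall(
        r"(?:\bAmount\s+Due|\bTotal)\s*:?\s*[$€£]?\s*([\d,]+(?:\.\d+)?)", text, re.IGNORECASE
    )
    total = float(amounts[-1].replace(",", "")) if amounts else None

    return {"vendor": vendor, "invoice_number": invoice_number, "date": date, "total": total}
```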

Step 4: Validate

Run: python scripts/validate_invoice.py output.json
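The real rules live in VALIDATION.md and `scripts/validate_invoice.py`, neither of which is shown here. As a hypothetical stand-in, the checks below encode only the constraints stated in Step 3 (required fields, a real YYYY-MM-DD date, a positive numeric total); `check_invoice` is an illustrative name, not the script's API.

```python
import re
from datetime import datetime

REQUIRED_FIELDS = ("vendor", "invoice_number", "date", "total")

def is_iso_date(value) -> bool:
    """True only for zero-padded YYYY-MM-DD strings that name a real calendar date."""
    s = str(value)
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%d")  # rejects impossible dates such as 2024-02-30
        return True
    except ValueError:
        return False

def check_invoice(data: dict) -> list:
    """Return human-readable problems; an empty list means the record passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if data.get(field) in (None, ""):
            errors.append(f"missing field: {field}")

    date = data.get("date")
    if date not in (None, "") and not is_iso_date(date):
        errors.append(f"date is not a valid YYYY-MM-DD date: {date!r}")

    total = data.get("total")
    # bool is a subclass of int, so exclude it explicitly.
    if total is not None and (isinstance(total, bool) or not isinstance(total, (int, float)) or total <= 0):
        errors.append(f"total must be a positive number: {total!r}")
    return errors
```

A command-line wrapper would load `output.json`, print each message, and exit non-zero when the list is non-empty.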

Step 5: Fix errors

If validation fails:

  1. Read error messages
  2. Fix the specific issues
  3. Run validation again
  4. Only proceed when it passes

Validation rules: See VALIDATION.md
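Step 5 describes a validate, fix, re-validate loop. A minimal sketch, assuming `validate_invoice.py` signals failure with a non-zero exit status and prints its error messages (neither behavior is documented here); `validation_command`, `run_validation`, and `validate_until_pass` are hypothetical helpers, and `fix` stands for whatever corrects `output.json` after reading the messages.

```python
import subprocess
import sys
from typing import Callable, List, Tuple

def validation_command(output_file: str = "output.json") -> List[str]:
    """The documented validation command, run with the current interpreter."""
    return [sys.executable, "scripts/validate_invoice.py", output_file]

def run_validation(cmd: List[str]) -> Tuple[bool, str]:
    """Run a validation command; exit status 0 counts as a pass (assumed convention)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, (result.stdout + result.stderr).strip()

def validate_until_pass(cmd: List[str], fix: Callable[[str], None], max_attempts: int = 3) -> int:
    """Validate, pass error messages to fix(), re-validate; return the attempts used."""
    messages = ""
    for attempt in range(1, max_attempts + 1):
        passed, messages = run_validation(cmd)
        if passed:
            return attempt
        fix(messages)  # read the errors and correct the output before the next run
    raise RuntimeError(f"validation still failing after {max_attempts} attempts:\n{messages}")
```

The returned attempt count maps directly onto the `notes` argument in Step 6, e.g. "validation passed after 1 attempt".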

Step 6: Save results

Save two files:

  1. Output file (requested by user):
```json
{
  "vendor": "Company Name",
  "invoice_number": "INV-001",
  "date": "YYYY-MM-DD",
  "total": 1250.00,
  "currency": "USD"
}
```
  2. Eval log (always append to eval_results/all_evals.jsonl):
```bash
python scripts/collect_eval.py "<task_id>" "<original_task_prompt>" "<output_file>" "<notes>"
```

Example:

```bash
python scripts/collect_eval.py "invoice-auto-validate" "Extract and validate invoice with automated loop" "output.json" "validation passed after 1 attempt"
```
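Both saves can also be driven from Python. A minimal sketch, assuming the output path and argument order shown above; `save_output` and `eval_command` are hypothetical helpers, and `collect_eval.py` is only invoked, never reimplemented, since its internals are not shown.

```python
import json
import sys
from pathlib import Path

def save_output(data: dict, path: str = "output.json") -> Path:
    """Write the extracted invoice fields as pretty-printed UTF-8 JSON."""
    out = Path(path)
    out.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return out

def eval_command(task_id: str, prompt: str, output_file: str, notes: str) -> list:
    """Argument list for scripts/collect_eval.py, in the documented order."""
    return [sys.executable, "scripts/collect_eval.py", task_id, prompt, output_file, notes]
```

To log the run, pass the list to `subprocess.run(..., check=True)`. An argument list avoids shell quoting problems when the original task prompt contains quotes or spaces.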