processing-invoices

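Extracts and validates structured data from PDF invoices. Use when processing invoice PDFs, extracting billing information, or verifying invoice data.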

SKILL.md
--- frontmatter
name: processing-invoices
description: Extracts and validates structured data from PDF invoices. Use when processing invoice PDFs, extracting billing information, or verifying invoice data.

Invoice Processing

Workflow

Copy this checklist and track progress:

code
Invoice Processing:
- [ ] Step 1: Log start time
- [ ] Step 2: Extract text from PDF
- [ ] Step 3: Parse invoice fields
- [ ] Step 4: Validate extracted data
- [ ] Step 5: Save to JSON AND eval log

Step 1: Log start time

Record the start time for eval tracking:

python
from datetime import datetime
start_time = datetime.now().isoformat()

Step 2: Extract text

python
from pypdf import PdfReader

reader = PdfReader("invoice.pdf")
text = ""
for page in reader.pages:
    page_text = page.extract_text()
    if page_text:
        text += page_text + "\n"

Step 3: Parse fields

Extract these from the text:

  • vendor: Company name at top of invoice
  • invoice_number: Look for "Invoice #", "INV-", "#"
  • date: Convert any format to YYYY-MM-DD
  • total: Amount after "Total:", "Amount Due:"
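The field rules above can be sketched with regular expressions. This is a minimal illustration, not part of the skill itself: the names `parse_invoice` and `parse_date`, the specific patterns, and the list of date formats are all assumptions, and slash-separated dates are assumed to be month-first. Month-name parsing with `%B` also assumes an English locale.

```python
import re
from datetime import datetime

# Assumed label patterns, tried in order; extend them for your own invoices.
INVOICE_NUMBER_PATTERNS = [
    r"\b(INV-[A-Za-z0-9-]+)",               # e.g. "INV-1001" (prefix kept)
    r"Invoice\s*#\s*:?\s*([A-Za-z0-9-]+)",  # e.g. "Invoice #: 4521"
    r"#\s*([A-Za-z0-9-]+)",                 # bare "#4521" as a last resort
]
# \b keeps "Subtotal:" from matching as "Total:".
TOTAL_PATTERN = r"\b(?:Total|Amount Due)\s*:\s*[^\d\n]*?(\d[\d,]*(?:\.\d+)?)"
DATE_CANDIDATE = r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4}|[A-Z][a-z]+ \d{1,2}, \d{4}"
# Assumption: slash dates are month-first (US style).
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%b %d, %Y"]


def parse_date(text):
    """Return the first recognizable date in text as YYYY-MM-DD, or None."""
    for candidate in re.findall(DATE_CANDIDATE, text):
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(candidate, fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
    return None


def parse_invoice(text):
    """Extract vendor, invoice_number, date, and total from invoice text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    vendor = lines[0] if lines else None  # company name assumed on first line

    invoice_number = None
    for pattern in INVOICE_NUMBER_PATTERNS:
        match = re.search(pattern, text)
        if match:
            invoice_number = match.group(1)
            break

    total_match = re.search(TOTAL_PATTERN, text, re.IGNORECASE)
    total = float(total_match.group(1).replace(",", "")) if total_match else None

    return {
        "vendor": vendor,
        "invoice_number": invoice_number,
        "date": parse_date(text),
        "total": total,
    }
```

Fields that cannot be found come back as `None`, so the validation step can flag them instead of the parser raising.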

Step 4: Validate

Required fields MUST be present and valid:

  • vendor: non-empty string
  • invoice_number: non-empty string
  • date: valid YYYY-MM-DD format
  • total: positive number

If any field is missing or invalid, re-examine the extracted PDF text before saving.
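One way to check these rules in code is sketched below. The function name `validate_invoice` and the error-message wording are illustrative assumptions; the checks themselves mirror the four requirements listed above.

```python
from datetime import datetime


def _is_iso_date(value):
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not isinstance(value, str):
        return False
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # Round-trip to reject unpadded forms such as "2024-3-5".
    return parsed.strftime("%Y-%m-%d") == value


def validate_invoice(fields):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []

    for key in ("vendor", "invoice_number"):
        value = fields.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{key}: must be a non-empty string")

    if not _is_iso_date(fields.get("date")):
        errors.append("date: must be a valid YYYY-MM-DD date")

    total = fields.get("total")
    is_number = isinstance(total, (int, float)) and not isinstance(total, bool)
    if not is_number or total <= 0:
        errors.append("total: must be a positive number")

    return errors
```

Returning every error at once, rather than stopping at the first, makes it easier to see which parts of the PDF text need another look.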

Step 5: Save results

Save two files:

  1. Output file (requested by user):
json
{
  "vendor": "string",
  "invoice_number": "string",
  "date": "YYYY-MM-DD",
  "total": 0.00,
  "currency": "USD"
}
  2. Eval log (always append to eval_results/all_evals.jsonl):
bash
python scripts/collect_eval.py "<task_id>" "<original_task_prompt>" "<output_file>" "<notes>"

Example:

bash
python scripts/collect_eval.py "invoice-validate" "Extract and validate invoice data" "output.json" "validation passed"
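For the output file in Step 5, a small helper along these lines could write the JSON in the shape shown there. The name `save_invoice` is hypothetical, and defaulting `currency` to "USD" is an assumption taken from the example schema, since the parsing steps do not describe how to detect a currency.

```python
import json
from pathlib import Path


def save_invoice(fields, output_path, currency="USD"):
    """Write validated invoice fields to output_path as JSON and return the record."""
    record = {
        "vendor": fields["vendor"],
        "invoice_number": fields["invoice_number"],
        "date": fields["date"],
        "total": round(float(fields["total"]), 2),
        "currency": currency,  # assumed default; override when known
    }
    Path(output_path).write_text(
        json.dumps(record, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return record
```

The eval log entry is still recorded separately with `scripts/collect_eval.py`, passing the same output path.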