legal-document-ingestion

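Import legal case documents (Excel, CSV, PDF, DOCX) from Google Drive or local sources into the Agent Zero VPS database. Suited to migrating legal case data from DocketAlarm, LexisNexis, or PACER to a remote VPS, building a legal OSINT database, or bootstrapping Agent Zero with existing legal intelligence.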

SKILL.md
--- frontmatter
name: legal-document-ingestion
description: Ingest legal case documents (Excel, CSV, PDF, DOCX) from Google Drive or local sources into an Agent Zero VPS database. Use when transferring legal case data from DocketAlarm, LexisNexis, or PACER to a remote VPS, setting up legal OSINT databases, or bootstrapping Agent Zero with existing legal intelligence.

Legal Document Ingestion

Automate the transfer of legal case documents from Google Drive or local sources into an Agent Zero VPS SQLite database.

Quick Start

  1. Scan Google Drive for legal documents
  2. Download files to sandbox
  3. Upload to VPS ingestion directory
  4. Run the ingestion script
  5. Verify database statistics

For the complete step-by-step workflow, see workflow.md.

When to Use

  • Transferring legal case data from Google Drive to Agent Zero VPS
  • Processing DocketAlarm, LexisNexis, or PACER exports (Excel, CSV, PDF, DOCX)
  • Bootstrapping Agent Zero with existing legal intelligence
  • Setting up legal OSINT databases

Prerequisites

  • Agent Zero VPS with SSH access
  • rclone configured with a Google Drive remote (optional; needed only when the source is Google Drive)
  • Python 3 with pandas and openpyxl on the VPS
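
The Python prerequisite can be checked before any files are transferred. The following is a minimal sketch, meant to be run with the interpreter that will execute the ingestion script; the helper name `missing_modules` is hypothetical and not part of the skill.

```python
import importlib.util
from typing import List, Sequence

# Module names taken from the prerequisites list above.
REQUIRED_MODULES = ("pandas", "openpyxl")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names from `names` that this interpreter cannot import."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means both packages are importable; any names it returns still need to be installed in that environment.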

Core Workflow

Scan and Download

Use rclone to list the legal documents in Google Drive recursively, then copy the Excel files to the sandbox:

bash
rclone lsf --config /home/ubuntu/.gdrive-rclone.ini "manus_google_drive:Legal Documents/" -R
rclone copy "manus_google_drive:Legal Documents/" /home/ubuntu/gdrive_legal_docs/ --config /home/ubuntu/.gdrive-rclone.ini --include "*.xlsx"
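
The copy command above fetches only `*.xlsx`, while the skill also lists CSV, PDF, and DOCX. As a sketch, the argument list for a wider copy could be built as follows, relying on rclone accepting a repeated `--include` flag. The function name `build_rclone_copy` is hypothetical, and the later upload and ingestion commands would also need adjusting before other formats are processed.

```python
from typing import List, Sequence


def build_rclone_copy(
    remote: str,
    dest: str,
    config: str,
    extensions: Sequence[str] = ("xlsx", "csv", "pdf", "docx"),
) -> List[str]:
    """Build an `rclone copy` argument list with one --include per extension."""
    cmd = ["rclone", "copy", remote, dest, "--config", config]
    for ext in extensions:
        # Each --include adds another pattern; files matching any of them are copied.
        cmd += ["--include", f"*.{ext}"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with the space in "Legal Documents/".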

Upload to VPS

Transfer files to the VPS ingestion directory:

bash
scp /home/ubuntu/gdrive_legal_docs/*.xlsx root@<VPS_IP>:/home/agentzero/data/ingestion/raw/docketalarm/

Run Ingestion

Run the ingestion script on the VPS once per workbook; a file is moved to processed/ only if its ingestion succeeds:

bash
ssh root@<VPS_IP> "cd /home/agentzero/data/ingestion/raw/docketalarm && for file in *.xlsx; do sudo -u agentzero /home/agentzero/venv/bin/python3 ../ingest_legal_documents.py \"\$file\" /home/agentzero/work/osint_engine/osint.db && mv \"\$file\" ../../processed/; done"
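
The same loop can be written in Python, to be run on the VPS, with explicit routing of failures. This is a sketch under two assumptions that the source does not confirm: that `ingest_legal_documents.py` exits with a non-zero status on failure, and that moving failed files into `failed/` is left to the caller. The names `ingest_directory` and `run_ingest_script` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def ingest_directory(
    raw_dir,
    processed_dir,
    failed_dir,
    ingest: Callable[[Path], bool],
    pattern: str = "*.xlsx",
) -> Dict[str, List[str]]:
    """Run `ingest` on each matching file and move it by outcome."""
    raw_dir, processed_dir, failed_dir = Path(raw_dir), Path(processed_dir), Path(failed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    failed_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, List[str]] = {"processed": [], "failed": []}
    # sorted() materialises the listing before any file is moved.
    for path in sorted(raw_dir.glob(pattern)):
        ok = ingest(path)
        target = processed_dir if ok else failed_dir
        shutil.move(str(path), str(target / path.name))
        results["processed" if ok else "failed"].append(path.name)
    return results


def run_ingest_script(python_bin: str, script: str, db_path: str) -> Callable[[Path], bool]:
    """Wrap the ingestion script as a callable (assumes exit code 0 means success)."""
    def _ingest(path: Path) -> bool:
        return subprocess.run([python_bin, script, str(path), db_path]).returncode == 0
    return _ingest
```

An empty directory simply yields no iterations, so no command is run against a literal `*.xlsx` when nothing matches.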

Verify Results

Check the ingestion statistics by counting rows in the cases and parties tables:

bash
ssh root@<VPS_IP> "sqlite3 /home/agentzero/work/osint_engine/osint.db 'SELECT COUNT(*) FROM cases; SELECT COUNT(*) FROM parties;'"
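
The same counts can be read with Python's built-in `sqlite3` module, for example from a script on the VPS. This is a minimal sketch: the table names `cases` and `parties` come from the query above, nothing else about the schema is assumed, and the helper name `table_counts` is hypothetical.

```python
import sqlite3
from typing import Dict, Sequence


def table_counts(db_path: str, tables: Sequence[str] = ("cases", "parties")) -> Dict[str, int]:
    """Return the row count of each named table in the SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        counts: Dict[str, int] = {}
        for table in tables:
            # Table names cannot be bound as SQL parameters, so validate them first.
            if not table.isidentifier():
                raise ValueError(f"unexpected table name: {table!r}")
            counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `table_counts("/home/agentzero/work/osint_engine/osint.db")` before and after an ingestion run shows how many rows the run added.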

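Detailed Documentation

See the references/ directory for detailed documentation on the workflow, database schema, VPS setup, and file formats.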

Output

After successful ingestion:

  • Cases stored in SQLite database at /home/agentzero/work/osint_engine/osint.db
  • Processed files moved to /home/agentzero/data/ingestion/processed/
  • Failed files (if any) in /home/agentzero/data/ingestion/failed/
  • Ingestion logs in /home/agentzero/data/ingestion/logs/

Bundled Resources

  • scripts/ingest_legal_documents.py - Python script for parsing and ingesting legal documents
  • references/ - Detailed documentation for workflow, database schema, VPS setup, and file formats