Legal Document Ingestion
Automate the transfer of legal case documents from Google Drive or local sources into an Agent Zero VPS SQLite database.
Quick Start
- Scan Google Drive for legal documents
- Download files to the sandbox
- Upload to the VPS ingestion directory
- Run the ingestion script
- Verify database statistics
For the complete step-by-step workflow, see workflow.md.
When to Use
- Transferring legal case data from Google Drive to Agent Zero VPS
- Processing DocketAlarm, LexisNexis, or PACER exports (Excel, CSV, PDF, DOCX)
- Bootstrapping Agent Zero with existing legal intelligence
- Setting up legal OSINT databases
Prerequisites
- Agent Zero VPS with SSH access
- Google Drive integration (optional, for Google Drive sources)
- Python 3 with pandas and openpyxl on the VPS
Core Workflow
Scan and Download
Use rclone to list the legal documents in Google Drive, then copy the Excel files to the sandbox:
bash
rclone lsf --config /home/ubuntu/.gdrive-rclone.ini "manus_google_drive:Legal Documents/" -R
rclone copy "manus_google_drive:Legal Documents/" /home/ubuntu/gdrive_legal_docs/ --config /home/ubuntu/.gdrive-rclone.ini --include "*.xlsx"
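Before uploading, it can help to confirm that the copy actually produced files. The following is a minimal sketch, not part of the bundled scripts: `count_xlsx` and `require_downloads` are hypothetical helper names. They only look at the top level of the directory, on the assumption that the upload step's `*.xlsx` glob will not pick up files that rclone placed in subfolders.

```bash
# Hypothetical helper: count the .xlsx files directly inside a directory.
# Subdirectories are ignored, mirroring a top-level *.xlsx glob.
count_xlsx() (
  dir="$1"
  find "$dir" -maxdepth 1 -type f -name '*.xlsx' 2>/dev/null | wc -l | tr -d ' '
)

# Hypothetical guard: succeed only if at least one .xlsx file is present.
require_downloads() (
  dir="$1"
  n="$(count_xlsx "$dir")"
  if [ "$n" -eq 0 ]; then
    echo "no .xlsx files found in $dir" >&2
    exit 1
  fi
  echo "$n file(s) ready for upload"
)

# Example (not executed here):
#   require_downloads /home/ubuntu/gdrive_legal_docs
```

If the count is lower than the rclone listing suggests, the missing files may sit in subfolders of the download directory.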
Upload to VPS
Transfer files to the VPS ingestion directory:
bash
scp /home/ubuntu/gdrive_legal_docs/*.xlsx root@<VPS_IP>:/home/agentzero/data/ingestion/raw/docketalarm/
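The upload above expects the target directory to already exist on the VPS. A minimal sketch for creating it, assuming the directory layout listed under Output; `make_ingestion_dirs` is a hypothetical helper, and vps_setup.md remains the reference for the actual setup commands:

```bash
# Hypothetical helper: create the ingestion directory layout under a base path.
# Safe to run repeatedly, since mkdir -p ignores directories that already exist.
make_ingestion_dirs() (
  base="$1"
  mkdir -p \
    "$base/raw/docketalarm" \
    "$base/processed" \
    "$base/failed" \
    "$base/logs"
)

# Example, run on the VPS (not executed here):
#   make_ingestion_dirs /home/agentzero/data/ingestion
```

Directories created as root may need their ownership adjusted so that the agentzero user can write to them.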
Run Ingestion
Run the ingestion script on each uploaded .xlsx file. A file is moved to processed/ only if its ingestion succeeds:
bash
ssh root@<VPS_IP> "cd /home/agentzero/data/ingestion/raw/docketalarm && for file in *.xlsx; do sudo -u agentzero /home/agentzero/venv/bin/python3 ../ingest_legal_documents.py \"\$file\" /home/agentzero/work/osint_engine/osint.db && mv \"\$file\" ../../processed/; done"
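The one-liner above leaves a file in place when its ingestion fails. The following sketch shows one way to route failures to the failed/ directory and keep a log per file. It is an assumption, not the bundled behavior: `ingest_all` is a hypothetical helper, the `<file>.log` naming is invented for illustration, and the bundled script may already handle logging or failed files itself.

```bash
# Hypothetical helper, meant to run on the VPS.
# Usage: ingest_all RAW_DIR PROCESSED_DIR FAILED_DIR LOG_DIR DB COMMAND [ARGS...]
# COMMAND is invoked once per file as: COMMAND [ARGS...] FILE DB
ingest_all() (
  raw="$1"; processed="$2"; failed="$3"; logs="$4"; db="$5"
  shift 5
  status=0
  for file in "$raw"/*.xlsx; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$file" ] || continue
    name="$(basename "$file")"
    # Capture stdout and stderr of the ingestion command in a per-file log.
    if "$@" "$file" "$db" >"$logs/$name.log" 2>&1; then
      mv "$file" "$processed/"
    else
      mv "$file" "$failed/"
      status=1
    fi
  done
  # Exit non-zero if at least one file failed.
  exit "$status"
)

# Example, run on the VPS (not executed here):
#   ingest_all /home/agentzero/data/ingestion/raw/docketalarm \
#     /home/agentzero/data/ingestion/processed \
#     /home/agentzero/data/ingestion/failed \
#     /home/agentzero/data/ingestion/logs \
#     /home/agentzero/work/osint_engine/osint.db \
#     sudo -u agentzero /home/agentzero/venv/bin/python3 \
#     /home/agentzero/data/ingestion/raw/ingest_legal_documents.py
```

Passing the ingestion command as trailing arguments keeps the loop independent of how the script is launched, so the same helper can be exercised with a stand-in command before touching the real database.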
Verify Results
Check ingestion statistics:
bash
ssh root@<VPS_IP> "sqlite3 /home/agentzero/work/osint_engine/osint.db 'SELECT COUNT(*) FROM cases; SELECT COUNT(*) FROM parties;'"
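The counts above still have to be read by eye. As a sketch of an automated check, the helpers below fail when a table is empty or cannot be queried. `table_count`, `require_rows`, and the `SQLITE3` override variable are hypothetical names introduced here; only the `cases` and `parties` tables come from the query above.

```bash
# Hypothetical helper: print the row count of one table.
# SQLITE3 may name an alternative sqlite3 binary; it defaults to "sqlite3".
table_count() (
  db="$1"; table="$2"
  "${SQLITE3:-sqlite3}" "$db" "SELECT COUNT(*) FROM $table;"
)

# Hypothetical check: fail if any listed table is empty or cannot be read.
require_rows() (
  db="$1"
  shift
  for table in "$@"; do
    n="$(table_count "$db" "$table")" || exit 1
    # Reject anything that is not a plain non-negative integer.
    case "$n" in
      ''|*[!0-9]*) echo "unexpected count for $table: $n" >&2; exit 1 ;;
    esac
    if [ "$n" -eq 0 ]; then
      echo "table $table is empty" >&2
      exit 1
    fi
    echo "$table: $n"
  done
)

# Example, run on the VPS (not executed here):
#   require_rows /home/agentzero/work/osint_engine/osint.db cases parties
```

A non-zero count only shows that rows exist. Comparing counts before and after a run gives a better picture of what a given batch added.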
Detailed Documentation
- workflow.md - Complete 10-step workflow with troubleshooting
- database_schema.md - SQLite schema for legal OSINT database
- vps_setup.md - VPS directory structure and setup commands
- file_formats.md - Expected file formats and data structures
Output
After successful ingestion:
- Cases stored in the SQLite database at /home/agentzero/work/osint_engine/osint.db
- Processed files moved to /home/agentzero/data/ingestion/processed/
- Failed files (if any) in /home/agentzero/data/ingestion/failed/
- Ingestion logs in /home/agentzero/data/ingestion/logs/
Bundled Resources
- scripts/ingest_legal_documents.py - Python script for parsing and ingesting legal documents
- references/ - Detailed documentation for workflow, database schema, VPS setup, and file formats