Supabase Database Architecture for Arlis
Overview
Arlis uses Supabase (PostgreSQL) to store user applications, documents, and agent execution history. This skill documents the schema design, security policies, and best practices.
Core Tables
1. applications - Main Application Tracking
Purpose: Tracks government applications (SNAP, Medicaid, passport, etc.) that Arlis is helping users complete.
CREATE TABLE applications (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
title TEXT NOT NULL,
document_type TEXT, -- 'snap', 'medicaid', 'visa', 'passport', 'tax_filing'
portal_url TEXT,
status TEXT DEFAULT 'draft', -- draft, in_progress, submitted, pending, approved, denied
urgency TEXT DEFAULT 'medium', -- low, medium, high, critical
deadline TIMESTAMPTZ,
extracted_data JSONB NOT NULL DEFAULT '{}',
next_action TEXT,
next_check_at TIMESTAMPTZ,
confirmation_number TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
Key Fields:
- •
extracted_data: JSONB storing form field values (e.g.,{"full_name": "John Doe", "ssn": "***-**-1234"}) - •
next_check_at: When Arlis should check the portal again (for marathon agents) - •
status: Lifecycle state of the application
Indexes:
CREATE INDEX idx_applications_user_id ON applications(user_id); CREATE INDEX idx_applications_status ON applications(status); CREATE INDEX idx_applications_next_check ON applications(next_check_at) WHERE next_check_at IS NOT NULL;
2. documents - Uploaded Files
Purpose: Stores user-uploaded documents (photos of letters, PDFs, IDs) with extracted text/fields.
CREATE TABLE documents ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), application_id UUID REFERENCES applications(id) ON DELETE CASCADE, type TEXT, -- 'income_proof', 'id_card', 'letter', 'notice', 'utility_bill' original_url TEXT NOT NULL, -- Supabase Storage URL extracted_text TEXT, extracted_fields JSONB, confidence_score FLOAT, created_at TIMESTAMPTZ DEFAULT NOW() );
Key Fields:
- •
original_url: Points to Supabase Storage bucket (e.g.,documents/user123/file.pdf) - •
extracted_text: Full text from Gemini Vision OCR - •
extracted_fields: Structured data (e.g.,{"bill_date": "2024-12-15", "amount": "$45.32"}) - •
confidence_score: Gemini Vision confidence (0.0 - 1.0)
3. agent_tasks - Execution History
Purpose: Logs every action Arlis takes (for transparency, debugging, and user trust).
CREATE TABLE agent_tasks ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), application_id UUID REFERENCES applications(id) ON DELETE CASCADE, tool_name TEXT NOT NULL, tool_args JSONB, tool_result JSONB, screenshot_url TEXT, -- Screenshot of browser state success BOOLEAN DEFAULT true, error_message TEXT, execution_time_ms INTEGER, created_at TIMESTAMPTZ DEFAULT NOW() );
Key Fields:
- •
tool_name: e.g.,"browserbase_navigate","fill_form_field","upload_document" - •
screenshot_url: Browserbase screenshot for visual debugging - •
execution_time_ms: Performance tracking
Index:
CREATE INDEX idx_agent_tasks_application_id ON agent_tasks(application_id);
Row Level Security (RLS)
Critical: All tables MUST have RLS enabled to prevent users from accessing each other's data.
Applications Table
ALTER TABLE applications ENABLE ROW LEVEL SECURITY; -- Users can only view their own applications CREATE POLICY "Users can view own applications" ON applications FOR SELECT USING (auth.uid()::text = user_id); -- Users can only insert applications for themselves CREATE POLICY "Users can insert own applications" ON applications FOR INSERT WITH CHECK (auth.uid()::text = user_id); -- Users can only update their own applications CREATE POLICY "Users can update own applications" ON applications FOR UPDATE USING (auth.uid()::text = user_id);
Documents Table
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own documents"
ON documents FOR SELECT
USING (
EXISTS (
SELECT 1 FROM applications
WHERE applications.id = documents.application_id
AND applications.user_id = auth.uid()::text
)
);
Agent Tasks Table
ALTER TABLE agent_tasks ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Users can view own agent tasks"
ON agent_tasks FOR SELECT
USING (
EXISTS (
SELECT 1 FROM applications
WHERE applications.id = agent_tasks.application_id
AND applications.user_id = auth.uid()::text
)
);
Future Enhancements
1. pgvector for Semantic Search
Use Case: Search documents by meaning (e.g., "find my income proof from December")
-- Enable pgvector extension CREATE EXTENSION IF NOT EXISTS vector; -- Add embedding column to documents ALTER TABLE documents ADD COLUMN embedding vector(1536); -- Create index for similarity search CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
Usage:
// Generate embedding with Gemini
const embedding = await generateEmbedding(query);
// Search similar documents
const { data } = await supabase
.rpc('match_documents', {
query_embedding: embedding,
match_threshold: 0.8,
match_count: 5
});
2. Full-Text Search
Use Case: Search extracted text from documents
-- Add tsvector column
ALTER TABLE documents ADD COLUMN search_vector tsvector;
-- Create trigger to auto-update search vector
CREATE FUNCTION documents_search_trigger() RETURNS trigger AS $$
BEGIN
NEW.search_vector := to_tsvector('english', COALESCE(NEW.extracted_text, ''));
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER documents_search_update
BEFORE INSERT OR UPDATE ON documents
FOR EACH ROW EXECUTE FUNCTION documents_search_trigger();
-- Create GIN index
CREATE INDEX documents_search_idx ON documents USING GIN(search_vector);
Best Practices
1. Always Use Transactions for Multi-Step Operations
const { data, error } = await supabase.rpc('create_application_with_docs', {
application_data: { title: "Passport Renewal", ... },
document_urls: ["url1", "url2"]
});
2. Use JSONB for Flexible Data
- •Store form fields in
extracted_dataas JSONB - •Allows querying:
WHERE extracted_data->>'ssn' IS NOT NULL - •Supports indexing:
CREATE INDEX ON applications USING GIN(extracted_data);
3. Cascade Deletes
- •Documents and agent_tasks use
ON DELETE CASCADE - •When an application is deleted, all related data is automatically removed
4. Timestamps
- •Always include
created_atandupdated_at - •Use
TIMESTAMPTZ(timezone-aware) notTIMESTAMP
5. Indexes
- •Index foreign keys (
application_id) - •Index frequently queried fields (
user_id,status) - •Use partial indexes for sparse data (
WHERE next_check_at IS NOT NULL)
Common Queries
Get All Applications for a User
const { data } = await supabase
.from('applications')
.select('*')
.eq('user_id', userId)
.order('created_at', { ascending: false });
Get Application with Documents and Tasks
const { data } = await supabase
.from('applications')
.select(`
*,
documents (*),
agent_tasks (*)
`)
.eq('id', applicationId)
.single();
Find Applications Needing Attention
const { data } = await supabase
.from('applications')
.select('*')
.eq('user_id', userId)
.lte('next_check_at', new Date().toISOString())
.order('urgency', { ascending: false });
Schema Versioning
Current schema version: v1.0 (2026-01-21)
Migration History:
- •v1.0: Initial schema (applications, documents, agent_tasks)
Planned:
- •v1.1: Add pgvector for semantic search
- •v1.2: Add full-text search on documents
- •v1.3: Add user preferences table
Quick Reference
| Table | Purpose | Key Relationships |
|---|---|---|
applications | Track government forms | Parent of documents & tasks |
documents | User uploads | Belongs to application |
agent_tasks | Execution log | Belongs to application |
Security: All tables have RLS enabled. Users can only access their own data.
Performance: Indexes on user_id, status, next_check_at, and foreign keys.