Supabase Database Architecture for Arlis

Overview

Arlis uses Supabase (PostgreSQL) to store user applications, documents, and agent execution history. This skill documents the schema design, security policies, and best practices.

Core Tables

1. `applications` - Main Application Tracking

Purpose: Tracks government applications (SNAP, Medicaid, passport, etc.) that Arlis is helping users complete.

sql

CREATE TABLE applications (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  title TEXT NOT NULL,
  document_type TEXT, -- 'snap', 'medicaid', 'visa', 'passport', 'tax_filing'
  portal_url TEXT,
  status TEXT DEFAULT 'draft', -- draft, in_progress, submitted, pending, approved, denied
  urgency TEXT DEFAULT 'medium', -- low, medium, high, critical
  deadline TIMESTAMPTZ,
  extracted_data JSONB NOT NULL DEFAULT '{}',
  next_action TEXT,
  next_check_at TIMESTAMPTZ,
  confirmation_number TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

Key Fields:

•extracted_data: JSONB storing form field values (e.g., {"full_name": "John Doe", "ssn": "***-**-1234"})
•next_check_at: When Arlis should check the portal again (for marathon agents)
•status: Lifecycle state of the application

Indexes:

sql

CREATE INDEX idx_applications_user_id ON applications(user_id);
CREATE INDEX idx_applications_status ON applications(status);
CREATE INDEX idx_applications_next_check ON applications(next_check_at) WHERE next_check_at IS NOT NULL;

2. `documents` - Uploaded Files

Purpose: Stores user-uploaded documents (photos of letters, PDFs, IDs) with extracted text/fields.

sql

CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  application_id UUID REFERENCES applications(id) ON DELETE CASCADE,
  type TEXT, -- 'income_proof', 'id_card', 'letter', 'notice', 'utility_bill'
  original_url TEXT NOT NULL, -- Supabase Storage URL
  extracted_text TEXT,
  extracted_fields JSONB,
  confidence_score FLOAT,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

Key Fields:

•original_url: Points to Supabase Storage bucket (e.g., documents/user123/file.pdf)
•extracted_text: Full text from Gemini Vision OCR
•extracted_fields: Structured data (e.g., {"bill_date": "2024-12-15", "amount": "$45.32"})
•confidence_score: Gemini Vision confidence (0.0 - 1.0)

3. `agent_tasks` - Execution History

Purpose: Logs every action Arlis takes (for transparency, debugging, and user trust).

sql

CREATE TABLE agent_tasks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  application_id UUID REFERENCES applications(id) ON DELETE CASCADE,
  tool_name TEXT NOT NULL,
  tool_args JSONB,
  tool_result JSONB,
  screenshot_url TEXT, -- Screenshot of browser state
  success BOOLEAN DEFAULT true,
  error_message TEXT,
  execution_time_ms INTEGER,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

Key Fields:

•tool_name: e.g., "browserbase_navigate", "fill_form_field", "upload_document"
•screenshot_url: Browserbase screenshot for visual debugging
•execution_time_ms: Performance tracking

Index:

sql

CREATE INDEX idx_agent_tasks_application_id ON agent_tasks(application_id);

Row Level Security (RLS)

Critical: All tables MUST have RLS enabled to prevent users from accessing each other's data.

Applications Table

sql

ALTER TABLE applications ENABLE ROW LEVEL SECURITY;

-- Users can only view their own applications
CREATE POLICY "Users can view own applications"
  ON applications FOR SELECT
  USING (auth.uid()::text = user_id);

-- Users can only insert applications for themselves
CREATE POLICY "Users can insert own applications"
  ON applications FOR INSERT
  WITH CHECK (auth.uid()::text = user_id);

-- Users can only update their own applications
CREATE POLICY "Users can update own applications"
  ON applications FOR UPDATE
  USING (auth.uid()::text = user_id);

Documents Table

sql

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own documents"
  ON documents FOR SELECT
  USING (
    EXISTS (
      SELECT 1 FROM applications
      WHERE applications.id = documents.application_id
      AND applications.user_id = auth.uid()::text
    )
  );

Agent Tasks Table

sql

ALTER TABLE agent_tasks ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own agent tasks"
  ON agent_tasks FOR SELECT
  USING (
    EXISTS (
      SELECT 1 FROM applications
      WHERE applications.id = agent_tasks.application_id
      AND applications.user_id = auth.uid()::text
    )
  );

Future Enhancements

1. pgvector for Semantic Search

Use Case: Search documents by meaning (e.g., "find my income proof from December")

sql

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add embedding column to documents
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Create index for similarity search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);

Usage:

typescript

// Generate embedding with Gemini
const embedding = await generateEmbedding(query);

// Search similar documents
const { data } = await supabase
  .rpc('match_documents', {
    query_embedding: embedding,
    match_threshold: 0.8,
    match_count: 5
  });

2. Full-Text Search

Use Case: Search extracted text from documents

sql

-- Add tsvector column
ALTER TABLE documents ADD COLUMN search_vector tsvector;

-- Create trigger to auto-update search vector
CREATE FUNCTION documents_search_trigger() RETURNS trigger AS $$
BEGIN
  NEW.search_vector := to_tsvector('english', COALESCE(NEW.extracted_text, ''));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_search_update
  BEFORE INSERT OR UPDATE ON documents
  FOR EACH ROW EXECUTE FUNCTION documents_search_trigger();

-- Create GIN index
CREATE INDEX documents_search_idx ON documents USING GIN(search_vector);

Best Practices

1. Always Use Transactions for Multi-Step Operations

typescript

const { data, error } = await supabase.rpc('create_application_with_docs', {
  application_data: { title: "Passport Renewal", ... },
  document_urls: ["url1", "url2"]
});

2. Use JSONB for Flexible Data

•Store form fields in extracted_data as JSONB
•Allows querying: WHERE extracted_data->>'ssn' IS NOT NULL
•Supports indexing: CREATE INDEX ON applications USING GIN(extracted_data);

3. Cascade Deletes

•Documents and agent_tasks use ON DELETE CASCADE
•When an application is deleted, all related data is automatically removed

4. Timestamps

•Always include created_at and updated_at
•Use TIMESTAMPTZ (timezone-aware) not TIMESTAMP

5. Indexes

•Index foreign keys (application_id)
•Index frequently queried fields (user_id, status)
•Use partial indexes for sparse data (WHERE next_check_at IS NOT NULL)

Common Queries

Get All Applications for a User

typescript

const { data } = await supabase
  .from('applications')
  .select('*')
  .eq('user_id', userId)
  .order('created_at', { ascending: false });

Get Application with Documents and Tasks

typescript

const { data } = await supabase
  .from('applications')
  .select(`
    *,
    documents (*),
    agent_tasks (*)
  `)
  .eq('id', applicationId)
  .single();

Find Applications Needing Attention

typescript

const { data } = await supabase
  .from('applications')
  .select('*')
  .eq('user_id', userId)
  .lte('next_check_at', new Date().toISOString())
  .order('urgency', { ascending: false });

Schema Versioning

Current schema version: v1.0 (2026-01-21)

Migration History:

•v1.0: Initial schema (applications, documents, agent_tasks)

Planned:

•v1.1: Add pgvector for semantic search
•v1.2: Add full-text search on documents
•v1.3: Add user preferences table

Quick Reference

Table	Purpose	Key Relationships
`applications`	Track government forms	Parent of documents & tasks
`documents`	User uploads	Belongs to application
`agent_tasks`	Execution log	Belongs to application

Security: All tables have RLS enabled. Users can only access their own data.

Performance: Indexes on user_id, status, next_check_at, and foreign keys.

Supabase Database Architecture for Arlis

Overview

Core Tables

1. applications - Main Application Tracking

2. documents - Uploaded Files

3. agent_tasks - Execution History

Row Level Security (RLS)

Applications Table

Documents Table

Agent Tasks Table

Future Enhancements

1. pgvector for Semantic Search

2. Full-Text Search

Best Practices

1. Always Use Transactions for Multi-Step Operations

2. Use JSONB for Flexible Data

3. Cascade Deletes

4. Timestamps

5. Indexes

Common Queries

Get All Applications for a User

Get Application with Documents and Tasks

Find Applications Needing Attention

Schema Versioning

Quick Reference

1. `applications` - Main Application Tracking

2. `documents` - Uploaded Files

3. `agent_tasks` - Execution History