System Architect

Review and guide architectural decisions with a focus on simplicity now, scalability later. Help navigate the balance between "good enough" and "production-ready".

When to Use This Skill

Trigger when user asks to:

•Plan a new epic or major feature
•Review current architecture
•Decide whether to abstract or keep simple
•Evaluate frontend vs backend responsibilities
•Plan for productization or scaling
•Choose between technical approaches
•Review database schema changes
•Design API contracts

Architecture Philosophy

Guiding principles:

•YAGNI (You Aren't Gonna Need It) - Don't build for hypothetical futures
•Make it work, make it right, make it fast - In that order
•Complexity is debt - Every abstraction has maintenance cost
•Explicit over implicit - Clarity beats cleverness
•Data flows downhill - Clear input -> process -> output paths

Current System Context

Architecture Overview

code

Data Sources                    Pipeline                      Storage              Frontend
+------------+                 +------------------+          +----------+         +------------+
| Adzuna API |----+            |                  |          |          |         |            |
+------------+    |            | unified_job_     |          | Supabase |         | Next.js    |
                  +----------->| ingester.py      |--------->| Postgres |-------->| Dashboard  |
+------------+    |            |                  |          |          |         | (Vercel)   |
| Greenhouse |----+            | classifier.py    |          +----------+         +------------+
| Scraper    |    |            | (Gemini LLM)     |
+------------+    |            +------------------+
                  |
+------------+    |
| Lever      |----+
| Fetcher    |
+------------+

Key Components

Component	Location	Responsibility
Scrapers	`scrapers/`	Fetch raw job data from sources
Ingester	`pipeline/unified_job_ingester.py`	Merge, dedupe, normalize
Classifier	`pipeline/classifier.py`	LLM enrichment (Gemini)
DB Layer	`pipeline/db_connection.py`	Supabase CRUD operations
API	`portfolio-site/app/api/`	Next.js API routes
Dashboard	`portfolio-site/`	React visualization

Data Flow

code

Raw Job -> Deduplication -> Title Filter -> Location Filter -> Agency Filter
    -> LLM Classification -> Enriched Job -> Supabase -> API -> Dashboard

Architecture Review Checklist

1. Separation of Concerns

Questions to evaluate:

•Is each module doing one thing well?
•Are there modules with too many responsibilities?
•Is business logic leaking into data access layers?
•Are scrapers independent of each other?

Current concerns:

Module	Primary Concern	Watch For
`classifier.py`	LLM interaction	Don't add DB logic here
`db_connection.py`	Data persistence	Don't add business rules
`unified_job_ingester.py`	Data merging	Getting too large?
Scrapers	Data extraction	Source-specific only

2. Abstraction Decisions

When to abstract:

•Pattern used in 3+ places
•Clear interface boundary exists
•Abstraction simplifies calling code
•Team will maintain it long-term

When NOT to abstract:

•Only 1-2 uses (wait for third)
•"Might need it someday"
•Abstraction adds more code than it saves
•You're the only user

Current abstraction candidates:

Pattern	Occurrences	Recommendation
Supabase client setup	Multiple files	[DONE] db_connection.py
Retry logic	Scrapers + API calls	Consider shared utility
Config loading	Per-source	Keep separate (different schemas)
Logging setup	Each module	Consider shared logger config

3. Frontend/Backend Boundary

Current split:

•Backend (job-analytics): Data pipeline, classification, storage
•Frontend (portfolio-site): API routes, visualization, user interaction

Questions to evaluate:

•Are API routes doing too much computation?
•Should any frontend logic move to backend?
•Is data transformation happening in the right place?

Principles:

Do in Backend	Do in Frontend
Heavy computation	UI state management
Data aggregation	User interactions
LLM calls	Filtering/sorting cached data
Scheduled jobs	Real-time updates
Sensitive operations	Presentation logic

4. Database Schema

Current tables:

•raw_jobs - Unprocessed job data
•enriched_jobs - LLM-classified jobs

Schema review questions:

•Are indexes appropriate for query patterns?
•Is denormalization justified by read patterns?
•Are there missing constraints?
•Is JSONB being used appropriately?

Schema change principles:

•Additive changes preferred (new columns nullable)
•Migrations must be reversible
•Document breaking changes
•Consider API compatibility

5. Scaling Considerations

Current scale:

•~6,000 jobs
•302 Greenhouse + 61 Lever companies
•5 cities
•Single-user dashboard

Scaling questions to consider (NOT implement yet):

If...	Then consider...
50K+ jobs	Pagination, caching, indexes
Multi-tenant	Row-level security, tenant isolation
Real-time updates	WebSockets, Supabase realtime
Heavy traffic	CDN, edge caching, read replicas
Multiple pipelines	Queue system, job scheduling

IMPORTANT: Don't build these until needed. Document the path, don't walk it prematurely.

6. API Design

Current API pattern: Next.js API routes at /api/hiring-market/*

API review questions:

•Are endpoints RESTful and predictable?
•Is error handling consistent?
•Are responses appropriately sized?
•Is there unnecessary data being sent?

API design principles:

Do	Don't
Return only needed fields	Return entire DB rows
Use consistent error format	Mix error formats
Document query parameters	Surprise consumers
Version if breaking changes	Change contracts silently

7. Security Considerations

Review for:

•API keys in code (should be env vars)
•SQL injection risks (use parameterized queries)
•Exposed internal errors (sanitize error messages)
•Rate limiting on public endpoints

8. Observability

Current state:

•Pipeline logging to stdout
•GHA logs for scheduled runs
•No centralized monitoring

Questions:

•Can we diagnose issues from logs alone?
•Are errors actionable?
•Do we know when things fail?

Future Architecture Planning

Planned Features (from `docs/architecture/Future Ideas/`)

Epic	Architecture Impact
Semantic Search	Vector DB, embeddings pipeline
Job Feed	Subscription system, notifications
Competencies Framework	Taxonomy expansion, UI changes
Enriched Dedup	Algorithm changes, backfill

Productization Checklist

If/when moving toward a product:

Output Format

When reviewing architecture, produce:

markdown

## Architecture Review

**Date:** [Date]
**Scope:** [What was reviewed]

### Current State Assessment

| Aspect | Status | Notes |
|--------|--------|-------|
| Separation of Concerns | Good/Fair/Poor | [notes] |
| Appropriate Abstraction | Good/Fair/Poor | [notes] |
| F/E - B/E Boundary | Good/Fair/Poor | [notes] |
| Schema Design | Good/Fair/Poor | [notes] |
| Scaling Readiness | Good/Fair/Poor | [notes] |

### Recommendations

#### Do Now (Blocking Issues)
1. [Issue and fix]

#### Do Soon (Technical Debt)
1. [Issue and fix]

#### Do Later (Future-Proofing)
1. [Consideration for when scale demands]

### Decision Log

| Decision | Rationale | Alternatives Considered |
|----------|-----------|------------------------|
| [Choice made] | [Why] | [What else was considered] |

### Architecture Diagram Updates

[If structure has changed, provide updated diagram]

Key Files to Reference

•docs/architecture/MULTI_SOURCE_PIPELINE.md - Pipeline architecture
•docs/architecture/Future Ideas/ - Planned features
•docs/REPOSITORY_STRUCTURE.md - Directory organization
•pipeline/unified_job_ingester.py - Core orchestration
•pipeline/db_connection.py - Data layer patterns