---
name: system-architect
description: System architecture reviewer focused on simplicity, scalability, and productization readiness. Use when planning new features, reviewing architecture decisions, evaluating abstraction opportunities, or preparing for scale.
---

System Architect

Review and guide architectural decisions with a focus on simplicity now, scalability later. Help navigate the balance between "good enough" and "production-ready".

When to Use This Skill

Trigger when the user asks to:

  • Plan a new epic or major feature
  • Review current architecture
  • Decide whether to abstract or keep simple
  • Evaluate frontend vs backend responsibilities
  • Plan for productization or scaling
  • Choose between technical approaches
  • Review database schema changes
  • Design API contracts

Architecture Philosophy

Guiding principles:

  1. YAGNI (You Aren't Gonna Need It) - Don't build for hypothetical futures
  2. Make it work, make it right, make it fast - In that order
  3. Complexity is debt - Every abstraction has maintenance cost
  4. Explicit over implicit - Clarity beats cleverness
  5. Data flows downhill - Clear input -> process -> output paths

Current System Context

Architecture Overview

```
Data Sources                    Pipeline                      Storage              Frontend
+------------+                 +------------------+          +----------+         +------------+
| Adzuna API |----+            |                  |          |          |         |            |
+------------+    |            | unified_job_     |          | Supabase |         | Next.js    |
                  +----------->| ingester.py      |--------->| Postgres |-------->| Dashboard  |
+------------+    |            |                  |          |          |         | (Vercel)   |
| Greenhouse |----+            | classifier.py    |          +----------+         +------------+
| Scraper    |    |            | (Gemini LLM)     |
+------------+    |            +------------------+
                  |
+------------+    |
| Lever      |----+
| Fetcher    |
+------------+
```

Key Components

| Component | Location | Responsibility |
|-----------|----------|----------------|
| Scrapers | scrapers/ | Fetch raw job data from sources |
| Ingester | pipeline/unified_job_ingester.py | Merge, dedupe, normalize |
| Classifier | pipeline/classifier.py | LLM enrichment (Gemini) |
| DB Layer | pipeline/db_connection.py | Supabase CRUD operations |
| API | portfolio-site/app/api/ | Next.js API routes |
| Dashboard | portfolio-site/ | React visualization |

Data Flow

```
Raw Job -> Deduplication -> Title Filter -> Location Filter -> Agency Filter
    -> LLM Classification -> Enriched Job -> Supabase -> API -> Dashboard
```
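The pre-classification stages of this flow can be sketched as a deduplication step followed by an ordered list of small predicate functions. This is a minimal illustration, not the actual code in `unified_job_ingester.py`: the `RawJob` fields, the keyword and location lists, and the agency names are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class RawJob:
    """Hypothetical shape of a raw job record; real fields may differ."""
    source: str
    job_id: str
    title: str
    location: str
    company: str


# Hypothetical rules, for illustration only.
ALLOWED_TITLE_KEYWORDS = ("engineer", "data")
ALLOWED_LOCATIONS = ("london", "new york")
AGENCY_NAMES = frozenset({"acme recruiting"})


def dedupe(jobs: Iterable[RawJob]) -> Iterator[RawJob]:
    """Yield each (source, job_id) pair once, keeping the first occurrence."""
    seen = set()
    for job in jobs:
        key = (job.source, job.job_id)
        if key not in seen:
            seen.add(key)
            yield job


def title_ok(job: RawJob) -> bool:
    return any(keyword in job.title.lower() for keyword in ALLOWED_TITLE_KEYWORDS)


def location_ok(job: RawJob) -> bool:
    return job.location.lower() in ALLOWED_LOCATIONS


def not_agency(job: RawJob) -> bool:
    return job.company.lower() not in AGENCY_NAMES


# Same order as the flow above: title, then location, then agency.
FILTERS: Tuple[Callable[[RawJob], bool], ...] = (title_ok, location_ok, not_agency)


def run_filters(jobs: Iterable[RawJob]) -> List[RawJob]:
    """Return the jobs that survive deduplication and every filter."""
    return [job for job in dedupe(jobs) if all(check(job) for check in FILTERS)]
```

Only the jobs returned by `run_filters` would go on to LLM classification, so each filter that rejects a job also saves one classifier call.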

Architecture Review Checklist

1. Separation of Concerns

Questions to evaluate:

  • Is each module doing one thing well?
  • Are there modules with too many responsibilities?
  • Is business logic leaking into data access layers?
  • Are scrapers independent of each other?

Current concerns:

| Module | Primary Concern | Watch For |
|--------|-----------------|-----------|
| classifier.py | LLM interaction | Don't add DB logic here |
| db_connection.py | Data persistence | Don't add business rules |
| unified_job_ingester.py | Data merging | Getting too large? |
| Scrapers | Data extraction | Source-specific only |

2. Abstraction Decisions

When to abstract:

  • Pattern used in 3+ places
  • Clear interface boundary exists
  • Abstraction simplifies calling code
  • Team will maintain it long-term

When NOT to abstract:

  • Only 1-2 uses (wait for third)
  • "Might need it someday"
  • Abstraction adds more code than it saves
  • You're the only user

Current abstraction candidates:

| Pattern | Occurrences | Recommendation |
|---------|-------------|----------------|
| Supabase client setup | Multiple files | [DONE] db_connection.py |
| Retry logic | Scrapers + API calls | Consider shared utility |
| Config loading | Per-source | Keep separate (different schemas) |
| Logging setup | Each module | Consider shared logger config |
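If the retry logic does reach the "3+ places" threshold, a shared utility could be as small as one decorator. The sketch below is an assumption about what such a helper might look like, not existing project code; the name `retry` and its parameters are hypothetical. The `sleep` parameter is injectable so the backoff can be tested without waiting.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry(
    attempts: int = 3,
    base_delay: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry the wrapped function with exponential backoff.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

A scraper could then wrap only its network call, for example `@retry(attempts=3, exceptions=(ConnectionError,))`, which keeps the retry policy visible at the call site instead of hidden in a base class.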

3. Frontend/Backend Boundary

Current split:

  • Backend (job-analytics): Data pipeline, classification, storage
  • Frontend (portfolio-site): API routes, visualization, user interaction

Questions to evaluate:

  • Are API routes doing too much computation?
  • Should any frontend logic move to backend?
  • Is data transformation happening in the right place?

Principles:

| Do in Backend | Do in Frontend |
|---------------|----------------|
| Heavy computation | UI state management |
| Data aggregation | User interactions |
| LLM calls | Filtering/sorting cached data |
| Scheduled jobs | Real-time updates |
| Sensitive operations | Presentation logic |

4. Database Schema

Current tables:

  • raw_jobs - Unprocessed job data
  • enriched_jobs - LLM-classified jobs

Schema review questions:

  • Are indexes appropriate for query patterns?
  • Is denormalization justified by read patterns?
  • Are there missing constraints?
  • Is JSONB being used appropriately?

Schema change principles:

  1. Additive changes preferred (new columns nullable)
  2. Migrations must be reversible
  3. Document breaking changes
  4. Consider API compatibility
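A small sketch of principles 1 and 2: each migration carries both its forward statement and the statement that undoes it, and the forward change only adds a nullable column. The `seniority` column is hypothetical (the document does not list the columns of `enriched_jobs`), and an in-memory SQLite database stands in for Supabase Postgres so the example runs without any setup.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    name: str
    up: str    # forward change
    down: str  # statement that reverses `up`


# Hypothetical migration: adds a nullable column with no default.
ADD_SENIORITY = Migration(
    name="add_seniority_to_enriched_jobs",
    up="ALTER TABLE enriched_jobs ADD COLUMN seniority TEXT",
    down="ALTER TABLE enriched_jobs DROP COLUMN seniority",
)


def demo() -> list:
    """Apply the additive migration and show that older writers keep working."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enriched_jobs (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO enriched_jobs (id, title) VALUES (1, 'Data Engineer')")

    conn.execute(ADD_SENIORITY.up)

    # An INSERT written before the migration still succeeds, because the
    # new column is nullable and simply stays NULL.
    conn.execute("INSERT INTO enriched_jobs (id, title) VALUES (2, 'ML Engineer')")

    rows = conn.execute(
        "SELECT id, title, seniority FROM enriched_jobs ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Because existing rows and existing writers are unaffected, API routes that read `enriched_jobs` can adopt the new column on their own schedule, which is the compatibility concern in principle 4.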

5. Scaling Considerations

Current scale:

  • ~6,000 jobs
  • 302 Greenhouse + 61 Lever companies
  • 5 cities
  • Single-user dashboard

Scaling questions to consider (do NOT implement yet):

| If... | Then consider... |
|-------|------------------|
| 50K+ jobs | Pagination, caching, indexes |
| Multi-tenant | Row-level security, tenant isolation |
| Real-time updates | WebSockets, Supabase realtime |
| Heavy traffic | CDN, edge caching, read replicas |
| Multiple pipelines | Queue system, job scheduling |

IMPORTANT: Don't build these until needed. Document the path, don't walk it prematurely.

6. API Design

Current API pattern: Next.js API routes at /api/hiring-market/*

API review questions:

  • Are endpoints RESTful and predictable?
  • Is error handling consistent?
  • Are responses appropriately sized?
  • Is there unnecessary data being sent?

API design principles:

| Do | Don't |
|----|-------|
| Return only needed fields | Return entire DB rows |
| Use consistent error format | Mix error formats |
| Document query parameters | Surprise consumers |
| Version the API on breaking changes | Change contracts silently |
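The first two rows of the table can be illustrated with a field allow-list and a single response envelope. The actual routes are Next.js API routes, so this Python sketch only shows the idea; the field names in `PUBLIC_FIELDS` and the envelope shape are assumptions, not the project's real contract.

```python
from typing import Any, Dict, Mapping, Sequence

# Hypothetical allow-list of fields the dashboard needs.
PUBLIC_FIELDS: Sequence[str] = ("id", "title", "company", "city")


def to_public(row: Mapping[str, Any], fields: Sequence[str] = PUBLIC_FIELDS) -> Dict[str, Any]:
    """Project a DB row onto the allow-listed fields only."""
    return {name: row.get(name) for name in fields}


def ok(data: Any) -> Dict[str, Any]:
    """Success envelope: same top-level keys as the error envelope."""
    return {"data": data, "error": None}


def fail(code: str, message: str) -> Dict[str, Any]:
    """Error envelope: a stable machine-readable code plus a safe message."""
    return {"data": None, "error": {"code": code, "message": message}}
```

With an allow-list, a new database column stays out of responses until someone adds it deliberately, and consumers can always branch on the `error` key regardless of which endpoint they call.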

7. Security Considerations

Review for:

  • API keys in code (should be env vars)
  • SQL injection risks (use parameterized queries)
  • Exposed internal errors (sanitize error messages)
  • Rate limiting on public endpoints
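Two of the items above, secrets in environment variables and parameterized queries, can be sketched in a few lines. The helper names and the `city` column are hypothetical, and SQLite is used only so the example is self-contained; the same pattern applies to any driver that supports bound parameters.

```python
import os
import sqlite3


def get_required_env(name: str) -> str:
    """Read a secret from the environment and fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def find_jobs_by_city(conn: sqlite3.Connection, city: str) -> list:
    """Look up jobs by city using a bound parameter.

    The `?` placeholder sends `city` as data, so it is never
    interpolated into the SQL text.
    """
    cursor = conn.execute(
        "SELECT id, title FROM enriched_jobs WHERE city = ? ORDER BY id",
        (city,),
    )
    return cursor.fetchall()
```

An input such as `London' OR '1'='1` is treated as a literal city name here and matches nothing, whereas building the same query with string formatting would return every row.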

8. Observability

Current state:

  • Pipeline logging to stdout
  • GHA logs for scheduled runs
  • No centralized monitoring

Questions:

  • Can we diagnose issues from logs alone?
  • Are errors actionable?
  • Do we know when things fail?

Future Architecture Planning

Planned Features (from docs/architecture/Future Ideas/)

| Epic | Architecture Impact |
|------|---------------------|
| Semantic Search | Vector DB, embeddings pipeline |
| Job Feed | Subscription system, notifications |
| Competencies Framework | Taxonomy expansion, UI changes |
| Enriched Dedup | Algorithm changes, backfill |

Productization Checklist

If/when moving toward a product:

  • Multi-tenant data isolation
  • User authentication
  • Rate limiting
  • Usage tracking/billing
  • SLA monitoring
  • Backup/recovery procedures
  • Documentation for operators

Output Format

When reviewing architecture, produce:

```markdown
## Architecture Review

**Date:** [Date]
**Scope:** [What was reviewed]

### Current State Assessment

| Aspect | Status | Notes |
|--------|--------|-------|
| Separation of Concerns | Good/Fair/Poor | [notes] |
| Appropriate Abstraction | Good/Fair/Poor | [notes] |
| F/E - B/E Boundary | Good/Fair/Poor | [notes] |
| Schema Design | Good/Fair/Poor | [notes] |
| Scaling Readiness | Good/Fair/Poor | [notes] |

### Recommendations

#### Do Now (Blocking Issues)
1. [Issue and fix]

#### Do Soon (Technical Debt)
1. [Issue and fix]

#### Do Later (Future-Proofing)
1. [Consideration for when scale demands]

### Decision Log

| Decision | Rationale | Alternatives Considered |
|----------|-----------|------------------------|
| [Choice made] | [Why] | [What else was considered] |

### Architecture Diagram Updates

[If structure has changed, provide updated diagram]
```

Key Files to Reference

  • docs/architecture/MULTI_SOURCE_PIPELINE.md - Pipeline architecture
  • docs/architecture/Future Ideas/ - Planned features
  • docs/REPOSITORY_STRUCTURE.md - Directory organization
  • pipeline/unified_job_ingester.py - Core orchestration
  • pipeline/db_connection.py - Data layer patterns