design-core

System design foundations: first-principles thinking, evolution stages, and decision frameworks.

SKILL.md

--- frontmatter

name: design-core
description: System design foundations — first-principles thinking, evolution stages, and decision frameworks

What I do

• Provide first-principles decomposition for system design problems
• Document mental models, evolution stages, and architecture decision trees
• Define design review checklists and common anti-patterns
• Supply cross-references to domain-specific design skills

When to use me

Use this skill as the entry point for any system design task. Pair with a domain-specific skill (design-web3, design-defi, design-arbitrage) for specialized guidance.

Core Mental Models

Mental Model	What It Means	When To Apply
Trade-offs all the way down	Every design choice sacrifices something; make the sacrifice explicit	Architecture decisions, technology selection
Boundaries first	Define system boundaries and interfaces before internals	Service decomposition, API design
Complexity is debt	Every abstraction, dependency, and indirection has carrying cost	Adding components, choosing patterns
Prove it works	A running prototype beats a perfect diagram	Early design phases, technology evaluation
Scale is earned	Design for 10x the current load; re-architect at 100x	Capacity planning, infrastructure decisions
Failure is default	Systems fail; design for graceful degradation, not perfection	Reliability engineering, error handling
Delete before add	Removing complexity is more valuable than adding capability	Feature creep, refactoring decisions

Core Principles

• Decompose to atoms first -- Break the problem into its smallest independent sub-problems before proposing any architecture.
• Simplest working system first -- Build the simplest thing that validates your core hypothesis. Add complexity only when real constraints force it.
• Working software over hypothesis -- A running, verifiable system always beats a theoretical design. Ship, measure, iterate.
• Make failure cheap -- Design so that failures are detected fast, blast radius is contained, and recovery is automated.
• Explicit over implicit -- State assumptions, constraints, and trade-offs in writing. Implicit knowledge is a single point of failure.
• Gradual evolution -- Change the system incrementally as usage grows or real issues emerge. Avoid big-bang rewrites.
• Pragmatic over elegant -- Choose what works and is maintainable over what is theoretically beautiful.

System Evolution Stages

Stage	Scale Trigger	Architectural Response
0 - Prototype	0-100 users	Monolith, single database, manual deployment
1 - Validated	100-10K users	Read replicas, CDN, CI/CD pipeline, basic monitoring
2 - Growing	10K-100K users	Cache layer, async processing, horizontal scaling, load balancing
3 - Scaling	100K-1M users	Service decomposition, event-driven architecture, distributed tracing
4 - Platform	1M+ users	Multi-region, CQRS/event sourcing, dedicated teams per service

Enter each stage only when the previous stage's limits are hit. Premature advancement is the most common and most expensive architectural mistake.
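
As a rough illustration, the stage table can be read as a lookup from user count to stage. The sketch below is a minimal one: the names `STAGE_UPPER_BOUNDS`, `STAGE_NAMES`, and `stage_for` are hypothetical, and because the table's ranges share their endpoints, assigning a boundary value (exactly 100, 10K, and so on) to the higher stage is an assumption.

```python
from bisect import bisect_right

# Exclusive upper bounds for stages 0-3, taken from the table above.
# Assumption: a boundary value (e.g. exactly 100 users) belongs to the higher stage.
STAGE_UPPER_BOUNDS = [100, 10_000, 100_000, 1_000_000]
STAGE_NAMES = ["Prototype", "Validated", "Growing", "Scaling", "Platform"]


def stage_for(users: int) -> tuple[int, str]:
    """Return (stage number, stage name) for a given user count."""
    if users < 0:
        raise ValueError("users must be non-negative")
    # bisect_right counts how many upper bounds are <= users,
    # which is exactly the stage number.
    index = bisect_right(STAGE_UPPER_BOUNDS, users)
    return index, STAGE_NAMES[index]


print(stage_for(50_000))  # (2, 'Growing')
```

User count is only a proxy here. The rule above still applies: move to the next stage when the current one's limits are actually hit, not when a number is crossed.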

First-Principles Decomposition Protocol

• State the problem -- One sentence: what must the system do?
• Identify inputs, outputs, and invariants -- What goes in, what comes out, what must always be true?
• Find the hardest sub-problem -- Which component has the tightest constraints (latency, consistency, throughput)?
• Design the simplest solution for the hardest part -- Solve the constraint that matters most with the least complexity.
• Verify with back-of-envelope math -- Will it handle the load? Storage? Bandwidth? Latency budget?
• Iterate -- Add the next hardest sub-problem. Repeat until the system is complete.
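
The back-of-envelope step can be as small as a few multiplications. The sketch below shows one possible shape for it; the `Workload` fields, the function names, and the example figures (100K daily users, 50 requests each, 1 KB written per request, a 3x peak factor) are all illustrative assumptions, not measurements.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Workload:
    """Hypothetical inputs for a capacity estimate."""
    daily_active_users: int
    requests_per_user_per_day: int
    bytes_written_per_request: int
    peak_to_average_ratio: float


def average_qps(w: Workload) -> float:
    """Requests per second, averaged over a full day."""
    return w.daily_active_users * w.requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(w: Workload) -> float:
    """Average QPS scaled by an assumed peak factor."""
    return average_qps(w) * w.peak_to_average_ratio


def storage_per_year_gb(w: Workload) -> float:
    """New data written per year, in decimal gigabytes (1 GB = 1e9 bytes)."""
    daily_bytes = (
        w.daily_active_users * w.requests_per_user_per_day * w.bytes_written_per_request
    )
    return daily_bytes * 365 / 1e9


# Illustrative numbers only.
example = Workload(
    daily_active_users=100_000,
    requests_per_user_per_day=50,
    bytes_written_per_request=1_000,
    peak_to_average_ratio=3.0,
)
print(round(average_qps(example), 2))  # 57.87
print(round(peak_qps(example), 1))     # 173.6
print(storage_per_year_gb(example))    # 1825.0
```

At roughly 174 peak QPS and under 2 TB of new data per year, this hypothetical workload would fit comfortably on a single well-provisioned database, which is the kind of conclusion the step is meant to surface before any architecture is chosen.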

Architecture Decision Trees

Data Store Selection

Need ACID transactions across multiple entities?
  YES --> Relational DB (PostgreSQL, MySQL)
  NO  |
      v
Need flexible schema or document storage?
  YES --> Document DB (MongoDB, Firestore)
  NO  |
      v
Need sub-millisecond key-value lookups?
  YES --> In-memory store (Redis, Memcached)
  NO  |
      v
Need full-text search?
  YES --> Search engine (Elasticsearch, Typesense)
  NO  |
      v
Need time-series or append-only writes?
  YES --> Time-series DB (TimescaleDB, InfluxDB)
  NO  --> Start with PostgreSQL (most versatile default)
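
The tree above evaluates its questions in a fixed order, so the first "yes" wins. A minimal sketch of that ordering as code follows; `DataNeeds` and `choose_data_store` are hypothetical names, and the returned strings simply echo the labels in the tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNeeds:
    """One flag per question in the tree; everything defaults to 'no'."""
    multi_entity_acid: bool = False
    flexible_schema: bool = False
    sub_ms_key_value: bool = False
    full_text_search: bool = False
    time_series: bool = False


def choose_data_store(needs: DataNeeds) -> str:
    """Walk the questions top to bottom and return the first match."""
    if needs.multi_entity_acid:
        return "Relational DB"
    if needs.flexible_schema:
        return "Document DB"
    if needs.sub_ms_key_value:
        return "In-memory store"
    if needs.full_text_search:
        return "Search engine"
    if needs.time_series:
        return "Time-series DB"
    # No special requirement: fall back to the versatile default.
    return "PostgreSQL (default)"


print(choose_data_store(DataNeeds(full_text_search=True)))  # Search engine
```

Because the order is significant, a workload that needs both ACID transactions and full-text search resolves to a relational database first. Treat the result as a starting point for the primary store, not a claim that one store covers every need.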

Communication Patterns

Need immediate response from the receiver?
  YES --> Synchronous (HTTP/gRPC)
  NO  |
      v
Need guaranteed delivery with ordering?
  YES --> Message queue (Kafka, Cloud Pub/Sub with ordering keys)
  NO  |
      v
Need fan-out to multiple consumers?
  YES --> Pub/Sub (Cloud Pub/Sub, SNS)
  NO  |
      v
Need real-time bidirectional communication?
  YES --> WebSockets (SSE covers server-to-client push only)
  NO  --> Async HTTP with polling or webhooks

Deployment Topology

Single team, single service?
  YES --> Monolith on managed compute (Cloud Run, App Engine)
  NO  |
      v
Multiple teams, clear domain boundaries?
  YES --> Service per bounded context (Kubernetes, Cloud Run)
  NO  |
      v
Need extreme scale for specific components?
  YES --> Decompose hot path only; keep the rest monolithic
  NO  --> Modular monolith with clear internal boundaries

Design Review Checklist

Simplicity

• Can any component be removed without breaking the core use case?
• Are there fewer than 3 synchronous hops in the critical path?
• Is the data model normalized to the simplest correct form?
• Could a simpler technology achieve the same result?

Failure Modes

• What happens when each dependency is unavailable for 5 minutes?
• Are timeouts, retries, and circuit breakers configured?
• Is there a kill switch for every non-critical feature?
• Can the system degrade gracefully under partial failure?
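
To make the circuit-breaker and graceful-degradation items concrete, here is a minimal sketch of a breaker that returns a fallback value while a dependency is failing. The class name, parameters, and thresholds are hypothetical, and timeouts and retries are deliberately left out; in practice a maintained library would usually be a better choice than a hand-rolled breaker.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Skips a failing dependency for a cool-down period and serves a fallback."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            # Open: do not touch the dependency until the cool-down has elapsed.
            if self._clock() - self._opened_at < self.reset_after_s:
                return fallback()
            # Half-open: allow one trial call; a single failure reopens the breaker.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()  # degrade instead of propagating the error
        self._failures = 0  # a success closes the breaker fully
        return result
```

A caller would wrap each dependency call, for example `breaker.call(fetch_live_price, lambda: cached_price)`, where both names are placeholders. The fallback is what turns a dependency outage into a degraded response rather than an error.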

Evolution

• Can the schema evolve without downtime?
• Are service interfaces versioned?
• Can components be replaced independently?
• Is there a rollback plan for every deployment?

Anti-Patterns

Anti-Pattern	Why It Fails	What To Do Instead
Premature microservices	Distributed complexity without a distributed team or the load to justify it	Start monolithic; decompose when forced by scale or team boundaries
Resume-driven architecture	Technology chosen for career value, not problem fit	Choose the most boring technology that solves the problem
Diagram-driven development	Architecture diagrams without running code	Build a walking skeleton first; diagram what you built
Speculative generality	Building for hypothetical future requirements	YAGNI -- build for today's requirements, design for tomorrow's
Distributed monolith	Microservices that must deploy together	If services share a release cycle, they are one service
Shared mutable state	Multiple services writing to the same database	Each service owns its data; communicate via APIs or events
No back-of-envelope math	Designing without validating capacity assumptions	Estimate QPS, storage, bandwidth before choosing architecture

Domain-Specific Skills

After establishing foundations with this skill, load the appropriate domain-specific skill for specialized guidance:

Domain	Skill	Coverage
Web3 and blockchain	`design-web3`	On-chain/off-chain decisions, smart contract patterns, gas optimization
DeFi protocols	`design-defi`	Protocol composability, invariant design, economic security
Trading and arbitrage	`design-arbitrage`	Latency budgets, execution engines, risk management as design constraint