infrastructure

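TMNL Docker infrastructure documentation. Covers service topology, health checks, and troubleshooting. A useful reference when you debug containers or need to understand the service architecture.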

SKILL.md
--- frontmatter
name: infrastructure
description: TMNL Docker infrastructure documentation. Service topology, health checks, troubleshooting. Model-invoked when debugging containers or understanding service architecture.
model_invoked: true
triggers:
  - "infrastructure"
  - "docker architecture"
  - "service topology"
  - "container health"
  - "why is service down"
  - "container not starting"
  - "port conflict"
  - "what's running"
  - "infra status"
  - "check containers"

TMNL Infrastructure

Docker-based infrastructure for the TMNL stack. All services are defined in docker/docker-compose.yml.

Visualization Protocol

When presenting infrastructure status, generate ASCII diagrams dynamically based on actual container state:

  1. Collect data: Run /infra:status --json --resources
  2. Analyze state: Count healthy/unhealthy, identify issues
  3. Render diagram: Draw topology with health indicators

Health Indicators

  • (green) — healthy/running
  • (red) — unhealthy/exited
  • (yellow) — starting/created
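
The "analyze state" step can be sketched as a small tally over the collected data. This is a minimal sketch: the input shape (a list of dicts with a `status` key) and the `unknown` fallback are assumptions for illustration, since the actual output format of /infra:status --json is not documented here. Only the status-to-colour mapping comes from the list above.

```python
from collections import Counter

# Mapping taken from the Health Indicators list.
INDICATOR_BY_STATUS = {
    "healthy": "green",
    "running": "green",
    "unhealthy": "red",
    "exited": "red",
    "starting": "yellow",
    "created": "yellow",
}


def indicator(status: str) -> str:
    """Return the indicator colour for a status string.

    Statuses outside the documented list map to "unknown" (an assumption).
    """
    return INDICATOR_BY_STATUS.get(status.lower(), "unknown")


def summarize(containers: list[dict]) -> Counter:
    """Count containers per indicator colour.

    `containers` is a hypothetical list of {"service": ..., "status": ...}
    dicts; adapt the key names to the real status output.
    """
    return Counter(indicator(c["status"]) for c in containers)


# Hypothetical sample data, not real command output.
sample = [
    {"service": "postgres", "status": "healthy"},
    {"service": "electric", "status": "running"},
    {"service": "minio", "status": "exited"},
    {"service": "nats", "status": "starting"},
]

counts = summarize(sample)
print(f"green={counts['green']} red={counts['red']} yellow={counts['yellow']}")
```

The counts feed the footer line of the rendered diagram (healthy and unhealthy totals).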

Example Dynamic Output

code
┌─────────────────────────────────────────────────────────────────┐
│                    TMNL Infrastructure Status                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   CORE SERVICES                                                  │
│   ┌─────────────┐      ┌─────────────────┐                      │
│   │  postgres   │─────▶│    electric     │                      │
│   │ ● :5432     │      │ ● :3000         │                      │
│   └─────────────┘      └─────────────────┘                      │
│          │                                                       │
│          ▼                                                       │
│   ┌─────────────────┐                                           │
│   │ durable-streams │                                           │
│   │ ● :3030         │                                           │
│   └─────────────────┘                                           │
│                                                                  │
│   SUPPORT SERVICES                                               │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐                         │
│   │  nats   │  │  minio  │  │ y-sweet │                         │
│   │ ● :4222 │  │ ○ :9000 │  │ ● :8080 │                         │
│   └─────────┘  └─────────┘  └─────────┘                         │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│ ● healthy (5)  ○ unhealthy (1)  CPU: 8%  MEM: 1.2GB            │
└─────────────────────────────────────────────────────────────────┘

Smart Routing

Choose output depth based on state:

| Condition | Response |
| --- | --- |
| All healthy | Compact topology with summary |
| Unhealthy services | Full diagnostic with logs |
| Resource pressure | Resource-focused view |
| Connectivity issues | Network topology focus |
| Specific service query | Service detail + dependencies |
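
The routing table can be read as a simple decision function. The sketch below is one possible reading: the function name, its parameters, and above all the precedence between conditions (a specific service query first, then unhealthy services, connectivity, resources) are assumptions, because the table does not say which condition wins when several apply.

```python
from typing import Optional


def choose_view(
    *,
    unhealthy: int = 0,
    resource_pressure: bool = False,
    connectivity_issue: bool = False,
    service: Optional[str] = None,
) -> str:
    """Pick a response style from the Smart Routing table.

    The order of the checks below is an assumed precedence.
    """
    if service is not None:
        return "Service detail + dependencies"
    if unhealthy > 0:
        return "Full diagnostic with logs"
    if connectivity_issue:
        return "Network topology focus"
    if resource_pressure:
        return "Resource-focused view"
    return "Compact topology with summary"


print(choose_view())
print(choose_view(unhealthy=1))
print(choose_view(service="postgres", unhealthy=1))
```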

Service Topology

code
┌─────────────────────────────────────────────────────────────────┐
│                         TMNL Stack                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────┐     ┌─────────────────┐     ┌───────────────┐  │
│  │  postgres   │────▶│    electric     │────▶│  search-*     │  │
│  │  (5432)     │     │    (3000)       │     │  (8100-8102)  │  │
│  └─────────────┘     └─────────────────┘     └───────────────┘  │
│         │                                           │            │
│         │            ┌─────────────────┐           │            │
│         └───────────▶│ durable-streams │◀──────────┘            │
│                      │    (3030)       │                        │
│                      └─────────────────┘                        │
│                                                                  │
│  ┌─────────────┐     ┌─────────────────┐     ┌───────────────┐  │
│  │    nats     │     │     minio       │     │   y-sweet     │  │
│  │ (4222/8222) │     │  (9000/9001)    │     │    (8080)     │  │
│  └─────────────┘     └─────────────────┘     └───────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Quick Reference

| Service | Port | Purpose | Health Check |
| --- | --- | --- | --- |
| postgres | 5432 | PostGIS + TimescaleDB | pg_isready |
| durable-streams | 3030 | Persistent event streams | HTTP /health |
| electric | 3000 | Real-time Postgres sync | HTTP /health |
| search-cluster-coordinator | 8100 | Effect Cluster coordinator | TCP |
| search-cluster-sources | 8101 | Effect Cluster data sources | TCP |
| ingestion-cluster | 8102 | Data ingestion RPC | TCP |
| nats | 4222, 8222 | Message broker | HTTP monitoring |
| minio | 9000, 9001 | S3 object storage | HTTP /minio/health |
| y-sweet | 8080 | Yjs document sync | TCP |
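
For the services whose health check is listed as TCP, a probe amounts to attempting a connection. The following is a minimal sketch using only the standard library; it assumes the ports are published on localhost, which the compose file may or may not do, and the helper name `tcp_port_open` is illustrative.

```python
import socket

# Services listed with a "TCP" health check in the Quick Reference table.
TCP_CHECKED = {
    "search-cluster-coordinator": 8100,
    "search-cluster-sources": 8101,
    "ingestion-cluster": 8102,
    "y-sweet": 8080,
}


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: ports are reachable on localhost from where this runs.
    for name, port in TCP_CHECKED.items():
        state = "open" if tcp_port_open("localhost", port) else "closed"
        print(f"{name:28s} :{port} {state}")
```

A successful connection only shows that something is listening on the port; it does not confirm that the service behind it is working correctly.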

Service Groups

| Group | Services | Use Case |
| --- | --- | --- |
| core | postgres, durable-streams, electric | Essential services |
| cluster | search-cluster-*, ingestion-cluster | Effect Cluster nodes |
| collab | y-sweet, nats | Real-time collaboration |
| access | ssh, ngrok | Remote access |

Commands

bash
/infra:up                          # Start core services
/infra:up --group cluster          # Start cluster services
/infra:up --all                    # Start everything
/infra:down                        # Stop core services
/infra:status                      # Check health (table)
/infra:status --json --resources   # Full data for visualization
/infra:logs postgres               # View logs
/infra:rebuild <service>           # Rebuild service
/infra:query "why is X failing"    # Ask infrastructure questions

Dependency Chain

code
postgres (foundation)
    └── electric (requires postgres logical replication)
        └── search-cluster-* (requires electric sync)
            └── ingestion-cluster (requires search cluster)

minio (independent)
    └── y-sweet (stores documents in minio)

durable-streams (independent)
    └── all services can emit events

nats (independent)
    └── real-time messaging between services
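
The dependency chain above can be expressed as a graph, which makes it easy to derive a safe start order and to see which services are affected when one fails. This is a sketch, not how /infra:up actually orders services: it expands `search-cluster-*` into the two search services from the Quick Reference and assumes ingestion-cluster depends on both.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# service -> set of services it requires (taken from the Dependency Chain).
DEPENDS_ON = {
    "postgres": set(),
    "electric": {"postgres"},
    "search-cluster-coordinator": {"electric"},
    "search-cluster-sources": {"electric"},
    "ingestion-cluster": {"search-cluster-coordinator", "search-cluster-sources"},
    "minio": set(),
    "y-sweet": {"minio"},
    "durable-streams": set(),
    "nats": set(),
}


def start_order(graph: dict) -> list:
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def affected_by(graph: dict, failed: str) -> set:
    """Return every service that depends on `failed`, directly or transitively."""
    hit = set()
    changed = True
    while changed:
        changed = False
        for service, deps in graph.items():
            if service not in hit and (failed in deps or deps & hit):
                hit.add(service)
                changed = True
    return hit


order = start_order(DEPENDS_ON)
print(" -> ".join(order))
print(sorted(affected_by(DEPENDS_ON, "postgres")))
```

When postgres is down, `affected_by` reports electric, both search services, and ingestion-cluster, which matches the first branch of the chain.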

Common Issues & Fixes

Electric Restart Loop

Symptom: Electric container constantly restarting
Cause: Missing ELECTRIC_INSECURE=true for dev mode
Fix: Add ELECTRIC_INSECURE=true to the electric service's environment in docker/docker-compose.yml

Postgres Not Ready

Symptom: Services fail while waiting for postgres
Cause: Loading the TimescaleDB extension takes time
Fix: Wait for the health check's start_period (30s) to elapse before treating postgres as failed
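
Rather than guessing when postgres has finished starting, a script can poll pg_isready until it succeeds or a deadline passes. The sketch below assumes it runs from the directory containing the compose file and that the service is named postgres; `wait_until` and `pg_is_ready` are illustrative helper names, and the 60 second default is an arbitrary choice above the 30s start period.

```python
import subprocess
import time
from typing import Callable


def pg_is_ready() -> bool:
    """Run pg_isready inside the postgres container.

    pg_isready exits with 0 when the server is accepting connections.
    -T disables TTY allocation so this works from scripts.
    """
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", "postgres", "pg_isready"],
        capture_output=True,
    )
    return result.returncode == 0


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `check` repeatedly until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    ready = wait_until(pg_is_ready)
    print("postgres ready" if ready else "postgres still not ready after timeout")
```

The `sleep` and `clock` parameters exist so the loop can be exercised without real delays.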

Search Cluster Connection Refused

Symptom: Ports 8100/8101/8102 not accessible
Cause: Service not built or health check failing
Fix: Rebuild the affected service, e.g. /infra:rebuild search-cluster-coordinator

NATS WebSocket Issues

Symptom: Browser can't connect to NATS
Cause: Port 9222 not exposed or WebSocket config missing
Fix: Check that nats-server.conf has a websocket block and that port 9222 is exposed

Documentation

Service Briefings

Detailed documentation for each service:

Journals

Runbooks and troubleshooting guides:

Quick Diagnostics

bash
# Check the status of all containers
# (run from docker/, or pass -f docker/docker-compose.yml)
docker compose ps

# Check specific service health
docker compose ps postgres

# View recent logs
docker compose logs --tail 50 postgres

# Check that postgres is accepting connections
docker compose exec postgres pg_isready

# List replication slots (electric relies on postgres logical replication)
docker compose exec postgres psql -U tmnl -c "SELECT * FROM pg_replication_slots;"
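
The status check can also be scripted by parsing the output of `docker compose ps --format json`. This is a hedged sketch: the field names `Service`, `State`, and `Health` are assumed from Docker Compose v2 output and should be verified against the installed version. Depending on the version, the output is either one JSON array or one JSON object per line, so the parser accepts both.

```python
import json


def parse_compose_ps(text: str) -> list:
    """Parse `docker compose ps --format json` output into simple dicts.

    Accepts a JSON array or newline-delimited JSON objects.
    Field names are assumptions based on Compose v2.
    """
    text = text.strip()
    if not text:
        return []
    if text.startswith("["):
        rows = json.loads(text)
    else:
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [
        {
            "service": row.get("Service"),
            "state": row.get("State"),
            # Containers without a health check report an empty Health value.
            "health": row.get("Health") or "none",
        }
        for row in rows
    ]


def needs_attention(rows: list) -> list:
    """Return services that are not running or report an unhealthy check."""
    return [
        r["service"]
        for r in rows
        if r["state"] != "running" or r["health"] == "unhealthy"
    ]


# Hypothetical sample in the newline-delimited form, not real output.
sample_output = "\n".join([
    '{"Service": "postgres", "State": "running", "Health": "healthy"}',
    '{"Service": "minio", "State": "exited", "Health": ""}',
])
print(needs_attention(parse_compose_ps(sample_output)))
```

In practice the input would come from `subprocess.run(["docker", "compose", "ps", "--format", "json"], capture_output=True, text=True).stdout`.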