Prism Clustering & Federation

Guide for distributed Prism deployments with horizontal scaling and high availability.

Deployment Modes

Mode	Nodes	Consensus	Use Case
Single	1	None	Development, small datasets
Federation	2+	None	Read scaling, simple HA, multi-DC
Cluster	3+	Raft	Full distributed, auto-failover

Architecture layers:

code

Layer 3: Cluster  (Raft consensus, auto-rebalancing, leader election)
Layer 2: Federation  (Query routing, result merging, health checks)
Layer 1: Single Node  (Indexing, search, aggregations)

Node Discovery

Static (Development/Fixed Deployments)

toml

[federation.discovery]
backend = "static"
nodes = ["node1:3000", "node2:3000", "node3:3000"]

DNS-Based (Kubernetes)

toml

[federation.discovery]
backend = "dns"
dns_name = "prism-headless.default.svc.cluster.local"
dns_refresh_interval_secs = 30

Kubernetes headless service:

yaml

apiVersion: v1
kind: Service
metadata:
  name: prism-headless
spec:
  clusterIP: None
  selector:
    app: prism
  ports:
    - port: 7000
      name: cluster

Gossip (Dynamic Environments)

toml

[federation.discovery]
backend = "gossip"
gossip_port = 7946
gossip_seeds = ["seed1:7946", "seed2:7946"]

Inter-Node Transport

QUIC (Recommended)

toml

[cluster.transport]
transport = "quic"
bind_port = 7000
quic_idle_timeout_ms = 30000
quic_max_streams = 100

# TLS required for QUIC
cert_path = "/etc/prism/cluster-cert.pem"
key_path = "/etc/prism/cluster-key.pem"

TCP (Alternative)

toml

[cluster.transport]
transport = "tcp"
bind_port = 7000

Generate cluster certificates:

bash

# Self-signed for internal cluster
openssl req -x509 -newkey rsa:4096 -keyout cluster-key.pem -out cluster-cert.pem \
  -days 365 -nodes -subj "/CN=prism-cluster"

Node Identity

toml

[node]
id = "node-1"              # Unique identifier
zone = "eu-west-1a"        # Availability zone
rack = "rack-42"           # Physical rack (optional)
region = "eu-west-1"       # Cloud region (optional)

[node.attributes]
disk_type = "ssd"
storage_gb = 500

Replication

Replication Factor

toml

[collection.replication]
factor = 2                    # Number of replicas
min_replicas_for_write = 1    # Minimum for write success

Mode	Recommended RF	Reason
Federation	2	Survives 1 node failure
Cluster (Raft)	3	Odd number for quorum

Zone-Aware Placement

Replicas automatically spread across zones:

toml

[cluster.placement]
spread_across = "zone"  # zone | rack | region | none

Constraint: Never two replicas of same shard in same zone.

Read Consistency

toml

[federation.consistency]
read = "eventual"              # Default: fastest
# read = "read-your-writes"    # Client sees own writes
# read = "bounded"             # Max staleness
# read = "strong"              # Always fresh

bounded_staleness_ms = 5000    # For bounded mode

Level	Latency	Freshness
eventual	Lowest	May be stale
read-your-writes	Low	Own writes visible
bounded	Medium	Max N seconds old
strong	Highest	Always current

Query Execution

Partial Results on Failure

toml

[federation.query]
allow_partial_results = true
partial_results_timeout_ms = 5000
min_successful_shards = 1

When some nodes fail, search returns partial results with warning:

json

{
  "results": [...],
  "shards": {
    "total": 3,
    "successful": 2,
    "failed": 1,
    "failures": [{"node": "node-3", "reason": "timeout"}]
  }
}

Health Checks

toml

[cluster.health]
heartbeat_interval_ms = 1000
failure_threshold = 3          # Missed heartbeats before suspect
suspect_timeout_ms = 5000      # Suspect → dead timeout
on_node_failure = "rebalance"  # rebalance | none

Node states:

code

alive → suspect → dead → removed

Transition	Trigger
alive → suspect	Missed heartbeats
suspect → dead	Timeout without recovery
suspect → alive	Heartbeat received
dead → removed	Admin action or auto-cleanup

Rebalancing

toml

[cluster.rebalancing]
imbalance_threshold_percent = 15
max_concurrent_moves = 2
max_bytes_per_sec = "100MB"
continuous_optimization = false
pause_schedule = "0 9-17 * * MON-FRI"  # Pause during business hours

Rebalancing priority:

•Under-replicated shards (critical)
•Unassigned shards
•Imbalanced nodes (soft)

Split-Brain Handling

Federation Mode

Low risk - nodes are independent, eventual consistency. Partitioned nodes continue serving reads.

Cluster Mode (Raft)

Minority partition becomes read-only:

toml

[cluster.consistency]
min_nodes_for_write = "quorum"   # Majority required
partition_behavior = "read_only"
allow_stale_reads = true

Distributed Embedding Cache

Two-tier cache: L1 local (fast) + L2 distributed (shared):

toml

[embedding.cache]
# L1 - local per node
l1_enabled = true
l1_max_entries = 10000

# L2 - distributed
l2_backend = "redis"  # redis | sqlite | s3
l2_url = "redis://cache-cluster:6379"
l2_ttl_secs = 86400

Rolling Upgrades

•Deploy new version to one node
•Node restarts, rejoins cluster
•Wait for health check pass
•Wait for replication catch-up
•Repeat for next node

Protocol versioning:

toml

[cluster]
protocol_version = 1
min_supported_version = 1

Cluster Metrics

Metric	Description
`prism_node_state{node_id, state}`	Node health state
`prism_shard_status{shard, state}`	Shard assignment
`prism_replication_lag_seconds{shard, replica}`	Replication delay
`prism_query_shard_latency_seconds{node, shard}`	Per-shard latency

Prometheus alerting:

yaml

- alert: PrismNodeDown
  expr: prism_node_state{state="dead"} == 1
  for: 1m
  labels:
    severity: critical

- alert: PrismReplicationLag
  expr: prism_replication_lag_seconds > 30
  for: 5m
  labels:
    severity: warning

Full Cluster Config Example

toml

# Node identity
[node]
id = "prism-node-1"
zone = "eu-west-1a"

# Discovery
[federation.discovery]
backend = "dns"
dns_name = "prism-headless.prism.svc.cluster.local"

# Transport
[cluster.transport]
transport = "quic"
bind_port = 7000
cert_path = "/etc/prism/certs/cluster.pem"
key_path = "/etc/prism/certs/cluster-key.pem"

# Replication
[collection.replication]
factor = 3
min_replicas_for_write = 2

# Consistency
[federation.consistency]
read = "eventual"

# Health
[cluster.health]
heartbeat_interval_ms = 1000
failure_threshold = 3

# Rebalancing
[cluster.rebalancing]
imbalance_threshold_percent = 15
max_concurrent_moves = 2

# Distributed cache
[embedding.cache]
l1_enabled = true
l1_max_entries = 10000
l2_backend = "redis"
l2_url = "redis://redis:6379"

Kubernetes StatefulSet

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prism
spec:
  serviceName: prism-headless
  replicas: 3
  selector:
    matchLabels:
      app: prism
  template:
    metadata:
      labels:
        app: prism
    spec:
      containers:
        - name: prism
          image: prism:latest
          ports:
            - containerPort: 3080
              name: http
            - containerPort: 7000
              name: cluster
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NODE_ZONE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['topology.kubernetes.io/zone']
          volumeMounts:
            - name: data
              mountPath: /data
            - name: certs
              mountPath: /etc/prism/certs
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi