Setup Service Mesh
Deploy and configure a service mesh for secure service-to-service communication and advanced traffic management.
When to Use
- •Microservices architecture requires encrypted service-to-service communication
- •Need fine-grained traffic control (canary deployments, A/B testing, traffic splitting)
- •Require observability across all service interactions without application changes
- •Enforce security policies (mTLS, authorization) at the infrastructure level
- •Implement circuit breaking, retries, and timeouts consistently across services
- •Need distributed tracing and service dependency mapping
Inputs
- •Required: Kubernetes cluster with admin access
- •Required: Choice of service mesh (Istio or Linkerd)
- •Required: Namespace(s) to enable service mesh
- •Optional: Monitoring stack (Prometheus, Grafana, Jaeger)
- •Optional: Custom traffic management requirements
- •Optional: Certificate authority configuration for mTLS
Procedure
Step 1: Install Service Mesh Control Plane
Choose and install the service mesh control plane.
For Istio:
# Download Istio curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.20.2 sh - cd istio-1.20.2 export PATH=$PWD/bin:$PATH # Install with demo profile (or production profile for production) istioctl install --set profile=production -y # Verify installation kubectl get pods -n istio-system kubectl get svc -n istio-system
For Linkerd:
# Install Linkerd CLI curl -sL https://run.linkerd.io/install | sh export PATH=$PATH:$HOME/.linkerd2/bin # Verify cluster compatibility linkerd check --pre # Install Linkerd control plane with HA linkerd install --ha | kubectl apply -f - # Verify installation linkerd check kubectl get pods -n linkerd
Create a service mesh configuration file:
# service-mesh-config.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
name: production-istio
namespace: istio-system
spec:
profile: production
meshConfig:
accessLogFile: /dev/stdout
enableTracing: true
defaultConfig:
tracing:
sampling: 100
zipkin:
address: jaeger-collector.observability:9411
components:
pilot:
k8s:
resources:
requests:
cpu: 500m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
ingressGateways:
- name: istio-ingressgateway
enabled: true
k8s:
resources:
requests:
cpu: 200m
memory: 256Mi
service:
type: LoadBalancer
Expected: Control plane pods running in istio-system (Istio) or linkerd (Linkerd) namespace. istioctl version or linkerd version shows matching client and server versions.
On failure:
- •Check cluster has sufficient resources (at least 4 CPU cores, 8GB RAM for production)
- •Verify Kubernetes version compatibility (check mesh documentation)
- •Review logs:
kubectl logs -n istio-system -l app=istiodorkubectl logs -n linkerd -l linkerd.io/control-plane-component=controller - •Check for conflicting CRDs:
kubectl get crd | grep istioorkubectl get crd | grep linkerd
Step 2: Enable Automatic Sidecar Injection
Configure namespaces for automatic sidecar proxy injection.
For Istio:
# Label namespace for automatic injection
kubectl label namespace default istio-injection=enabled
kubectl label namespace production istio-injection=enabled
# Verify label
kubectl get namespace -L istio-injection
# Create injection configuration for specific workloads
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
name: application
labels:
istio-injection: enabled
---
apiVersion: v1
kind: ConfigMap
metadata:
name: istio-sidecar-injector
namespace: istio-system
data:
values: |
global:
proxy:
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
EOF
For Linkerd:
# Annotate namespace for injection kubectl annotate namespace default linkerd.io/inject=enabled kubectl annotate namespace production linkerd.io/inject=enabled # For specific deployments, annotate the pod template kubectl get deploy -n default -o yaml | \ linkerd inject - | \ kubectl apply -f -
Test sidecar injection with a sample deployment:
# test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-app
namespace: default
spec:
replicas: 2
selector:
matchLabels:
app: test-app
template:
metadata:
labels:
app: test-app
spec:
containers:
- name: app
image: nginx:alpine
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: test-app
namespace: default
spec:
selector:
app: test-app
ports:
- port: 80
targetPort: 80
kubectl apply -f test-deployment.yaml kubectl get pods -n default kubectl describe pod -n default -l app=test-app
Expected: New pods show 2/2 containers (application + sidecar proxy). Describe output shows istio-proxy or linkerd-proxy container. Logs show successful proxy startup.
On failure:
- •Check namespace labels/annotations:
kubectl get ns default -o yaml - •Verify mutating webhook is active:
kubectl get mutatingwebhookconfiguration - •Review injection logs:
kubectl logs -n istio-system -l app=sidecar-injector(Istio) - •Manually inject to test:
kubectl get deploy test-app -o yaml | istioctl kube-inject -f - | kubectl apply -f -
Step 3: Configure mTLS Policy
Enable mutual TLS for secure service-to-service communication.
For Istio:
# mtls-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default-production
namespace: production
spec:
mtls:
mode: STRICT
---
# Allow specific workloads to use PERMISSIVE for migration
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: legacy-service
namespace: production
spec:
selector:
matchLabels:
app: legacy-api
mtls:
mode: PERMISSIVE
---
# Destination rule for TLS origination
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: global-mtls
namespace: istio-system
spec:
host: "*.local"
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
For Linkerd:
# Linkerd enforces mTLS by default for meshed pods
# Verify mTLS is working
linkerd viz tap deploy/test-app -n default
# Check for 🔒 (lock) symbol indicating encrypted connections
linkerd viz stat deploy -n default
# Create ServerAuthorization for fine-grained access control
cat <<EOF | kubectl apply -f -
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
name: test-app-server
namespace: default
spec:
podSelector:
matchLabels:
app: test-app
port: 80
---
apiVersion: policy.linkerd.io/v1alpha1
kind: ServerAuthorization
metadata:
name: test-app-authz
namespace: default
spec:
server:
name: test-app-server
client:
meshTLS:
serviceAccounts:
- name: default
namespace: production
EOF
Apply and verify:
kubectl apply -f mtls-policy.yaml
# For Istio, verify mTLS status
istioctl authn tls-check $(kubectl get pod -n default -l app=test-app -o jsonpath='{.items[0].metadata.name}') -n default
# Test connectivity
kubectl run curl-test --image=curlimages/curl:latest --rm -it --restart=Never -- \
curl -v http://test-app.default.svc.cluster.local
Expected: All connections between meshed services show mTLS enabled. Istio tls-check shows STATUS as "OK". Linkerd tap output shows 🔒 for all connections. Service logs show no TLS errors.
On failure:
- •Check certificate issuance:
kubectl get certificates -A(cert-manager) - •Verify CA is healthy:
kubectl logs -n istio-system -l app=istiod | grep -i cert - •Test with PERMISSIVE mode first, then transition to STRICT
- •Check for services without sidecars:
kubectl get pods --all-namespaces -o json | jq '.items[] | select(.spec.containers | length == 1) | .metadata.name'
Step 4: Implement Traffic Management Rules
Configure intelligent traffic routing, retries, and circuit breaking.
Create traffic management policies:
# traffic-management.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: api-routes
namespace: production
spec:
hosts:
- api.example.com
gateways:
- istio-gateway
http:
# Canary deployment: 90% v1, 10% v2
- match:
- uri:
prefix: /api/v2
route:
- destination:
host: api-service
subset: v2
weight: 10
- destination:
host: api-service
subset: v1
weight: 90
retries:
attempts: 3
perTryTimeout: 2s
retryOn: 5xx,reset,connect-failure
timeout: 10s
# A/B testing based on headers
- match:
- headers:
x-beta-user:
exact: "true"
route:
- destination:
host: api-service
subset: v2
- route:
- destination:
host: api-service
subset: v1
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: api-circuit-breaker
namespace: production
spec:
host: api-service
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
http2MaxRequests: 100
maxRequestsPerConnection: 2
outlierDetection:
consecutive5xxErrors: 5
interval: 30s
baseEjectionTime: 30s
maxEjectionPercent: 50
minHealthPercent: 40
---
# Gateway for external traffic
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
name: istio-gateway
namespace: production
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 443
name: https
protocol: HTTPS
tls:
mode: SIMPLE
credentialName: api-tls-cert
hosts:
- api.example.com
For Linkerd traffic splitting:
# linkerd-traffic-split.yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
name: api-rollout
namespace: production
spec:
service: api-service
backends:
- service: api-service-v1
weight: 900
- service: api-service-v2
weight: 100
Apply and test:
kubectl apply -f traffic-management.yaml
# Test traffic distribution
for i in {1..100}; do
curl -s http://api.example.com/api/v2 | grep version
done | sort | uniq -c
# Monitor traffic in real-time
istioctl dashboard kiali
# OR for Linkerd:
linkerd viz dashboard
Expected: Traffic splits according to defined weights. Circuit breaker trips after consecutive errors. Retries occur for transient failures. Kiali/Linkerd dashboard shows traffic flow visualization.
On failure:
- •Verify destination hosts resolve:
kubectl get svc -n production - •Check subset labels match pod labels:
kubectl get pods -n production --show-labels - •Review pilot logs:
kubectl logs -n istio-system -l app=istiod - •Test without circuit breaker first, then add incrementally
- •Use
istioctl analyzeto check configuration:istioctl analyze -n production
Step 5: Integrate Observability Stack
Connect service mesh telemetry to monitoring and tracing systems.
Install observability addons:
# For Istio (includes Prometheus, Grafana, Kiali, Jaeger) kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/prometheus.yaml kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/grafana.yaml kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/kiali.yaml kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/jaeger.yaml # For Linkerd linkerd viz install | kubectl apply -f - linkerd jaeger install | kubectl apply -f -
Configure custom metrics and dashboards:
# service-monitor.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: istio-grafana-dashboards
namespace: istio-system
labels:
grafana_dashboard: "1"
data:
istio-mesh-dashboard.json: |
{
"dashboard": {
"title": "Istio Mesh Dashboard",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "sum(rate(istio_requests_total[5m])) by (destination_service_name)"
}
]
},
{
"title": "Success Rate",
"targets": [
{
"expr": "sum(rate(istio_requests_total{response_code!~\"5.*\"}[5m])) / sum(rate(istio_requests_total[5m])) * 100"
}
]
}
]
}
}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: istio-mesh-metrics
namespace: istio-system
spec:
selector:
matchLabels:
app: istiod
endpoints:
- port: http-monitoring
interval: 30s
path: /metrics
Create custom telemetry for business metrics:
# telemetry.yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
name: custom-metrics
namespace: production
spec:
metrics:
- providers:
- name: prometheus
dimensions:
request_path: request.path
response_code: response.code
overrides:
- match:
metric: REQUEST_COUNT
tagOverrides:
destination_port:
operation: REMOVE
Access dashboards:
# Istio dashboards istioctl dashboard grafana istioctl dashboard kiali istioctl dashboard jaeger # Linkerd dashboards linkerd viz dashboard linkerd jaeger dashboard # Port-forward for external access kubectl port-forward -n istio-system svc/grafana 3000:3000 kubectl port-forward -n istio-system svc/kiali 20001:20001
Expected: Dashboards show service topology, request rates, latency percentiles, error rates. Distributed traces available in Jaeger. Prometheus scraping mesh metrics successfully. Custom metrics appear in queries.
On failure:
- •Verify Prometheus scraping:
kubectl get servicemonitor -A - •Check addon pods are running:
kubectl get pods -n istio-system - •Review telemetry configuration:
istioctl proxy-config log <pod-name> -n <namespace> - •Verify mesh config has telemetry enabled:
kubectl get configmap istio -n istio-system -o yaml | grep -A 5 enableTracing - •Check for port conflicts if port-forward fails
Step 6: Validate and Monitor Mesh Health
Perform comprehensive health checks and set up ongoing monitoring.
# Istio validation
istioctl analyze --all-namespaces
istioctl verify-install
istioctl proxy-status
# Linkerd validation
linkerd check
linkerd viz check
linkerd diagnostics policy
# Check proxy sync status
kubectl get pods -n production -o json | \
jq '.items[] | {name: .metadata.name, proxy: .status.containerStatuses[] | select(.name=="istio-proxy").ready}'
# Test end-to-end connectivity with mTLS
kubectl exec -it deploy/test-app -n default -- \
curl -v https://api-service.production.svc.cluster.local
# Monitor control plane health
kubectl get pods -n istio-system -w
kubectl top pods -n istio-system
# Set up alerts for mesh issues
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: istio-alerts
namespace: istio-system
data:
alerts.yaml: |
groups:
- name: istio
interval: 30s
rules:
- alert: HighSidecarMemory
expr: container_memory_usage_bytes{container="istio-proxy"} > 500000000
for: 5m
annotations:
summary: "High sidecar memory usage"
- alert: MeshConfigSyncFailed
expr: pilot_proxy_convergence_time{job="istiod"} > 30
for: 5m
annotations:
summary: "Mesh configuration sync taking too long"
EOF
Create health check script:
#!/bin/bash
# mesh-health-check.sh
echo "=== Service Mesh Health Check ==="
echo "1. Control Plane Status:"
kubectl get pods -n istio-system
echo "2. Proxy Injection Status:"
kubectl get namespace -L istio-injection
echo "3. Configuration Analysis:"
istioctl analyze --all-namespaces
echo "4. mTLS Status:"
istioctl authn tls-check $(kubectl get pod -n production -l app=api-service -o jsonpath='{.items[0].metadata.name}').production
echo "5. Traffic Metrics (last 5 min):"
kubectl exec -n istio-system deploy/istiod -- \
curl -s localhost:15014/metrics | grep istio_requests_total | tail -5
echo "=== Health Check Complete ==="
Expected: All analysis checks pass with no warnings. Proxy-status shows all proxies synced. mTLS check confirms encryption. Metrics show traffic flowing. Control plane pods stable with low resource usage.
On failure:
- •Address specific issues from
istioctl analyzeoutput - •Check proxy logs for individual pods:
kubectl logs <pod> -c istio-proxy -n <namespace> - •Verify network policies aren't blocking mesh traffic
- •Review control plane logs for errors:
kubectl logs -n istio-system deploy/istiod --tail=100 - •Restart problematic proxies:
kubectl rollout restart deploy/<deployment> -n <namespace>
Validation
- • Control plane pods running and healthy (istiod/linkerd-controller)
- • Sidecar proxies injected into all application pods (2/2 containers)
- • mTLS enabled and functioning (verified with tls-check/tap)
- • Traffic management rules routing requests correctly (verified with curl tests)
- • Circuit breaker trips on repeated failures (tested with fault injection)
- • Observability dashboards showing metrics (Grafana/Kiali/Linkerd Viz)
- • Distributed traces captured in Jaeger for sample requests
- • No configuration warnings from istioctl analyze/linkerd check
- • Proxy sync status shows all proxies in sync
- • Service-to-service communication encrypted (verified in logs/dashboards)
Common Pitfalls
- •
Resource Exhaustion: Service mesh adds 100-200MB memory per pod for sidecars. Ensure cluster has sufficient capacity. Set appropriate resource limits in injection config.
- •
Configuration Conflicts: Multiple VirtualServices for same host cause undefined behavior. Use single VirtualService per host with multiple match conditions instead.
- •
Certificate Expiration: mTLS certificates auto-rotate but CA root must be managed. Monitor certificate expiry with:
kubectl get certificate -Aand set up alerts. - •
Sidecar Not Injected: Pods created before namespace labeling won't have sidecars. Must recreate:
kubectl rollout restart deploy/<name> -n <namespace>. - •
DNS Resolution Issues: Service mesh intercepts DNS. Use fully qualified names (service.namespace.svc.cluster.local) for cross-namespace calls.
- •
Port Naming Requirement: Istio requires named ports following protocol-name pattern (e.g., http-web, tcp-db). Unnamed ports default to TCP passthrough.
- •
Gradual Rollout Required: Don't enable STRICT mTLS immediately in production. Use PERMISSIVE mode during migration, verify all services meshed, then switch to STRICT.
- •
Observability Overhead: 100% tracing sampling causes performance issues. Use 1-10% for production:
sampling: 1.0in mesh config. - •
Gateway vs VirtualService Confusion: Gateway configures ingress (load balancer), VirtualService configures routing. Both required for external traffic.
- •
Version Compatibility: Ensure mesh version compatible with Kubernetes version. Istio supports n-1 minor versions, Linkerd typically supports last 3 Kubernetes versions.
Related Skills
- •
configure-ingress-networking- Gateway configuration complements mesh ingress - •
deploy-to-kubernetes- Application deployment patterns that work with service mesh - •
setup-prometheus-monitoring- Prometheus integration for mesh metrics - •
manage-kubernetes-secrets- Certificate management for mTLS - •
enforce-policy-as-code- OPA policies that work alongside mesh authorization