
deploy-otel

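Deploys the OpenTelemetry observability stack (Prometheus, Grafana, OTEL Collector) to a Kind cluster for testing the registry server's telemetry. Use this skill when you need to set up monitoring, metrics collection, or observability infrastructure.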

SKILL.md
---
name: deploy-otel
description: Deploy the OpenTelemetry observability stack (Prometheus, Grafana, OTEL Collector) to a Kind cluster for testing registry server telemetry. Use when you need to set up monitoring, metrics collection, or observability infrastructure.
allowed-tools: Bash, Read
---

Deploy OTEL Observability Stack

Deploy a complete OpenTelemetry observability stack to a Kind cluster for testing the registry server's telemetry capabilities.

Steps

1. Verify Prerequisites

Check that required tools are installed:

bash
echo "Checking prerequisites..."
command -v kind >/dev/null 2>&1 || { echo "ERROR: kind is not installed"; exit 1; }
command -v helm >/dev/null 2>&1 || { echo "ERROR: helm is not installed"; exit 1; }
command -v kubectl >/dev/null 2>&1 || { echo "ERROR: kubectl is not installed"; exit 1; }
echo "All prerequisites met."
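The checks above stop at the first missing tool. A minimal sketch of a variant that reports every missing tool in one pass is shown here; `require_tools` is a hypothetical helper name, not part of the skill.

```bash
# Hypothetical helper (not part of the skill): check every tool passed as an
# argument and report all missing ones instead of stopping at the first.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "ERROR: $tool is not installed" >&2
      missing=1
    fi
  done
  # Returns 0 when every tool was found, 1 otherwise.
  return "$missing"
}

# Example call for this skill:
#   require_tools kind helm kubectl || exit 1
```

Errors go to stderr so the helper can be used inside a pipeline or command substitution without polluting stdout.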

2. Create Kind Cluster

Create the Kind cluster if it doesn't exist:

bash
CLUSTER_NAME="thv-registry"

if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
  echo "Kind cluster '${CLUSTER_NAME}' already exists"
else
  echo "Creating Kind cluster '${CLUSTER_NAME}'..."
  kind create cluster --name "${CLUSTER_NAME}"
fi

# Export kubeconfig
kind get kubeconfig --name "${CLUSTER_NAME}" > kconfig.yaml
echo "Kubeconfig written to kconfig.yaml"
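The later steps pass `--kubeconfig kconfig.yaml` on every command. As an optional alternative, both `kubectl` and `helm` honor the `KUBECONFIG` environment variable, so the file can be exported once per shell session. The sketch below assumes it runs from the directory that holds `kconfig.yaml`; the `KUBECONFIG_FILE` variable is introduced here for illustration only.

```bash
# Assumption: run from the directory where kconfig.yaml was written.
KUBECONFIG_FILE="kconfig.yaml"

# The kubeconfig contains cluster credentials, so restrict it to the owner.
if [ -f "${KUBECONFIG_FILE}" ]; then
  chmod 600 "${KUBECONFIG_FILE}"
fi

# kubectl and helm both read KUBECONFIG when --kubeconfig is not given.
export KUBECONFIG="${PWD}/${KUBECONFIG_FILE}"
echo "KUBECONFIG=${KUBECONFIG}"
```

The explicit `--kubeconfig` flag used in the steps still takes precedence, so exporting the variable does not change their behavior.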

3. Add Helm Repositories

bash
echo "Adding Helm repositories..."
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
echo "Helm repositories updated."

4. Install Prometheus/Grafana Stack

bash
echo "Installing kube-prometheus-stack..."
helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -f examples/otel/prometheus-stack-values.yaml \
  -n monitoring --create-namespace \
  --kubeconfig kconfig.yaml \
  --wait --timeout 5m

echo "Prometheus/Grafana stack installed."

5. Install Tempo for Distributed Tracing

bash
echo "Installing Grafana Tempo..."
helm upgrade -i tempo grafana/tempo \
  -f examples/otel/tempo-values.yaml \
  -n monitoring \
  --kubeconfig kconfig.yaml \
  --wait --timeout 3m

echo "Grafana Tempo installed."

6. Install OpenTelemetry Collector

bash
echo "Installing OpenTelemetry Collector..."
helm upgrade -i otel-collector open-telemetry/opentelemetry-collector \
  -f examples/otel/otel-values.yaml \
  -n monitoring \
  --kubeconfig kconfig.yaml \
  --wait --timeout 3m

echo "OpenTelemetry Collector installed."

7. Verify Deployment

bash
echo "Verifying deployment..."
kubectl get pods -n monitoring --kubeconfig kconfig.yaml
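Listing pods shows their state at a single moment. If a blocking check is preferred, a sketch along these lines waits until every pod in the namespace reports Ready. `wait_for_monitoring` is a hypothetical helper, and the default timeout of 180s is an assumption rather than a value from the skill.

```bash
# Hypothetical helper: block until all pods in the namespace are Ready.
# Arguments: [namespace] [timeout]; both defaults are assumptions.
wait_for_monitoring() {
  local namespace="${1:-monitoring}" timeout="${2:-180s}"
  kubectl wait pods --all \
    --for=condition=Ready \
    --namespace "$namespace" \
    --timeout="$timeout" \
    --kubeconfig kconfig.yaml
}

# Example call:
#   wait_for_monitoring monitoring 300s
```

Pods that belong to completed Jobs never become Ready, so this check may time out if any remain in the namespace; in that case the plain pod listing is the more reliable view.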

8. Display Access Instructions

bash
cat <<'EOF'

=== OTEL Stack Deployment Complete ===

To access the UIs, run these port-forward commands:

  # Grafana (admin / admin)
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:3000 --kubeconfig kconfig.yaml

  # Prometheus
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090 --kubeconfig kconfig.yaml

To deploy the registry server with telemetry enabled:
  /deploy-registry-server-with-otel

EOF
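Once the port-forwards are running, the UIs can be probed from a second terminal. The sketch below uses Grafana's `/api/health` and Prometheus's `/-/healthy` endpoints; `check_endpoint` is a hypothetical helper, and the local ports assume the port-forward commands shown above.

```bash
# Hypothetical helper: probe an HTTP endpoint and report whether it answers
# with a successful status within 5 seconds.
check_endpoint() {
  local name="$1" url="$2"
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "$name: OK"
  else
    echo "$name: unreachable at $url"
    return 1
  fi
}

# Example calls (assume the port-forwards above are active):
#   check_endpoint Grafana http://localhost:3000/api/health
#   check_endpoint Prometheus http://localhost:9090/-/healthy
```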

Troubleshooting

If a Helm installation fails with a values error, the most likely cause is that the upstream chart has changed and the values files in examples/otel/ no longer match its schema.

Chart Documentation:

If you encounter issues:

  1. Check the chart's default values.yaml for schema changes in the chart version being installed
  2. Compare it with our values files in examples/otel/
  3. Create an issue at https://github.com/stacklok/toolhive-registry-server/issues describing the problem and recommending a fix
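The first two steps above can be sketched with `helm show values`, which prints a chart's default values. `chart_defaults` is a hypothetical helper, and the optional version argument is whatever chart version you are investigating.

```bash
# Hypothetical helper: print a chart's default values, optionally for a
# specific chart version, so they can be compared with our values files.
chart_defaults() {
  local chart="$1" version="${2:-}"
  if [ -n "$version" ]; then
    helm show values "$chart" --version "$version"
  else
    helm show values "$chart"
  fi
}

# Example: save the defaults and read them next to our overrides.
#   chart_defaults prometheus-community/kube-prometheus-stack > /tmp/defaults.yaml
#   less /tmp/defaults.yaml examples/otel/prometheus-stack-values.yaml
```

To see which chart versions are installed, `helm list -n monitoring --kubeconfig kconfig.yaml` shows the chart name and version for each release.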

What This Deploys

| Component | Description |
| --- | --- |
| Prometheus | Metrics storage; scrapes the OTEL Collector on port 8889 |
| Grafana | Visualization dashboards (admin/admin) |
| Tempo | Distributed tracing backend; receives traces from the OTEL Collector |
| OTEL Collector | Receives OTLP metrics and traces; exports to Prometheus and Tempo |

Cleanup

To remove everything:

bash
task kind-destroy

Or manually:

bash
kind delete cluster --name thv-registry
rm -f kconfig.yaml