SKILL.md
--- frontmatter
name: opentelemetry
description: Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.

OpenTelemetry Implementation Guide

Overview

OpenTelemetry (OTel) is a vendor-neutral observability framework for instrumenting, generating, collecting, and exporting telemetry data (traces, metrics, logs). This skill provides guidance for implementing OTEL in Kubernetes environments.

Quick Start

Deploy OTEL Collector on Kubernetes

bash
# Add Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

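# Install as a DaemonSet with a basic config.
# Recent chart versions require image.repository to be set explicitly;
# otel/opentelemetry-collector-k8s is one of the published distributions.
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring --create-namespace \
  --set mode=daemonset \
  --set image.repository=otel/opentelemetry-collector-k8s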

Send Test Data via OTLP

bash
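# OTLP ports: gRPC 4317, HTTP 4318.
# "otel-collector" is a placeholder host: substitute the Service name or pod IP
# reachable from where you run this (check with: kubectl get svc -n monitoring).
# An empty resourceSpans array is enough for a connectivity check.
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'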

Core Concepts

Signals: Three types of telemetry data:

  • Traces: Distributed request flows across services
  • Metrics: Numerical measurements (counters, gauges, histograms)
  • Logs: Event records with structured/unstructured data

Collector Components:

  • Receivers: Accept data (OTLP, Prometheus, Jaeger, Zipkin)
  • Processors: Transform data (batch, memory_limiter, k8sattributes)
  • Exporters: Send data (prometheusremotewrite, loki, otlp)
  • Extensions: Add capabilities (health_check, pprof, zpages)
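
To make the component roles above concrete, here is a minimal sketch of a standalone Collector config that wires one receiver, one exporter, and the three extensions named in the list. The ports shown are the commonly documented defaults, and the `debug` exporter is only a stand-in for a real backend; treat both as assumptions to adapt. Note that an extension does nothing until it is also listed under `service.extensions`.

```yaml
# Minimal sketch: standalone Collector config (not Helm values).
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  debug: {}            # placeholder exporter; prints telemetry to the Collector log

extensions:
  health_check:
    endpoint: 0.0.0.0:13133    # liveness/readiness probe target
  pprof:
    endpoint: localhost:1777   # Go profiling endpoint
  zpages:
    endpoint: localhost:55679  # in-process diagnostic pages

service:
  # Extensions must be enabled here, not just defined above.
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```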

Collector Configuration

Basic Pipeline Structure

yaml
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25

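  exporters:
    prometheusremotewrite:
      endpoint: "http://prometheus:9090/api/v1/write"
    loki:
      endpoint: "http://loki:3100/loki/api/v1/push"
    # Referenced by the traces pipeline below. The endpoint is an assumed
    # in-cluster Tempo address; adjust it (and TLS) to your backend.
    otlp/tempo:
      endpoint: "tempo:4317"
      tls:
        insecure: true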

  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheusremotewrite]
      logs:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loki]
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]

Kubernetes Attributes Enrichment

yaml
processors:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      # Takes the NAME of the environment variable, not its expanded value.
      node_from_env_var: K8S_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.node.name
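
Because `k8sattributes` looks up pod metadata through the Kubernetes API, the Collector's service account needs read access to the relevant resources. The following is a sketch of such a ClusterRole, assuming you manage RBAC yourself rather than through the Helm chart; the role name is hypothetical. Access to `replicasets` is what allows `k8s.deployment.name` to be resolved.

```yaml
# Sketch of RBAC for the k8sattributes processor.
# The metadata.name below is a hypothetical example.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-k8sattributes
rules:
  # Pod, namespace, and node metadata
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["get", "watch", "list"]
  # Needed to map a pod to its owning Deployment
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["get", "watch", "list"]
```

Bind the role to the Collector's service account with a ClusterRoleBinding, and place `k8sattributes` after `memory_limiter` and before `batch` in each pipeline's processor list so that attributes are attached before batching.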

Deployment Modes

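| Mode       | Use Case              | Pros                         | Cons                      |
|------------|-----------------------|------------------------------|---------------------------|
| DaemonSet  | Node-level collection | Full coverage, host metrics  | Higher resource usage     |
| Deployment | Centralized gateway   | Scalable, easier management  | Single point of failure   |
| Sidecar    | Per-pod collection    | Isolated, fine-grained       | Resource overhead per pod |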
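
The DaemonSet and Deployment modes are often combined: node-level agents forward to a central gateway. Below is a sketch of Helm values for the agent side, assuming a gateway Collector is already reachable in the cluster. The gateway hostname and the `otlp/gateway` exporter name are hypothetical, and `tls.insecure` is only appropriate for plaintext in-cluster traffic.

```yaml
# Sketch: Helm values for a DaemonSet agent that forwards traces to a gateway.
mode: daemonset
image:
  repository: otel/opentelemetry-collector-k8s

config:
  exporters:
    otlp/gateway:
      # Hypothetical Service address of a gateway Collector (mode: deployment)
      endpoint: otel-gateway.monitoring.svc.cluster.local:4317
      tls:
        insecure: true   # assumption: no TLS between agent and gateway
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/gateway]
```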

Common Patterns

Development Environment

  • Enable debug exporter for visibility
  • Lower resource limits (250m CPU, 512Mi memory)
  • Include spot instance tolerations for cost savings
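
As an illustration of the development settings above, the following Helm values sketch enables a verbose `debug` exporter, applies the lower resource limits, and adds a toleration. The taint key is a placeholder, since spot or preemptible node taints differ between cloud providers.

```yaml
# Sketch: development-oriented Helm values.
config:
  exporters:
    debug:
      verbosity: detailed   # print full telemetry payloads to the Collector log
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [debug]

resources:
  limits:
    cpu: 250m
    memory: 512Mi

tolerations:
  # Placeholder taint key; replace with the spot-node taint your provider uses.
  - key: "example.com/spot"
    operator: "Exists"
    effect: "NoSchedule"
```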

Production Environment

  • Implement sampling (10-50% for traces)
  • Higher batch sizes (2048-4096)
  • Enable autoscaling and PodDisruptionBudget
  • Use TLS for all endpoints
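
The production items above could be combined roughly as follows. This is a sketch of Helm values under several assumptions: the Collector image bundles the `probabilistic_sampler` processor (the contrib distribution does), a Secret named `otel-collector-tls` (hypothetical) holds the certificate, and `debug` stands in for a real exporter. Autoscaling applies to `deployment` mode, not to a DaemonSet.

```yaml
# Sketch: production-oriented Helm values.
mode: deployment

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 70

podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Mount the TLS certificate; the Secret name is hypothetical.
extraVolumes:
  - name: otel-tls
    secret:
      secretName: otel-collector-tls
extraVolumeMounts:
  - name: otel-tls
    mountPath: /etc/otel/tls
    readOnly: true

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
          tls:
            cert_file: /etc/otel/tls/tls.crt
            key_file: /etc/otel/tls/tls.key
  processors:
    probabilistic_sampler:
      sampling_percentage: 25    # within the 10-50% range suggested above
    batch:
      timeout: 10s
      send_batch_size: 2048
      send_batch_max_size: 4096
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, probabilistic_sampler, batch]
        exporters: [debug]       # placeholder; swap in your real exporter
```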

Validation Commands

bash
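# Check collector pods (the chart sets the instance label to the Helm release name)
kubectl get pods -n monitoring -l app.kubernetes.io/instance=otel-collector

# View collector logs
kubectl logs -n monitoring -l app.kubernetes.io/instance=otel-collector --tail=100

# Test the OTLP/HTTP endpoint from inside the cluster.
# Substitute the actual Service name for otel-collector.monitoring if it differs.
kubectl run test-otlp --image=curlimages/curl:latest --rm -it --restart=Never -- \
  curl -v -X POST -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}' \
  http://otel-collector.monitoring:4318/v1/traces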

# Validate config syntax
otelcol validate --config=config.yaml

Key Helm Chart Values

yaml
mode: "daemonset"  # or "deployment"
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
useGOMEMLIMIT: true
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi