Monitoring Stack Deployer

This skill provides automated assistance for monitoring stack deployer tasks.

Overview

Deploys monitoring stacks (Prometheus/Grafana/Datadog) including collectors, scraping config, dashboards, and alerting rules for production systems.

Prerequisites

Before using this skill, ensure:

•Target infrastructure is identified (Kubernetes, Docker, bare metal)
•Metric endpoints are accessible from monitoring platform
•Storage backend is configured for time-series data
•Alert notification channels are defined (email, Slack, PagerDuty)
•Resource requirements are calculated based on scale

Instructions

•Select Platform: Choose Prometheus/Grafana, Datadog, or hybrid approach
•Deploy Collectors: Install exporters and agents on monitored systems
•Configure Scraping: Define metric collection endpoints and intervals
•Set Up Storage: Configure retention policies and data compaction
•Create Dashboards: Build visualization panels for key metrics
•Define Alerts: Create alerting rules with appropriate thresholds
•Test Monitoring: Verify metrics flow and alert triggering

Output

Prometheus + Grafana (Kubernetes):

yaml

# {baseDir}/monitoring/prometheus.yaml


## Overview

This skill provides automated assistance for the described functionality.

## Examples

Example usage patterns will be demonstrated in context.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.retention.time=30d'
        ports:
        - containerPort: 9090

Grafana Dashboard Configuration:

json

{
  "dashboard": {
    "title": "Application Metrics",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total[5m])"
          }
        ]
      }
    ]
  }
}

Error Handling

Metrics Not Appearing

•Error: "No data points"
•Solution: Verify scrape targets are accessible and returning metrics

High Cardinality

•Error: "Too many time series"
•Solution: Reduce label combinations or increase Prometheus resources

Alert Not Firing

•Error: "Alert condition met but no notification"
•Solution: Check Alertmanager configuration and notification channels

Dashboard Load Failure

•Error: "Failed to load dashboard"
•Solution: Verify Grafana datasource configuration and permissions

Examples

•"Deploy Prometheus + Grafana on Kubernetes and add alerts for high error rate and latency."
•"Install Datadog agents across hosts and configure a dashboard for CPU/memory saturation."

Resources

•Prometheus documentation: https://prometheus.io/docs/
•Grafana documentation: https://grafana.com/docs/
•Example dashboards in {baseDir}/monitoring-examples/