AgentSkillsCN

LAYER_11_APM

文档机器人APM(可观测性)层建模的专业知识

SKILL.md
--- frontmatter
name: LAYER_11_APM
description: Expert knowledge for APM (Observability) Layer modeling in Documentation Robotics
triggers:
  [
    "APM",
    "observability",
    "monitoring",
    "metrics",
    "spans",
    "traces",
    "logs",
    "OpenTelemetry",
    "telemetry",
  ]
version: 0.7.0

APM Layer Skill

Layer Number: 11 Specification: Metadata Model Spec v0.7.0 Purpose: Defines observability using OpenTelemetry 1.0+, specifying traces, metrics, logs, and instrumentation.


Layer Overview

The APM Layer captures application performance monitoring:

  • TRACES - Distributed tracing with spans
  • METRICS - Performance and business metrics
  • LOGS - Structured logging
  • INSTRUMENTATION - Code instrumentation configuration
  • RESOURCES - Telemetry resource attributes
  • SLOs/SLAs - Service level objectives and agreements

This layer uses OpenTelemetry 1.0+ (industry standard for observability).

Central Entity: The Span (unit of work in trace) is the core modeling unit.


Entity Types

Core APM Entities (14 entities)

Entity TypeDescription
APMConfigurationRoot configuration for observability
TraceConfigurationDistributed tracing configuration
SpanUnit of work in distributed trace
SpanEventTimestamped event within span
MetricConfigurationMetrics collection configuration
MeterConfigMeter for metric instruments
InstrumentConfigMetric instrument (counter, gauge, histogram)
LogConfigurationStructured logging configuration
LogRecordIndividual log record
ResourceTelemetry resource attributes
AttributeKey-value telemetry attribute
InstrumentationScopeScope of instrumentation
DataQualityMetricData quality metrics
DataQualityMetricsCollection of data quality metrics

When to Use This Skill

Activate when the user:

  • Mentions "APM", "observability", "monitoring", "telemetry"
  • Wants to add tracing, metrics, or logging
  • Asks about performance monitoring or SLOs
  • Needs to instrument code or track business metrics
  • Wants to link observability to application services

Cross-Layer Relationships

Outgoing (APM → Other Layers):

  • instrumented-service → Application Layer (which service is being monitored?)
  • business-metrics → Business Layer (business KPIs)

Incoming (Other Layers → APM):

  • Application Layer → APM (services reference metrics)
  • API Layer → APM (operations set SLA targets)
  • Business Layer → APM (processes reference business metrics)
  • Data Model Layer → APM (data quality metrics)
  • Datastore Layer → APM (query performance metrics)

Observability Best Practices

  1. Traces - Add spans for critical operations
  2. Metrics - Track both technical and business metrics
  3. Logs - Use structured logging (JSON)
  4. Context - Propagate trace context across services
  5. Sampling - Configure appropriate sampling rates
  6. SLOs - Define service level objectives
  7. Alerting - Set up alerts for critical metrics
  8. Cardinality - Avoid high-cardinality attributes

Common Commands

bash
# Add span
dr add apm span --name "Process Order"

# Add metric
dr add apm instrument-config --name "order_rate" --property type=counter

# Add log record
dr add apm log-record --name "Error Log"

# List APM elements
dr list apm span

# Validate APM layer
dr validate --layer apm

# Export APM configuration
dr export --layer apm --format json

Example: Order Processing Span

yaml
id: apm.span.process-order
name: "Process Order Span"
type: span
properties:
  spanKind: INTERNAL
  operationName: process_order
  attributes:
    - key: order.id
      type: string
    - key: order.total
      type: double
    - key: customer.id
      type: string
  events:
    - name: order_validated
      timestamp: relative
    - name: payment_processed
      timestamp: relative
    - name: inventory_reserved
      timestamp: relative
  instrumented-service: application.service.order-processing
  business-metrics:
    - apm.metric.orders-processed
    - apm.metric.order-processing-time
  sla-target-latency: 500ms
  sla-target-availability: 99.9%

Example: Business Metrics

yaml
id: apm.metric.order-rate
name: "Order Rate Metric"
type: instrument-config
properties:
  instrumentType: counter
  unit: orders
  description: "Number of orders processed"
  attributes:
    - key: order.type
      type: string
      values: [standard, express, subscription]
    - key: payment.method
      type: string
      values: [credit_card, paypal, bank_transfer]
  business-context:
    kpi: true
    dashboard: revenue-dashboard
    alerts:
      - threshold: 100
        condition: less-than
        severity: warning
        message: "Order rate below threshold"

Example: Data Quality Metrics

yaml
id: apm.data-quality-metrics.user-data
name: "User Data Quality Metrics"
type: data-quality-metrics
properties:
  entity: data_model.object-schema.user
  metrics:
    - name: completeness
      type: gauge
      description: "Percentage of non-null required fields"
      target: 95%
    - name: validity
      type: gauge
      description: "Percentage of records passing validation"
      target: 99%
    - name: uniqueness
      type: gauge
      description: "Percentage of unique email addresses"
      target: 100%
    - name: freshness
      type: histogram
      description: "Time since last update"
      target: 24h

Pitfalls to Avoid

  • ❌ Not propagating trace context across services
  • ❌ High-cardinality attributes (e.g., unique IDs in tags)
  • ❌ Missing business metrics (only technical metrics)
  • ❌ Not setting SLOs/SLAs
  • ❌ Over-instrumentation (too many spans)
  • ❌ Missing cross-layer links to instrumented services