OpenTelemetry Instrumentation Extension
Automatically extend OpenTelemetry instrumentation for new functionality in the MCP Gateway, following established patterns documented in docs/telemetry/README.md.
When to Use This Skill
Automatically apply when:
- •New state-changing operations added (Create, Update, Delete, Push, Pull, Add, Remove, etc.)
- •New CLI commands added to cmd/docker-mcp/
- •New packages with operations in pkg/
- •User mentions "otel", "telemetry", "instrumentation", "metrics", or "tracing"
- •Code changes that modify state (database, files, containers, configuration)
- •Reviewing code for telemetry coverage
Workflow
Phase 1: Analysis & Suggestion
- •Read project telemetry standards:
  - •Read docs/telemetry/README.md "Development Guidelines" section
  - •Read pkg/telemetry/telemetry.go to understand existing patterns and metrics
- •Identify scope using git diff:
  - •Find new/changed files in pkg/ and cmd/docker-mcp/
  - •Identify functions performing state-changing operations
  - •Infer domain from package structure (e.g., pkg/foo/ → domain: foo)
- •Categorize findings:
  - •Operations in existing domains (use existing metrics)
  - •Operations in new domains (need new metrics)
- •Present suggestions to user:
  - •List each function needing instrumentation with file:line reference
  - •Specify operation type (create, update, delete, etc.)
  - •Identify domain and whether metrics exist
  - •Show what will be added (instrumentation code, metrics, docs, tests)
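One way to keep the suggestions uniform is to collect each finding in a small record and render it as a single line. The sketch below is only an illustration: the Finding type, its fields, and the example path pkg/foo/create.go are hypothetical and not part of the project.

```go
package main

import "fmt"

// Finding is a hypothetical record for one function that needs instrumentation.
type Finding struct {
	File         string // path relative to the repository root
	Line         int    // line of the function declaration
	Func         string // function name
	Operation    string // create, update, delete, ...
	Domain       string // inferred domain, e.g. "foo" for pkg/foo/
	MetricsExist bool   // whether the domain already has recording functions
}

// String renders the finding as a single suggestion line with a file:line reference.
func (f Finding) String() string {
	plan := "add new domain metrics, tests, and docs"
	if f.MetricsExist {
		plan = "reuse existing metrics"
	}
	return fmt.Sprintf("%s:%d %s: operation=%s domain=%s (%s)",
		f.File, f.Line, f.Func, f.Operation, f.Domain, plan)
}

func main() {
	// Made-up example data, for illustration only.
	findings := []Finding{
		{File: "pkg/foo/create.go", Line: 42, Func: "CreateFoo", Operation: "create", Domain: "foo"},
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}
```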
Phase 2: Implementation (After User Approval)
Execute in this order:
- •Make changes: Instrument operations and add metrics to pkg/telemetry/telemetry.go
- •Verify: Build, run with OTEL collector, check docker logs otel-debug output
- •Write tests: Add tests to pkg/telemetry/telemetry_test.go
- •Update docs: Add metrics/operations to docs/telemetry/README.md
- •Run tests: Execute make test
- •Final verification: Run ./docs/telemetry/testing/test-telemetry.sh
Key Principles
Follow the project's documented guidelines from docs/telemetry/README.md:
- •Use Existing Providers: Get global tracer/meter from OTEL
- •Preserve Server Lineage: Include server attribution in all telemetry
- •Non-Blocking Operations: Telemetry never blocks or fails operations
- •Debug Support: Add logging behind DOCKER_MCP_TELEMETRY_DEBUG
- •Follow Naming Conventions: Use mcp.<domain>.<field> pattern
- •Deferred Success Tracking: Only set success on clean completion
Where to Find Information
ALWAYS read these files before suggesting changes:
- •docs/telemetry/README.md - Source of truth for:
  - •Development guidelines (lines 419-474)
  - •Existing metrics and attributes
  - •Testing procedures
  - •Naming conventions
- •pkg/telemetry/telemetry.go - Understand:
  - •Existing metric instruments
  - •Recording function patterns
  - •Helper functions for spans
  - •Init() structure
- •pkg/telemetry/telemetry_test.go - See:
  - •Testing patterns
  - •How to verify metrics
Instrumentation Pattern
Use the simple defer pattern from cmd/docker-mcp/catalog/create.go:
func OperationName(ctx context.Context, identifier string, ...) error {
    telemetry.Init()

    start := time.Now()
    var success bool
    defer func() {
        duration := time.Since(start)
        telemetry.Record<Domain>Operation(ctx, "operation_name", identifier,
            float64(duration.Milliseconds()), success)
    }()

    // ... operation logic ...

    // Optional: Record resource counts
    telemetry.Record<Domain><Resources>(ctx, identifier, int64(count))

    success = true
    return nil
}
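Note: If identifier is generated during execution, the deferred closure still reports its final value, because the closure reads the variable when it runs rather than copying it when defer is declared. The same applies to success, which stays false on any early return.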
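The capture behavior is easy to check in isolation. The sketch below replaces the telemetry package with a stand-in: recordOperation, createThing, and the identifier "generated-123" are hypothetical and exist only to show what the deferred closure sees on the success and failure paths.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// recorded holds what the stand-in recorder last received.
type recorded struct {
	operation, identifier string
	success               bool
}

var last recorded

// recordOperation is a hypothetical stand-in for a telemetry recording function.
func recordOperation(operation, identifier string, durationMs float64, success bool) {
	last = recorded{operation, identifier, success}
}

// createThing generates its identifier mid-operation and can be made to fail.
func createThing(fail bool) error {
	start := time.Now()
	var id string
	var success bool
	defer func() {
		// The closure reads id and success when it runs, at return time.
		recordOperation("create", id, float64(time.Since(start).Milliseconds()), success)
	}()

	id = "generated-123" // assigned after the defer statement
	if fail {
		return errors.New("create failed") // success stays false
	}
	success = true
	return nil
}

func main() {
	_ = createThing(false)
	fmt.Printf("%+v\n", last)
	_ = createThing(true)
	fmt.Printf("%+v\n", last)
}
```

Both calls record the generated identifier; only the first records success as true.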
Adding New Domains
When instrumentation is needed for a new domain (e.g., new package pkg/newdomain/):
1. Add Metric Instruments in pkg/telemetry/telemetry.go
Declare package-level variables for the new instruments, then create the instruments inside the Init() function:
var (
    // ... existing metrics ...

    // New domain metrics
    newdomainOperations        metric.Int64Counter
    newdomainOperationDuration metric.Float64Histogram
    newdomainResources         metric.Int64Gauge // If managing resources
)

func Init() {
    // ... existing init code ...

    newdomainOperations, _ = meter.Int64Counter(
        "mcp.newdomain.operations",
        metric.WithDescription("New domain operations count"),
    )
    newdomainOperationDuration, _ = meter.Float64Histogram(
        "mcp.newdomain.operation.duration",
        metric.WithDescription("New domain operation duration in milliseconds"),
    )
    newdomainResources, _ = meter.Int64Gauge(
        "mcp.newdomain.resources",
        metric.WithDescription("Number of resources in new domain"),
    )
}
2. Add Recording Functions in pkg/telemetry/telemetry.go
After the Init() function, add recording functions:
func RecordNewdomainOperation(ctx context.Context, operation, identifier string, durationMs float64, success bool) {
    if newdomainOperations == nil || newdomainOperationDuration == nil {
        return
    }

    attrs := []attribute.KeyValue{
        attribute.String("mcp.newdomain.operation", operation),
        attribute.String("mcp.newdomain.id", identifier), // or .name, .ref as appropriate
        attribute.Bool("mcp.newdomain.success", success),
    }

    newdomainOperations.Add(ctx, 1, metric.WithAttributes(attrs...))
    newdomainOperationDuration.Record(ctx, durationMs, metric.WithAttributes(attrs...))
}

// Optional: Add resource counting function if applicable
func RecordNewdomainResources(ctx context.Context, identifier string, count int64) {
    if newdomainResources == nil {
        return
    }

    attrs := []attribute.KeyValue{
        attribute.String("mcp.newdomain.id", identifier),
    }

    newdomainResources.Record(ctx, count, metric.WithAttributes(attrs...))
}
3. Write Tests in pkg/telemetry/telemetry_test.go
Add test cases following existing patterns:
func TestRecordNewdomainOperation(t *testing.T) {
    // The span recorder is not needed here; discard it so the test compiles.
    _, metricReader := setupTestTelemetry(t)
    Init()

    ctx := context.Background()

    // Test successful operation
    RecordNewdomainOperation(ctx, "create", "test-id", 123.45, true)

    // Collect and verify metrics
    var rm metricdata.ResourceMetrics
    err := metricReader.Collect(ctx, &rm)
    require.NoError(t, err)

    // Find and verify the counter and histogram metrics
    foundCounter := false
    foundHistogram := false
    for _, sm := range rm.ScopeMetrics {
        for _, m := range sm.Metrics {
            if m.Name == "mcp.newdomain.operations" {
                foundCounter = true
                sum := m.Data.(metricdata.Sum[int64])
                require.Len(t, sum.DataPoints, 1)
                assert.Equal(t, int64(1), sum.DataPoints[0].Value)

                // Verify attributes
                attrs := sum.DataPoints[0].Attributes
                assert.Contains(t, attrs.ToSlice(), attribute.String("mcp.newdomain.operation", "create"))
                assert.Contains(t, attrs.ToSlice(), attribute.String("mcp.newdomain.id", "test-id"))
                assert.Contains(t, attrs.ToSlice(), attribute.Bool("mcp.newdomain.success", true))
            }
            if m.Name == "mcp.newdomain.operation.duration" {
                foundHistogram = true
                histogram := m.Data.(metricdata.Histogram[float64])
                require.Len(t, histogram.DataPoints, 1)
                assert.Equal(t, float64(123.45), histogram.DataPoints[0].Sum)
            }
        }
    }
    assert.True(t, foundCounter, "Counter metric not found")
    assert.True(t, foundHistogram, "Histogram metric not found")
}
4. Update Documentation in docs/telemetry/README.md
Add a new section for the domain in the appropriate location. Follow the existing format:
### New Domain Operations

Operations for managing [description of what this domain does]:

- **`mcp.newdomain.operations`** - New domain operations (create, update, delete, etc.)
- **`mcp.newdomain.operation.duration`** - Duration of new domain operations
- **`mcp.newdomain.resources`** - Gauge showing number of resources in new domain

#### New Domain Attributes

- **`mcp.newdomain.operation`** - Type of operation (create, update, delete, etc.)
- **`mcp.newdomain.id`** - ID of the resource
- **`mcp.newdomain.success`** - Boolean indicating operation success
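Before documenting new names, it can help to check them mechanically against the mcp.<domain>.<field> convention. The sketch below is one possible reading of that convention, not a rule taken from the project: validName and namePattern are hypothetical, and the restriction to lowercase letters, digits, and underscores is an assumption based on the names shown in this document.

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern is an assumed reading of mcp.<domain>.<field>: the literal
// prefix "mcp", a domain segment, and one or more field segments, each made
// of lowercase letters, digits, or underscores and starting with a letter.
var namePattern = regexp.MustCompile(`^mcp\.[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$`)

// validName reports whether a metric or attribute name fits the assumed pattern.
func validName(name string) bool {
	return namePattern.MatchString(name)
}

func main() {
	names := []string{
		"mcp.newdomain.operations",
		"mcp.newdomain.operation.duration",
		"newdomain.operations", // missing the mcp prefix
		"mcp.NewDomain.ops",    // uppercase segment
	}
	for _, n := range names {
		fmt.Printf("%-36s %v\n", n, validName(n))
	}
}
```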
Verification
After making changes, verify telemetry output:
# Build
make docker-mcp

# Start OTEL collector
docker run --rm -d --name otel-debug \
  -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/docs/telemetry/testing/otel-collector-config.yaml:/config.yaml" \
  otel/opentelemetry-collector:latest --config=/config.yaml

# Run with telemetry enabled
export DOCKER_MCP_TELEMETRY_DEBUG=1
export DOCKER_CLI_OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
docker mcp [command]

# Check collector output (the collector logs to stderr, hence 2>&1)
docker logs otel-debug 2>&1 | grep "mcp.newdomain"

# Cleanup
docker stop otel-debug
Verify the output matches expectations before proceeding to write tests and docs.
Analysis Strategy
When analyzing git diff:
- •Look for operation verbs in function names:
  - •Create, Add, Update, Modify, Set, Configure, Register
  - •Delete, Remove, Unregister, Clear
  - •Push, Pull, Export, Import, Sync
  - •Start, Stop, Run, Execute
- •Look for state changes:
  - •Database operations (DAO/DB calls)
  - •File system operations (create, delete, write)
  - •Container operations (start, stop, create, delete)
  - •Configuration changes (save, update)
- •Infer domain from file path:
  - •pkg/workingset/ → domain: profile
  - •pkg/catalog_next/ → domain: catalog_next
  - •cmd/docker-mcp/server/ → domain: server
  - •Pattern: use logical grouping name
- •Check if telemetry exists:
  - •Search pkg/telemetry/telemetry.go for Record<Domain> functions
  - •If exists: use existing metrics
  - •If not: propose new domain metrics
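The verb and path heuristics above can be sketched as two small helpers. This is an illustration under stated assumptions, not project tooling: operationVerb, inferDomain, and domainOverrides are hypothetical names, the file names passed in main are made up, and requiring an uppercase letter after the verb is an added rule to avoid matching names such as Settings.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// operationVerbs lists the function-name prefixes named in the list above.
var operationVerbs = []string{
	"Create", "Add", "Update", "Modify", "Set", "Configure", "Register",
	"Delete", "Remove", "Unregister", "Clear",
	"Push", "Pull", "Export", "Import", "Sync",
	"Start", "Stop", "Run", "Execute",
}

// domainOverrides maps package directories to logical domain names where
// they differ (pkg/workingset/ → profile, per the list above).
var domainOverrides = map[string]string{"workingset": "profile"}

// operationVerb returns the lowercase verb a function name starts with, or ""
// if none matches. The verb must end the name or be followed by an uppercase
// letter (an assumption, so that "Settings" does not match "Set").
func operationVerb(funcName string) string {
	for _, v := range operationVerbs {
		if !strings.HasPrefix(funcName, v) {
			continue
		}
		rest := funcName[len(v):]
		if rest == "" || (rest[0] >= 'A' && rest[0] <= 'Z') {
			return strings.ToLower(v)
		}
	}
	return ""
}

// inferDomain derives a domain from pkg/<name>/... or cmd/docker-mcp/<name>/...
// and returns "" for any other path.
func inferDomain(filePath string) string {
	parts := strings.Split(path.Clean(filePath), "/")
	var name string
	switch {
	case len(parts) >= 3 && parts[0] == "pkg":
		name = parts[1]
	case len(parts) >= 4 && parts[0] == "cmd" && parts[1] == "docker-mcp":
		name = parts[2]
	default:
		return ""
	}
	if d, ok := domainOverrides[name]; ok {
		return d
	}
	return name
}

func main() {
	fmt.Println(operationVerb("CreateCatalog"), inferDomain("pkg/catalog_next/create.go"))
	fmt.Println(operationVerb("PushProfile"), inferDomain("pkg/workingset/push.go"))
}
```

Matches from a heuristic like this are candidates only; each function still needs a manual check for actual state changes.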
Implementation Checklist
Follow this sequence:
- • Read patterns from docs/telemetry/README.md
- • Instrument operations with simple defer pattern
- • Add new metrics to pkg/telemetry/telemetry.go (if new domain)
- • Verify with docker logs otel-debug - confirm output matches expectations
- • Write tests in pkg/telemetry/telemetry_test.go
- • Update docs/telemetry/README.md with new metrics
- • Run make test - verify tests pass
- • Run ./docs/telemetry/testing/test-telemetry.sh - final verification
Important Notes
- •Read documentation first
- •Follow existing patterns
- •Ask for approval before implementing
- •Verify early and often with collector logs