AgentSkillsCN

capability-insights

利用 Insights 系统实现主动式问题检测。适用于实施基于 CEL 规则的 InsightPolicy 资源,自动识别配置错误、健康问题或安全风险时使用。

SKILL.md
--- frontmatter
name: capability-insights
description: Covers proactive issue detection using the Insights system. Use when implementing InsightPolicy resources with CEL-based rules to automatically detect misconfigurations, health issues, or security concerns.

Capability: Insights

This skill covers insights integration for Datum Cloud services using the Insights system.

Overview

The Insights system is a declarative, policy-driven, Kubernetes-native platform for proactively detecting issues in resources. It provides:

  • Policy-driven detection — CEL-based rules evaluate resource state to identify issues
  • Lifecycle management — Insights can be acknowledged, snoozed, assigned, and resolved
  • Muting capabilities — Suppress known/expected insights with mute rules
  • Multi-tenant scoping — Insights scoped to organization, project, or namespace

Key insight: Services don't detect issues programmatically. Instead, services define InsightPolicy resources with CEL expressions that describe what conditions warrant an insight.

API Group

All insights resources use the insights.miloapis.com API group with version v1alpha1.

Core Resource Types

The insights system has three resource types:

ResourceScopePurpose
InsightNamespacedA detected issue or finding about a resource
InsightPolicyNamespacedCEL-based rules that generate insights automatically
InsightMuteRuleNamespacedSuppress insights matching certain criteria

How Services Integrate

Services integrate by creating InsightPolicy resources that define rules for detecting issues. The insights system automatically:

  1. Watches resources matching the policy's target selector
  2. Evaluates CEL conditions against each resource
  3. Creates Insight resources when conditions match
  4. Resolves insights when conditions no longer match

What Services DO

  • Create InsightPolicy resources for their resource types
  • Define CEL-based conditions that identify issues
  • Write human-readable message templates using CEL expressions

What Services DON'T DO

  • Emit insights programmatically (the Insights controller creates them)
  • Manage insight lifecycle (users/automation handles acknowledgement, resolution)
  • Handle insight storage or retention

Quick Start: Creating an InsightPolicy

Step 1: Define Policy for Your Resource

yaml
apiVersion: insights.miloapis.com/v1alpha1
kind: InsightPolicy
metadata:
  name: myresource-config-issues
  namespace: myservice-system
spec:
  targetSelector:
    apiVersion: myservice.miloapis.com/v1alpha1
    kind: MyResource
  rules:
    - name: config-conflict
      condition: "object.spec.fieldA == 'value1' && object.spec.fieldB == 'incompatible'"
      severity: warning
      category: configuration
      message: "MyResource {{ object.metadata.name }} has conflicting configuration"
      description: "fieldA is set to 'value1' which is incompatible with fieldB 'incompatible'. Set fieldB to 'compatible' to resolve."

    - name: missing-required-field
      condition: "!has(object.spec.requiredField) || object.spec.requiredField == ''"
      severity: critical
      category: configuration
      message: "MyResource {{ object.metadata.name }} is missing required field"
      description: "The requiredField must be set for the resource to function correctly."

Step 2: Add to Kustomization

yaml
# config/insights/kustomization.yaml
resources:
  - myresource-config-issues.yaml

Step 3: Verify Policy is Active

bash
kubectl get insightpolicies -n myservice-system
kubectl get insights -A  # See generated insights

InsightPolicy Reference

Spec Fields

FieldTypeDescription
targetSelectorTargetSelectorWhich resources this policy applies to
rules[]InsightRuleRules that generate insights
suspendedboolStops generating new insights when true

TargetSelector

yaml
targetSelector:
  apiVersion: myservice.miloapis.com/v1alpha1
  kind: MyResource
  labelSelector:           # Optional: filter by labels
    matchLabels:
      environment: production
  namespaces:              # Optional: limit to specific namespaces
    - production
    - staging

InsightRule

FieldTypeDescription
namestringUnique identifier for the rule (lowercase, hyphenated)
conditionstringCEL expression that returns true when insight should exist
severityenuminfo, warning, or critical
categorystringClassification (e.g., configuration, security, performance)
messagestringShort summary with CEL template support {{ expr }}
descriptionstringDetailed explanation with CEL template support
ttlSecondsint64Optional time-to-live for generated insights

CEL Expression Context

In condition, message, and description expressions:

VariableDescription
objectThe resource being evaluated
object.metadataResource metadata (name, namespace, labels, etc.)
object.specResource spec
object.statusResource status

Insight Resource

Insights are created automatically by InsightPolicy rules or manually.

Insight Spec

yaml
apiVersion: insights.miloapis.com/v1alpha1
kind: Insight
metadata:
  name: insight-abc123
  namespace: my-project
spec:
  targetRef:
    apiVersion: myservice.miloapis.com/v1alpha1
    kind: MyResource
    name: my-resource
    namespace: my-project
  severity: warning
  category: configuration
  message: "MyResource my-resource has conflicting configuration"
  description: "fieldA is set to 'value1' which is incompatible..."
  source:
    type: Policy
    policyRef:
      name: myresource-config-issues
      namespace: myservice-system
      ruleName: config-conflict
  ttlSeconds: 0  # 0 = never expires

Insight Status

yaml
status:
  state: Active              # Active, Acknowledged, Snoozed, Resolved
  owner:                     # Who is responsible
    type: user
    name: alice@example.com
  acknowledgement:           # If acknowledged
    by: { type: user, name: alice@example.com }
    at: "2024-01-15T10:00:00Z"
    note: "Looking into this"
  snooze:                    # If snoozed
    by: { type: user, name: bob@example.com }
    at: "2024-01-15T10:00:00Z"
    until: "2024-01-16T10:00:00Z"
  assignment:                # If assigned
    by: { type: user, name: alice@example.com }
    to: { type: user, name: bob@example.com }
    at: "2024-01-15T10:00:00Z"
  resolution:                # If resolved
    by: { type: user, name: bob@example.com }
    at: "2024-01-16T10:00:00Z"
    note: "Fixed the configuration"
  muted: false               # Whether muted by a mute rule
  targetExists: true         # Whether target resource still exists

Insight States

StateMeaning
ActiveIssue detected and needs attention
AcknowledgedSomeone has seen it and is aware
SnoozedTemporarily suppressed until a specified time
ResolvedIssue has been addressed

Severity Levels

SeverityMeaningTypical Use
infoInformational, optimization opportunityUnderutilization, cost savings
warningShould address soonMisconfigurations, deprecations
criticalImmediate action requiredSecurity issues, failures

Insight Actions (Subresources)

Users interact with insights via subresources:

Acknowledge

bash
kubectl patch insight insight-abc123 --subresource=acknowledge \
  --type=merge -p '{"note": "I am looking into this"}'

Snooze

bash
kubectl patch insight insight-abc123 --subresource=snooze \
  --type=merge -p '{"duration": "4h"}'  # or {"until": "2024-01-16T10:00:00Z"}

Resolve

bash
kubectl patch insight insight-abc123 --subresource=resolve \
  --type=merge -p '{"note": "Fixed the configuration"}'

Assign

bash
kubectl patch insight insight-abc123 --subresource=assign \
  --type=merge -p '{"assignee": {"type": "user", "name": "bob@example.com"}}'

InsightMuteRule

Suppress insights that are known or expected.

Example: Mute All Info-Level Insights in Dev

yaml
apiVersion: insights.miloapis.com/v1alpha1
kind: InsightMuteRule
metadata:
  name: mute-dev-info
  namespace: development
spec:
  match:
    severity: info
  reason: "Development namespace - info-level insights expected"

Example: Mute Specific Policy Rule

yaml
apiVersion: insights.miloapis.com/v1alpha1
kind: InsightMuteRule
metadata:
  name: mute-known-issue
  namespace: my-project
spec:
  match:
    policyRef:
      name: myresource-config-issues
      namespace: myservice-system
      ruleName: config-conflict
  reason: "Known issue, fix scheduled for next sprint"
  expiresAt: "2024-02-01T00:00:00Z"  # Auto-expires

Match Criteria

FieldDescription
policyRefMute insights from specific policy/rule
categoryMute insights of a specific category
targetRefMute insights about a specific resource
severityMute insights at or below this severity
labelSelectorMute insights matching labels

Common Policy Patterns

Configuration Validation

yaml
rules:
  - name: invalid-replica-count
    condition: "object.spec.replicas < 1"
    severity: critical
    category: configuration
    message: "{{ object.kind }} {{ object.metadata.name }} has invalid replica count"
    description: "Replica count must be at least 1. Current value: {{ object.spec.replicas }}"

Status Condition Checks

yaml
rules:
  - name: not-ready
    condition: |
      object.status.conditions.exists(c,
        c.type == 'Ready' && c.status == 'False' &&
        timestamp(c.lastTransitionTime) < now() - duration('10m')
      )
    severity: warning
    category: health
    message: "{{ object.kind }} {{ object.metadata.name }} has been not ready for over 10 minutes"

Security Issues

yaml
rules:
  - name: privileged-container
    condition: "object.spec.template.spec.containers.exists(c, c.securityContext.privileged == true)"
    severity: critical
    category: security
    message: "{{ object.kind }} {{ object.metadata.name }} uses privileged containers"
    description: "Privileged containers are a security risk. Consider using specific capabilities instead."

Resource Optimization

yaml
rules:
  - name: no-resource-limits
    condition: |
      object.spec.template.spec.containers.exists(c,
        !has(c.resources.limits) || !has(c.resources.limits.memory)
      )
    severity: info
    category: optimization
    message: "{{ object.kind }} {{ object.metadata.name }} has containers without memory limits"

Service Integration Guide

1. Identify Detection Opportunities

Ask:

  • What misconfigurations are common?
  • What health issues can be detected early?
  • What security concerns should be flagged?
  • What optimization opportunities exist?

2. Create InsightPolicy Resources

Create policy files in config/insights/:

code
config/
└── insights/
    ├── kustomization.yaml
    ├── myresource-config.yaml      # Configuration issues
    ├── myresource-health.yaml      # Health checks
    └── myresource-security.yaml    # Security concerns

3. Define Categories

Use consistent categories across your service:

CategoryUse For
configurationInvalid settings, conflicts
securitySecurity misconfigurations
healthFailures, degraded state
performancePerformance concerns
optimizationCost savings, efficiency
compliancePolicy violations

4. Include in Kustomization

yaml
# config/base/kustomization.yaml
resources:
  - deployment.yaml
  - service.yaml

# Include insights policies
components:
  - ../insights

Querying Insights

List All Insights

bash
kubectl get insights -A
kubectl get insights -n my-project

Filter by Severity

bash
kubectl get insights -A --field-selector spec.severity=critical

Filter by State

bash
kubectl get insights -A --field-selector status.state=Active

Watch for New Insights

bash
kubectl get insights -A --watch

Related Files

  • implementation.md — Detailed policy creation guide
  • scripts/validate-insights.sh — Validation script
  • scripts/scaffold-insights.sh — Policy scaffolding script

Related Skills

  • capability-activity — Similar policy-driven model for activity timelines
  • k8s-apiserver-patterns — For implementing the resources that insights monitor