
embedding-hunt

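Pivot through embedding vectors to mine behavior clusters, discover related activity across entities, and identify latent patterns that may point to coordinated attacks or widespread threats.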

---
name: embedding-hunt
description: Pivot from one embedding to discover behavior clusters, find related activity across entities, and identify patterns that may indicate coordinated or widespread threats
version: 1.0.0
author: DeepTempo
tags:
  - soc
  - hunting
  - clustering
  - investigation
requires:
  - mcp/deeptempo-findings-server
---

Embedding Hunt

Use embedding similarity to hunt for related behaviors across the environment.

When to Use

Use this skill when:

  • You have a confirmed malicious finding and want to find similar activity
  • Investigating whether a behavior is isolated or widespread
  • Looking for variations of a known attack pattern
  • Building a comprehensive view of an incident

Prerequisites

  • Access to the DeepTempo Findings Server MCP
  • A seed finding ID or embedding vector
  • Familiarity with how embedding similarity reflects behavioral similarity

Instructions

Step 1: Establish the Seed

Start with a known finding that represents the behavior you want to hunt:

```
get_finding(finding_id="<seed_finding_id>")
```

Document:

  • The behavioral pattern this finding represents
  • Key characteristics (entities, techniques, timing)
  • Why this is the hunting seed
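One way to capture these notes in a structured form is the Python sketch below. The field names (`finding_id`, `data_source`, `mitre_predictions`, `anomaly_score`, `src_ip`, `hostname`, `timestamp`) are assumptions about the `get_finding` response, not its documented schema. Adjust them to whatever the server actually returns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SeedSummary:
    finding_id: str
    data_source: str
    primary_technique: Optional[str]
    anomaly_score: float
    key_entity: str
    first_seen: Optional[str]
    rationale: str  # why this finding was chosen as the hunting seed

def summarize_seed(finding: dict, rationale: str) -> SeedSummary:
    # Field names below are hypothetical; map them to the real get_finding() output.
    techniques = finding.get("mitre_predictions", [])
    return SeedSummary(
        finding_id=finding["finding_id"],
        data_source=finding.get("data_source", "unknown"),
        # Assumes predictions are ordered by confidence, highest first.
        primary_technique=techniques[0] if techniques else None,
        anomaly_score=float(finding.get("anomaly_score", 0.0)),
        # Prefer the source IP as the key entity; fall back to the hostname.
        key_entity=finding.get("src_ip") or finding.get("hostname", "unknown"),
        first_seen=finding.get("timestamp"),
        rationale=rationale,
    )
```

The resulting record maps directly onto the Seed Characteristics table in the report.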

Step 2: Expand the Search

Query nearest neighbors with progressively larger values of k:

```
# Start narrow
nearest_neighbors(query="<seed_id>", k=10)

# Expand if pattern holds
nearest_neighbors(query="<seed_id>", k=50)

# Wide search for scope assessment
nearest_neighbors(query="<seed_id>", k=100)
```
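The "expand if pattern holds" rule can be made explicit. This sketch assumes a hypothetical `search(k)` callable that wraps `nearest_neighbors` and returns similarity scores for the k nearest findings. The 0.8 cutoff is an arbitrary example. The function widens the search one level at a time and stops after the first level whose weakest neighbor falls below the cutoff.

```python
from typing import Callable, Dict, List, Sequence

def expand_search(
    search: Callable[[int], Sequence[float]],
    ks: Sequence[int] = (10, 50, 100),
    min_similarity: float = 0.8,
) -> Dict[int, List[float]]:
    """Query each k in turn and return the scores seen at every level examined.

    `search` is a hypothetical wrapper around nearest_neighbors that returns
    similarity scores for the k nearest findings.
    """
    levels: Dict[int, List[float]] = {}
    for k in ks:
        scores = list(search(k))
        levels[k] = scores
        # Stop widening once the weakest neighbor at this level falls below
        # the cutoff: the pattern no longer holds at this scope.
        if not scores or min(scores) < min_similarity:
            break
    return levels
```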

Step 3: Analyze the Cluster

For each expansion level, analyze:

  1. Similarity Distribution: How quickly does similarity drop off?
  2. Entity Distribution: Same entity or multiple entities?
  3. Temporal Distribution: Clustered in time or spread out?
  4. Technique Consistency: Do neighbors share MITRE predictions?
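The sketch below computes these four measurements for one expansion level. It assumes each neighbor is a dict with `similarity`, `src_ip`, `timestamp` (ISO 8601), and `techniques` keys. These names are hypothetical, so map them to the actual result fields.

```python
from collections import Counter
from datetime import datetime
from statistics import mean
from typing import Dict, List

def _parse(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def analyze_level(neighbors: List[dict], seed_techniques: List[str]) -> Dict[str, float]:
    if not neighbors:
        raise ValueError("no neighbors to analyze")
    n_total = len(neighbors)
    sims = sorted((n["similarity"] for n in neighbors), reverse=True)
    times = sorted(_parse(n["timestamp"]) for n in neighbors)
    entities = Counter(n["src_ip"] for n in neighbors)
    seed = set(seed_techniques)
    sharing = sum(1 for n in neighbors if seed & set(n["techniques"]))
    return {
        # 1. Similarity distribution: spread between best and worst match
        "top_similarity": sims[0],
        "similarity_drop": sims[0] - sims[-1],
        "mean_similarity": mean(sims),
        # 2. Entity distribution: how many entities, and how dominant the top one is
        "unique_entities": len(entities),
        "top_entity_share": entities.most_common(1)[0][1] / n_total,
        # 3. Temporal distribution: hours between first and last finding
        "time_span_hours": (times[-1] - times[0]).total_seconds() / 3600,
        # 4. Technique consistency: share of neighbors with a seed technique
        "technique_overlap": sharing / n_total,
    }
```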

Step 4: Apply Filters

Refine the hunt with filters:

```
# Filter by data source
nearest_neighbors(query="<seed_id>", k=50, filters={"data_source": "flow"})

# Filter by time range
nearest_neighbors(query="<seed_id>", k=50, filters={
    "time_range": {"start": "2024-01-15T00:00:00Z", "end": "2024-01-15T23:59:59Z"}
})

# Filter by minimum anomaly score
nearest_neighbors(query="<seed_id>", k=50, filters={"min_anomaly_score": 0.7})
```
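The filters above are passed to the server with the query. You may want to try other settings on results you already have, without re-querying. In that case, the same three constraints can be applied locally.

This sketch makes several assumptions:

- Neighbor dicts carry `data_source`, `timestamp`, and `anomaly_score` keys.
- Timestamps are ISO 8601 with a UTC offset or `Z`.

The server's exact filter semantics may differ, for example whether range bounds are inclusive.

```python
from datetime import datetime
from typing import Iterable, List, Optional

def _parse(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onward.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def filter_neighbors(
    neighbors: Iterable[dict],
    data_source: Optional[str] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
    min_anomaly_score: Optional[float] = None,
) -> List[dict]:
    """Apply data-source, time-range and anomaly-score constraints client-side."""
    lo = _parse(start) if start else None
    hi = _parse(end) if end else None
    kept = []
    for n in neighbors:
        if data_source is not None and n.get("data_source") != data_source:
            continue
        ts = _parse(n["timestamp"])  # assumed timezone-aware, like lo/hi
        if (lo is not None and ts < lo) or (hi is not None and ts > hi):
            continue
        if min_anomaly_score is not None and n.get("anomaly_score", 0.0) < min_anomaly_score:
            continue
        kept.append(n)
    return kept
```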

Step 5: Identify Sub-Clusters

Look for natural groupings within results:

  • Group by source IP
  • Group by destination
  • Group by time window
  • Group by technique
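Each of these groupings is a simple bucketing step. The sketch below groups neighbor dicts by any key function and orders the groups largest first. It also includes a helper that buckets findings into fixed UTC time windows. Field names such as `src_ip` and `timestamp` are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Callable, Dict, List

def group_by(neighbors: List[dict], key: Callable[[dict], object]) -> Dict[object, List[dict]]:
    groups: Dict[object, List[dict]] = defaultdict(list)
    for n in neighbors:
        groups[key(n)].append(n)
    # Largest groups first: these are the candidate sub-clusters.
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))

def time_window(hours: int) -> Callable[[dict], str]:
    """Key function that buckets a finding into fixed windows of `hours` hours (UTC)."""
    size = hours * 3600
    def key(n: dict) -> str:
        ts = datetime.fromisoformat(n["timestamp"].replace("Z", "+00:00"))
        start = int(ts.timestamp()) // size * size  # floor to the window start
        return datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
    return key
```

For example, `group_by(results, lambda n: n["src_ip"])` groups by source IP, and `group_by(results, time_window(6))` groups into six-hour windows.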

Step 6: Generate Hunt Report

Document the results using the Output Format below.

Output Format

```markdown
# Embedding Hunt Report

**Seed Finding**: [Finding ID]
**Hunt Timestamp**: [Current Time]
**Status**: Requires Human Review

## Seed Behavior Summary

[Describe the behavior pattern being hunted]

### Seed Characteristics
| Attribute | Value |
|-----------|-------|
| Data Source | [source] |
| Primary Technique | [technique] |
| Anomaly Score | [score] |
| Key Entity | [entity] |

## Hunt Results

### Scope Summary

| Metric | Value |
|--------|-------|
| Total Similar Findings | [count] |
| Unique Source IPs | [count] |
| Unique Destinations | [count] |
| Unique Hostnames | [count] |
| Time Span | [duration] |

### Similarity Distribution

| Similarity Range | Count | Interpretation |
|------------------|-------|----------------|
| 0.95 - 1.00 | [n] | Near-identical behavior |
| 0.90 - 0.95 | [n] | Very similar |
| 0.80 - 0.90 | [n] | Related pattern |
| 0.70 - 0.80 | [n] | Loosely related |

### Entity Analysis

#### Affected Entities
| Entity | Finding Count | First Seen | Last Seen |
|--------|---------------|------------|-----------|
| [entity] | [count] | [time] | [time] |

#### Entity Relationships
[Describe connections between entities]

### Temporal Analysis

[Describe timing patterns:
- When did activity start?
- Is it ongoing?
- Are there bursts or steady activity?]

### Technique Distribution

| Technique | Findings | Avg Confidence |
|-----------|----------|----------------|
| [T####] | [count] | [avg] |

## Identified Clusters

### Cluster 1: [Label]
- **Findings**: [count]
- **Common Characteristic**: [description]
- **Entities**: [list]
- **Assessment**: [interpretation]

### Cluster 2: [Label]
[Repeat structure]

## Hunt Conclusions

### Pattern Assessment
[Is this isolated or widespread? Coordinated or independent?]

### Threat Assessment
[What does the scope tell us about the threat?]

### Confidence Level
[High/Medium/Low] - [Reasoning]

## Recommended Actions

### Immediate
1. [Action]

### Investigation
1. [Action]

### Monitoring
1. [Action]

---
*This report was generated by Claude using the Embedding Hunt skill.*
*All findings require human validation.*
```
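Two of the report tables can be filled mechanically.

- `similarity_bands` counts scores into the Similarity Distribution bands. The lower bound is inclusive; the top band also includes 1.00.
- `scope_summary` computes the Scope Summary metrics.

This is a sketch that assumes hypothetical `src_ip`, `dst_ip`, `hostname`, and `timestamp` fields on each neighbor.

```python
from datetime import datetime
from typing import Dict, List

# Bands from the Similarity Distribution table: lower bound inclusive,
# upper bound exclusive, except that the top band also includes 1.00.
BANDS = [(0.95, 1.00), (0.90, 0.95), (0.80, 0.90), (0.70, 0.80)]

def similarity_bands(sims: List[float]) -> Dict[str, int]:
    counts = {f"{lo:.2f} - {hi:.2f}": 0 for lo, hi in BANDS}
    for s in sims:
        for lo, hi in BANDS:
            if lo <= s < hi or (hi == 1.00 and s == 1.00):
                counts[f"{lo:.2f} - {hi:.2f}"] += 1
                break
    return counts  # scores below 0.70 fall outside every band and are not counted

def scope_summary(neighbors: List[dict]) -> Dict[str, object]:
    times = sorted(
        datetime.fromisoformat(n["timestamp"].replace("Z", "+00:00")) for n in neighbors
    )

    def unique(field: str) -> int:
        # Count distinct non-missing values of a (hypothetical) field.
        return len({n[field] for n in neighbors if n.get(field) is not None})

    return {
        "Total Similar Findings": len(neighbors),
        "Unique Source IPs": unique("src_ip"),
        "Unique Destinations": unique("dst_ip"),
        "Unique Hostnames": unique("hostname"),
        "Time Span": str(times[-1] - times[0]) if times else "n/a",
    }
```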

Examples

Example 1: Hunting from Confirmed C2

Seed: Confirmed C2 beacon from a compromised host

Hunt Goal: Find other compromised hosts

Approach:

  1. Use seed embedding to find similar beaconing patterns
  2. Filter to exclude the seed host
  3. Group results by source IP
  4. Treat each unique source IP as a potentially compromised host
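Steps 2 through 4 of this approach reduce to a filter and a count. The sketch below assumes neighbor dicts with `src_ip` and `similarity` keys. The 0.9 similarity cutoff is an arbitrary illustrative choice, not a recommended value.

```python
from collections import Counter
from typing import List, Tuple

def candidate_hosts(
    neighbors: List[dict], seed_ip: str, min_similarity: float = 0.9
) -> List[Tuple[str, int]]:
    """Source IPs other than the seed host, with their similar-finding counts, most first."""
    counts = Counter(
        n["src_ip"]
        for n in neighbors
        # Exclude the seed host and drop weak matches below the (assumed) cutoff.
        if n["src_ip"] != seed_ip and n["similarity"] >= min_similarity
    )
    return counts.most_common()
```

Every host this returns is still only a candidate; each one needs human validation before any response action.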

Example 2: Hunting Lateral Movement

Seed: Detected lateral movement attempt

Hunt Goal: Map the full movement path

Approach:

  1. Find similar authentication/movement patterns
  2. Build timeline of activity
  3. Identify source and destination hosts
  4. Reconstruct the movement chain
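Steps 2 through 4 can be approximated with a greedy walk over the timeline. The sketch assumes each event is a dict with hypothetical `timestamp`, `src_host`, and `dst_host` keys. It follows only the earliest hop out of each host to a host not yet visited, so branching movement will be missed. Treat the result as a starting point for the chain, not the full path.

```python
from datetime import datetime
from typing import List, Tuple

def _ts(event: dict) -> datetime:
    return datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))

def movement_chain(events: List[dict], start_host: str) -> List[Tuple[str, str, str]]:
    """Greedy src->dst chain starting at start_host, as (timestamp, src, dst) hops."""
    timeline = sorted(events, key=_ts)  # build the timeline of activity
    chain: List[Tuple[str, str, str]] = []
    current, visited = start_host, {start_host}
    for e in timeline:
        # Take the next hop out of the current host to a host not seen yet;
        # iterating in time order guarantees each hop follows the previous one.
        if e["src_host"] == current and e["dst_host"] not in visited:
            chain.append((e["timestamp"], e["src_host"], e["dst_host"]))
            current = e["dst_host"]
            visited.add(current)
    return chain
```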

Guidelines

  1. Start narrow, expand gradually - Don't overwhelm yourself with too many results initially
  2. Document the seed clearly - Others need to understand what you're hunting
  3. Look for natural breakpoints - Similarity drop-offs indicate cluster boundaries
  4. Consider false positives - High similarity doesn't mean the activity is malicious
  5. Time-bound your hunt - Set reasonable time windows
  6. Validate findings - Spot-check results for relevance
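Guideline 3 can be checked numerically by finding the largest gap between consecutive similarity scores. This sketch uses one simple heuristic: the largest single drop that is at least `min_gap`, where the default of 0.05 is arbitrary. It is not a clustering algorithm.

```python
from typing import List

def natural_breakpoint(sims: List[float], min_gap: float = 0.05) -> int:
    """Return how many top results sit above the largest similarity drop.

    Returns len(sims) when no drop between consecutive scores reaches min_gap.
    """
    ordered = sorted(sims, reverse=True)
    best_gap, cut = 0.0, len(ordered)
    for i in range(1, len(ordered)):
        gap = ordered[i - 1] - ordered[i]
        if gap >= min_gap and gap > best_gap:
            best_gap, cut = gap, i  # results [0, i) form the tighter cluster
    return cut
```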

Constraints

  • Do not assume all similar findings are malicious
  • Validate clusters before drawing conclusions
  • Note limitations of embedding similarity
  • Require human review for any response actions
  • Document methodology so hunts are reproducible