Embedding Hunt

Use embedding similarity to hunt for related behaviors across the environment.

When to Use

Use this skill when:

•You have a confirmed malicious finding and want to find similar activity
•You are investigating whether a behavior is isolated or widespread
•You are looking for variations of a known attack pattern
•You are building a comprehensive view of an incident

Prerequisites

•Access to the DeepTempo Findings Server MCP
•A seed finding ID or embedding vector
•Understanding of behavioral similarity concepts

Instructions

Step 1: Establish the Seed

Start with a known finding that represents the behavior you want to hunt:

code

get_finding(finding_id="<seed_finding_id>")

Document:

•The behavioral pattern this finding represents
•Key characteristics (entities, techniques, timing)
•Why this is the hunting seed

Step 2: Expand the Search

Call nearest_neighbors with increasing values of k:

code

# Start narrow
nearest_neighbors(query="<seed_id>", k=10)

# Expand if pattern holds
nearest_neighbors(query="<seed_id>", k=50)

# Wide search for scope assessment
nearest_neighbors(query="<seed_id>", k=100)
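
The widening search can be scripted so that expansion stops once the pattern no longer holds. The sketch below is one possible approach, not part of the Findings Server API: `search` is a hypothetical callable wrapping the `nearest_neighbors` tool, and the `similarity` field on each result and the 0.80 floor are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical result shape: each neighbor is a dict with a "similarity" score.
Neighbor = Dict[str, object]


def expand_search(
    search: Callable[[str, int], List[Neighbor]],
    seed_id: str,
    k_values=(10, 50, 100),
    min_similarity: float = 0.80,
) -> List[Neighbor]:
    """Widen k step by step; stop when the weakest neighbor falls below the floor."""
    results: List[Neighbor] = []
    for k in k_values:
        results = search(seed_id, k)
        if not results:
            break
        weakest = min(float(n["similarity"]) for n in results)
        if weakest < min_similarity:
            # The pattern no longer holds at this k, so widening further
            # would mostly add unrelated findings.
            break
    return results
```

The function returns the last result set it fetched, so the caller still sees where similarity fell away and can decide where to cut.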

Step 3: Analyze the Cluster

For each expansion level, analyze:

•Similarity Distribution: How quickly does similarity drop off?
•Entity Distribution: Same entity or multiple entities?
•Temporal Distribution: Clustered in time or spread out?
•Technique Consistency: Do neighbors share MITRE predictions?
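
The four questions above can be answered with a small summary per expansion level. This is a minimal sketch that assumes each neighbor is a dict with hypothetical fields `similarity`, `src_ip`, `timestamp` (ISO 8601) and `techniques` (a list of MITRE technique IDs); the actual result schema may differ.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def summarize_neighbors(neighbors: List[dict]) -> Dict[str, object]:
    """Summarize one expansion level along the four dimensions listed above."""
    sims = sorted((float(n["similarity"]) for n in neighbors), reverse=True)
    entities = Counter(n["src_ip"] for n in neighbors)
    # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
    times = sorted(
        datetime.fromisoformat(n["timestamp"].replace("Z", "+00:00"))
        for n in neighbors
    )
    # Count each technique at most once per neighbor.
    techniques = Counter(t for n in neighbors for t in set(n.get("techniques", [])))

    top_technique, top_count = (techniques.most_common(1) or [(None, 0)])[0]
    return {
        "similarity_max": sims[0] if sims else None,
        "similarity_min": sims[-1] if sims else None,
        "unique_entities": len(entities),
        "time_span": (times[-1] - times[0]) if times else None,
        "top_technique": top_technique,
        # Share of neighbors whose predictions include the most common technique.
        "technique_share": top_count / len(neighbors) if neighbors else 0.0,
    }
```

Comparing these summaries across k=10, k=50 and k=100 shows whether the cluster stays coherent as the search widens.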

Step 4: Apply Filters

Refine the hunt with filters:

code

# Filter by data source
nearest_neighbors(query="<seed_id>", k=50, filters={"data_source": "flow"})

# Filter by time range
nearest_neighbors(query="<seed_id>", k=50, filters={
    "time_range": {"start": "2024-01-15T00:00:00Z", "end": "2024-01-15T23:59:59Z"}
})

# Filter by minimum anomaly score
nearest_neighbors(query="<seed_id>", k=50, filters={"min_anomaly_score": 0.7})
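
When filters are built programmatically, a small helper keeps the time strings consistent. The sketch below reuses only the filter keys shown above; `build_filters` itself is a hypothetical helper. Whether the server accepts several filters in one dict, and whether `time_range` needs both bounds, are assumptions to verify against the server.

```python
from datetime import datetime, timezone
from typing import Optional


def _iso_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    if dt.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_filters(
    data_source: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    min_anomaly_score: Optional[float] = None,
) -> dict:
    """Assemble a filters dict using the keys shown in the calls above."""
    filters: dict = {}
    if data_source is not None:
        filters["data_source"] = data_source
    if (start is None) != (end is None):
        # Assumption: a time range is only meaningful with both bounds.
        raise ValueError("time_range needs both start and end")
    if start is not None and end is not None:
        if start > end:
            raise ValueError("start must not be after end")
        filters["time_range"] = {"start": _iso_utc(start), "end": _iso_utc(end)}
    if min_anomaly_score is not None:
        filters["min_anomaly_score"] = min_anomaly_score
    return filters
```

The returned dict is meant to be passed as the `filters` argument of `nearest_neighbors`.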

Step 5: Identify Sub-Clusters

Look for natural groupings within results:

•Group by source IP
•Group by destination
•Group by time window
•Group by technique
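
These groupings can be produced with one generic helper. The sketch assumes hypothetical result fields `src_ip`, `dst_ip` and `timestamp`; the one-hour window is an arbitrary choice for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Hashable, List


def group_by(
    neighbors: List[dict], key: Callable[[dict], Hashable]
) -> Dict[Hashable, List[dict]]:
    """Group findings by whatever the key function returns."""
    groups = defaultdict(list)
    for n in neighbors:
        groups[key(n)].append(n)
    return dict(groups)


def hour_window(n: dict) -> str:
    """Bucket a finding by the hour of its timestamp, e.g. '2024-01-15T02'."""
    ts = datetime.fromisoformat(n["timestamp"].replace("Z", "+00:00"))
    return ts.strftime("%Y-%m-%dT%H")
```

For example, `group_by(neighbors, lambda n: n["src_ip"])` groups by source IP and `group_by(neighbors, hour_window)` groups by time window. Groups that contain only one or two findings are weak evidence of a sub-cluster and deserve a closer look before being labeled.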

Step 6: Generate Hunt Report

Document the findings using the structure in the Output Format section.

Output Format

markdown

# Embedding Hunt Report

**Seed Finding**: [Finding ID]
**Hunt Timestamp**: [Current Time]
**Status**: Requires Human Review

## Seed Behavior Summary

[Describe the behavior pattern being hunted]

### Seed Characteristics
| Attribute | Value |
|-----------|-------|
| Data Source | [source] |
| Primary Technique | [technique] |
| Anomaly Score | [score] |
| Key Entity | [entity] |

## Hunt Results

### Scope Summary

| Metric | Value |
|--------|-------|
| Total Similar Findings | [count] |
| Unique Source IPs | [count] |
| Unique Destinations | [count] |
| Unique Hostnames | [count] |
| Time Span | [duration] |

### Similarity Distribution

| Similarity Range | Count | Interpretation |
|------------------|-------|----------------|
| 0.95 - 1.00 | [n] | Near-identical behavior |
| 0.90 - 0.95 | [n] | Very similar |
| 0.80 - 0.90 | [n] | Related pattern |
| 0.70 - 0.80 | [n] | Loosely related |

### Entity Analysis

#### Affected Entities
| Entity | Finding Count | First Seen | Last Seen |
|--------|---------------|------------|-----------|
| [entity] | [count] | [time] | [time] |

#### Entity Relationships
[Describe connections between entities]

### Temporal Analysis

[Describe timing patterns:
- When did activity start?
- Is it ongoing?
- Are there bursts or steady activity?]

### Technique Distribution

| Technique | Findings | Avg Confidence |
|-----------|----------|----------------|
| [T####] | [count] | [avg] |

## Identified Clusters

### Cluster 1: [Label]
- **Findings**: [count]
- **Common Characteristic**: [description]
- **Entities**: [list]
- **Assessment**: [interpretation]

### Cluster 2: [Label]
[Repeat structure]

## Hunt Conclusions

### Pattern Assessment
[Is this isolated or widespread? Coordinated or independent?]

### Threat Assessment
[What does the scope tell us about the threat?]

### Confidence Level
[High/Medium/Low] - [Reasoning]

## Recommended Actions

### Immediate
1. [Action]

### Investigation
1. [Action]

### Monitoring
1. [Action]

---
*This report was generated by Claude using the Embedding Hunt skill.*
*All findings require human validation.*
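
The Similarity Distribution table in the template above can be filled in mechanically. The sketch below counts scores per band and emits the table rows. Because the template's ranges share their boundary values, it assumes a boundary score (for example 0.95) belongs to the higher band; scores below 0.70 are left out, as the template has no row for them.

```python
from typing import Iterable, List, Tuple

# (lower bound, label, interpretation), checked from the highest band down.
BANDS: List[Tuple[float, str, str]] = [
    (0.95, "0.95 - 1.00", "Near-identical behavior"),
    (0.90, "0.90 - 0.95", "Very similar"),
    (0.80, "0.80 - 0.90", "Related pattern"),
    (0.70, "0.70 - 0.80", "Loosely related"),
]


def similarity_rows(similarities: Iterable[float]) -> List[str]:
    """Return markdown table rows with the count of scores in each band."""
    counts = [0] * len(BANDS)
    for s in similarities:
        for i, (lower, _, _) in enumerate(BANDS):
            if s >= lower:
                counts[i] += 1
                break  # each score is counted once, in the highest band it reaches
    return [
        f"| {label} | {count} | {meaning} |"
        for (_, label, meaning), count in zip(BANDS, counts)
    ]
```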

Examples

Example 1: Hunting from Confirmed C2

Seed: Confirmed C2 beacon from compromised host
Hunt Goal: Find other compromised hosts

Approach:

•Use seed embedding to find similar beaconing patterns
•Filter to exclude the seed host
•Group results by source IP
•Each unique source IP is a potential compromise
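
A minimal sketch of this approach, assuming each neighbor carries a hypothetical `src_ip` field. It drops the seed host and ranks the remaining source IPs by how many similar findings they produced.

```python
from collections import Counter
from typing import List, Tuple


def candidate_hosts(neighbors: List[dict], seed_src_ip: str) -> List[Tuple[str, int]]:
    """Count similar findings per source IP, excluding the seed host."""
    counts = Counter(
        n["src_ip"] for n in neighbors if n["src_ip"] != seed_src_ip
    )
    # Most frequent first; each entry is a lead to validate, not a verdict.
    return counts.most_common()
```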

Example 2: Hunting Lateral Movement

Seed: Detected lateral movement attempt
Hunt Goal: Map the full movement path

Approach:

•Find similar authentication/movement patterns
•Build timeline of activity
•Identify source and destination hosts
•Reconstruct the movement chain
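
One way to sketch the timeline and chain reconstruction, assuming hypothetical fields `timestamp`, `src_ip` and `dst_ip` on each finding. The chain logic is deliberately simple: it follows a single path in which each hop starts where the previous one ended.

```python
from datetime import datetime
from typing import List, Tuple

Hop = Tuple[str, str, str]  # (timestamp, source, destination)


def movement_timeline(findings: List[dict]) -> List[Hop]:
    """Order findings by time and return (timestamp, source, destination) hops."""
    ordered = sorted(
        findings,
        key=lambda f: datetime.fromisoformat(f["timestamp"].replace("Z", "+00:00")),
    )
    return [(f["timestamp"], f["src_ip"], f["dst_ip"]) for f in ordered]


def linked_path(hops: List[Hop]) -> List[str]:
    """Follow hops where each source is the previous destination."""
    path: List[str] = []
    for _, src, dst in hops:
        if not path:
            path = [src, dst]
        elif src == path[-1]:
            path.append(dst)
        # Hops that do not continue from the current host are left out of the
        # path; they may belong to a separate chain and need manual review.
    return path
```

Branching movement (one host reaching several targets) is not captured by a single path, so the full timeline should stay in the report alongside the reconstructed chain.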

Guidelines

•Start narrow, expand gradually - Don't overwhelm the analysis with too many results at the outset
•Document the seed clearly - Others need to understand what you're hunting
•Look for natural breakpoints - Similarity drop-offs indicate cluster boundaries
•Consider false positives - High similarity doesn't guarantee that activity is malicious
•Time-bound your hunt - Set reasonable time windows
•Validate findings - Spot-check results for relevance
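
The "natural breakpoints" guideline can be approximated by looking for the largest gap between consecutive similarity scores. This is a heuristic sketch, not a method prescribed by the skill; the function name and the 0.05 minimum gap are assumptions to tune per hunt.

```python
from typing import List, Optional


def largest_dropoff(similarities: List[float], min_gap: float = 0.05) -> Optional[int]:
    """Return how many top results sit above the largest similarity gap.

    Returns None when no gap between consecutive scores exceeds min_gap.
    """
    ranked = sorted(similarities, reverse=True)
    best_index, best_gap = None, min_gap
    for i in range(1, len(ranked)):
        gap = ranked[i - 1] - ranked[i]
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index
```

A return value of 3 means the three most similar findings form a group that is clearly separated from the rest, which makes them a reasonable first cluster to validate.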

Constraints

•Do not assume all similar findings are malicious
•Validate clusters before drawing conclusions
•Note the limitations of embedding similarity in the report: it indicates behavioral resemblance, not intent
•Require human review for any response actions
•Document methodology so hunts are reproducible