Embedding Hunt
Use embedding similarity to hunt for related behaviors across the environment.
When to Use
Use this skill when:
- You have a confirmed malicious finding and want to find similar activity
- You are investigating whether a behavior is isolated or widespread
- You are looking for variations of a known attack pattern
- You are building a comprehensive view of an incident
Prerequisites
- Access to the DeepTempo Findings Server MCP
- A seed finding ID or embedding vector
- Understanding of behavioral similarity concepts
Instructions
Step 1: Establish the Seed
Start with a known finding that represents the behavior you want to hunt:
code
get_finding(finding_id="<seed_finding_id>")
Document:
- The behavioral pattern this finding represents
- Key characteristics (entities, techniques, timing)
- Why this is the hunting seed
Step 2: Expand the Search
Use nearest neighbors with increasing k values:
code
# Start narrow
nearest_neighbors(query="<seed_id>", k=10)
# Expand if pattern holds
nearest_neighbors(query="<seed_id>", k=50)
# Wide search for scope assessment
nearest_neighbors(query="<seed_id>", k=100)
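The widening sequence above can be automated with a stopping rule. The following is a minimal sketch, not the server's API: `search` is a hypothetical callable that wraps the `nearest_neighbors` tool, each result is assumed to be a dict with a `similarity` field, and the 0.8 cutoff is an arbitrary illustration of "the pattern no longer holds".

```python
from typing import Callable, Iterable

# Hypothetical wrapper signature: (seed_id, k) -> list of neighbor dicts.
SearchFn = Callable[[str, int], list]


def expand_hunt(search: SearchFn, seed_id: str,
                ks: Iterable[int] = (10, 50, 100),
                min_tail_similarity: float = 0.8) -> list:
    """Widen k step by step and stop once the weakest neighbor is too dissimilar."""
    results: list = []
    for k in ks:
        results = search(seed_id, k)
        if not results:
            break
        # Assumed field name: "similarity" (not confirmed by the source).
        weakest = min(n["similarity"] for n in results)
        # Stop when the tail is weak or the search returned fewer than k results.
        if weakest < min_tail_similarity or len(results) < k:
            break
    return results
```

The function returns the last result set it fetched, so the caller sees the widest level at which the pattern was still being tested.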
Step 3: Analyze the Cluster
For each expansion level, analyze:
- Similarity Distribution: How quickly does similarity drop off?
- Entity Distribution: Same entity or multiple entities?
- Temporal Distribution: Clustered in time or spread out?
- Technique Consistency: Do neighbors share MITRE predictions?
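The four questions above can be answered with a small summary per expansion level. This sketch assumes each neighbor is a dict with `similarity`, `timestamp` (ISO 8601), `entity`, and `technique` fields; those names are illustrative and may not match what the Findings Server actually returns.

```python
from collections import Counter
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Rewrite a trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_neighbors(neighbors: list) -> dict:
    """Summarize one expansion level along the four dimensions listed above."""
    if not neighbors:
        return {}
    # All field names below are assumptions about the result shape.
    sims = [n["similarity"] for n in neighbors]
    times = [parse_ts(n["timestamp"]) for n in neighbors]
    return {
        "similarity_range": (min(sims), max(sims)),
        "entities": Counter(n["entity"] for n in neighbors),
        "time_span": max(times) - min(times),
        "techniques": Counter(n["technique"] for n in neighbors),
    }
```

Comparing the summaries for k=10, k=50, and k=100 side by side shows whether new entities or techniques appear as the search widens.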
Step 4: Apply Filters
Refine the hunt with filters:
code
# Filter by data source
nearest_neighbors(query="<seed_id>", k=50, filters={"data_source": "flow"})
# Filter by time range
nearest_neighbors(query="<seed_id>", k=50, filters={
"time_range": {"start": "2024-01-15T00:00:00Z", "end": "2024-01-15T23:59:59Z"}
})
# Filter by minimum anomaly score
nearest_neighbors(query="<seed_id>", k=50, filters={"min_anomaly_score": 0.7})
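Hand-typing timestamps for the `time_range` filter is error-prone, so a helper can build that fragment from a reference time. This is a sketch under assumptions: `time_range_filter` is a hypothetical helper, it only reproduces the dict shape and timestamp format shown above, and the source does not say whether several filter keys may be combined in one call.

```python
from datetime import datetime, timedelta, timezone


def time_range_filter(center: datetime, hours_before: float,
                      hours_after: float) -> dict:
    """Build a time_range filter around a reference time (for example the seed's)."""
    if center.tzinfo is None:
        raise ValueError("center must be timezone-aware")
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same format as the example timestamps above
    center = center.astimezone(timezone.utc)
    start = center - timedelta(hours=hours_before)
    end = center + timedelta(hours=hours_after)
    return {"time_range": {"start": start.strftime(fmt),
                           "end": end.strftime(fmt)}}
```

The returned dict can be passed as the `filters` argument in the same way as the hand-written example.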
Step 5: Identify Sub-Clusters
Look for natural groupings within results:
- Group by source IP
- Group by destination
- Group by time window
- Group by technique
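Each grouping above is the same operation with a different key. A minimal sketch follows, assuming neighbors are dicts with hypothetical `src_ip`, `dst_ip`, `timestamp`, and `technique` fields; the one-hour window is an arbitrary choice.

```python
from collections import defaultdict
from datetime import datetime


def group_by(neighbors: list, key_fn) -> dict:
    """Group findings by whatever key_fn extracts from each one."""
    groups = defaultdict(list)
    for n in neighbors:
        groups[key_fn(n)].append(n)
    return dict(groups)


def hour_window(n: dict) -> datetime:
    # Truncate the (assumed ISO 8601) timestamp to the start of its hour.
    ts = datetime.fromisoformat(n["timestamp"].replace("Z", "+00:00"))
    return ts.replace(minute=0, second=0, microsecond=0)


# Usage with the assumed field names:
#   by_source = group_by(neighbors, lambda n: n["src_ip"])
#   by_destination = group_by(neighbors, lambda n: n["dst_ip"])
#   by_window = group_by(neighbors, hour_window)
#   by_technique = group_by(neighbors, lambda n: n["technique"])
```

Groups that stay stable across several keys (same sources, same hour, same technique) are the strongest sub-cluster candidates.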
Step 6: Generate Hunt Report
Document findings following the output format.
Output Format
markdown
# Embedding Hunt Report

**Seed Finding**: [Finding ID]
**Hunt Timestamp**: [Current Time]
**Status**: Requires Human Review

## Seed Behavior Summary

[Describe the behavior pattern being hunted]

### Seed Characteristics

| Attribute | Value |
|-----------|-------|
| Data Source | [source] |
| Primary Technique | [technique] |
| Anomaly Score | [score] |
| Key Entity | [entity] |

## Hunt Results

### Scope Summary

| Metric | Value |
|--------|-------|
| Total Similar Findings | [count] |
| Unique Source IPs | [count] |
| Unique Destinations | [count] |
| Unique Hostnames | [count] |
| Time Span | [duration] |

### Similarity Distribution

| Similarity Range | Count | Interpretation |
|------------------|-------|----------------|
| 0.95 - 1.00 | [n] | Near-identical behavior |
| 0.90 - 0.95 | [n] | Very similar |
| 0.80 - 0.90 | [n] | Related pattern |
| 0.70 - 0.80 | [n] | Loosely related |

### Entity Analysis

#### Affected Entities

| Entity | Finding Count | First Seen | Last Seen |
|--------|---------------|------------|-----------|
| [entity] | [count] | [time] | [time] |

#### Entity Relationships

[Describe connections between entities]

### Temporal Analysis

[Describe timing patterns:
- When did activity start?
- Is it ongoing?
- Are there bursts or steady activity?]

### Technique Distribution

| Technique | Findings | Avg Confidence |
|-----------|----------|----------------|
| [T####] | [count] | [avg] |

## Identified Clusters

### Cluster 1: [Label]

- **Findings**: [count]
- **Common Characteristic**: [description]
- **Entities**: [list]
- **Assessment**: [interpretation]

### Cluster 2: [Label]

[Repeat structure]

## Hunt Conclusions

### Pattern Assessment

[Is this isolated or widespread? Coordinated or independent?]

### Threat Assessment

[What does the scope tell us about the threat?]

### Confidence Level

[High/Medium/Low] - [Reasoning]

## Recommended Actions

### Immediate

1. [Action]

### Investigation

1. [Action]

### Monitoring

1. [Action]

---

*This report was generated by Claude using the Embedding Hunt skill.*
*All findings require human validation.*
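The Similarity Distribution table in the template can be filled from raw scores. The sketch below is one way to do it and makes two assumptions the template does not state: each range includes its lower bound (so 0.95 counts as near-identical), and scores below 0.70 are left out of the table.

```python
# (lower bound, range label, interpretation), copied from the template above.
BUCKETS = [
    (0.95, "0.95 - 1.00", "Near-identical behavior"),
    (0.90, "0.90 - 0.95", "Very similar"),
    (0.80, "0.80 - 0.90", "Related pattern"),
    (0.70, "0.70 - 0.80", "Loosely related"),
]


def similarity_table(similarities: list) -> str:
    """Render the Similarity Distribution table as markdown text."""
    counts = {label: 0 for _, label, _ in BUCKETS}
    for s in similarities:
        # Buckets are ordered high to low, so the first match is the right one.
        for lower, label, _ in BUCKETS:
            if s >= lower:
                counts[label] += 1
                break
    rows = [
        "| Similarity Range | Count | Interpretation |",
        "|------------------|-------|----------------|",
    ]
    for _, label, meaning in BUCKETS:
        rows.append(f"| {label} | {counts[label]} | {meaning} |")
    return "\n".join(rows)
```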
Examples
Example 1: Hunting from Confirmed C2
Seed: Confirmed C2 beacon from compromised host
Hunt Goal: Find other compromised hosts
Approach:
- Use seed embedding to find similar beaconing patterns
- Filter to exclude the seed host
- Group results by source IP
- Each unique source IP is a potential compromise
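The approach above reduces to a filter followed by a grouping. A minimal sketch, assuming each neighbor is a dict with hypothetical `src_ip` and `finding_id` fields:

```python
from collections import defaultdict


def candidate_hosts(neighbors: list, seed_src_ip: str) -> list:
    """Return (src_ip, finding_ids) pairs for every host other than the seed."""
    by_host = defaultdict(list)
    for n in neighbors:
        if n["src_ip"] == seed_src_ip:
            continue  # skip the host that is already confirmed compromised
        by_host[n["src_ip"]].append(n["finding_id"])
    # Hosts with the most similar findings come first.
    return sorted(by_host.items(), key=lambda item: len(item[1]), reverse=True)
```

Each returned host is a lead to validate, not a confirmed compromise.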
Example 2: Hunting Lateral Movement
Seed: Detected lateral movement attempt
Hunt Goal: Map the full movement path
Approach:
- Find similar authentication/movement patterns
- Build timeline of activity
- Identify source and destination hosts
- Reconstruct the movement chain
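The timeline and chain steps above can be sketched as follows. This is a deliberately simple heuristic, not a complete path-reconstruction method: it assumes each event is a dict with hypothetical `timestamp`, `src_host`, and `dst_host` fields, that all timestamps use the same ISO 8601 format and timezone (so string order equals time order), and that the earliest event is the start of the chain.

```python
def movement_chain(events: list) -> tuple:
    """Follow hops in time order; return (chain of hosts, events that do not link)."""
    timeline = sorted(events, key=lambda e: e["timestamp"])
    if not timeline:
        return [], []
    first = timeline[0]
    chain = [first["src_host"], first["dst_host"]]
    unlinked = []
    for e in timeline[1:]:
        if e["src_host"] == chain[-1]:
            # This hop starts where the previous hop ended.
            chain.append(e["dst_host"])
        else:
            unlinked.append(e)
    return chain, unlinked
```

Events left in `unlinked` may belong to a second path or be unrelated, so they deserve a separate look.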
Guidelines
- Start narrow, expand gradually - Don't overwhelm yourself with too many results initially
- Document the seed clearly - Others need to understand what you're hunting
- Look for natural breakpoints - Similarity drop-offs indicate cluster boundaries
- Consider false positives - High similarity doesn't guarantee malicious activity
- Time-bound your hunt - Set reasonable time windows
- Validate findings - Spot-check results for relevance
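The "natural breakpoints" guideline can be made concrete by looking for the largest gap between consecutive similarity scores. This is an illustrative heuristic; the 0.05 minimum gap is an arbitrary assumption and should be tuned against real results.

```python
from typing import Optional


def find_breakpoint(similarities: list, min_gap: float = 0.05) -> Optional[int]:
    """Return the rank where similarity drops most sharply, or None if no clear drop."""
    ordered = sorted(similarities, reverse=True)
    best_idx, best_gap = None, 0.0
    for i in range(len(ordered) - 1):
        gap = ordered[i] - ordered[i + 1]
        if gap > best_gap:
            best_idx, best_gap = i + 1, gap
    # Only report a boundary when the largest gap is big enough to matter.
    return best_idx if best_gap >= min_gap else None
```

If the function returns `n`, the top `n` neighbors by similarity form the tight cluster and everything after them is a candidate for a separate, looser group.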
Constraints
- Do not assume all similar findings are malicious
- Validate clusters before drawing conclusions
- Note limitations of embedding similarity
- Require human review for any response actions
- Document methodology so hunts are reproducible