Capability-Oriented Security Reasoning Framework

Non-goal: This framework does not attempt to classify code as malicious or benign. It enumerates potential capability changes and contextual signals that may support or refute security hypotheses.

Goal: Provide a constrained vocabulary and reasoning structure for describing what becomes possible when code changes, enabling systematic capability expansion analysis.

Atomic unit: Version transition (diff), not standalone code. Capabilities are attributed to added/modified hunks.

Core Principle: Capability-First Reasoning

Traditional approach:

"Does this match a known attack pattern?" → Binary classification

This framework:

"What new affordances does this create?" → Capability description → Contextual reasoning

Capability Taxonomy

Use this vocabulary to describe what code can do, not what it "is."

Capabilities should be attributed to added/modified hunks where possible. Existing capabilities present in both versions are background context, not delta.

Network Capabilities

•network.http_client - Can initiate HTTP/HTTPS requests
•network.socket - Can create raw network sockets
•network.dns - Can perform DNS queries
•network.alternate_protocol - Can use FTP, SMTP, etc.

Environment Capabilities

•environment.read_single - Can read specific environment variable
•environment.read_wholesale - Can enumerate all environment variables
•environment.write - Can modify environment

Filesystem Capabilities

•filesystem.read_generic - Can read files
•filesystem.read_sensitive - Can access .ssh, .aws, .env, etc.
•filesystem.write - Can create/modify files
•filesystem.permission_change - Can chmod/chown files

Process Capabilities

•process.spawn - Can create child processes
•process.exec - Can execute system commands
•process.eval - Can dynamically execute code

Data Transformation Capabilities

•encoding.base64 - Can encode/decode base64
•encoding.hex - Can encode/decode hexadecimal
•encoding.compress - Can compress/decompress (gzip, zlib)
•crypto.encrypt - Can encrypt data
•crypto.decrypt - Can decrypt data

Conditional Execution Capabilities

•conditional.environment_gated - Execution depends on environment variables
•conditional.time_gated - Execution depends on date/time
•conditional.platform_gated - Execution depends on OS/platform
•conditional.input_gated - Execution depends on function arguments

Execution Phase Capabilities (CRITICAL for supply-chain)

•phase.install_time - Runs during package installation (npm lifecycle hooks, setup.py)
•phase.import_time - Runs when module is imported (module-level side effects)
•phase.build_time - Runs during build/compilation (build scripts, webpack)
•phase.runtime - Runs when explicitly invoked via API

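Why phase matters: Install-time code runs automatically on the installing machine, usually before anyone has reviewed it. Build-time divergence, where shipped artifacts differ from the reviewed source, enables XZ-style attacks.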
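One way to keep annotations inside this constrained vocabulary is to hold the identifiers in a lookup table and reject anything else. The sketch below is a minimal illustration under that assumption; the names TAXONOMY and is_known_capability are hypothetical and are not part of the framework's tools.

```python
# Hypothetical helper: the identifiers are copied from the taxonomy above;
# the container and function names are illustrative assumptions.
TAXONOMY = {
    "network": {"http_client", "socket", "dns", "alternate_protocol"},
    "environment": {"read_single", "read_wholesale", "write"},
    "filesystem": {"read_generic", "read_sensitive", "write", "permission_change"},
    "process": {"spawn", "exec", "eval"},
    "encoding": {"base64", "hex", "compress"},
    "crypto": {"encrypt", "decrypt"},
    "conditional": {"environment_gated", "time_gated", "platform_gated", "input_gated"},
    "phase": {"install_time", "import_time", "build_time", "runtime"},
}


def is_known_capability(identifier: str) -> bool:
    """Return True if `identifier` is a `family.name` pair listed in the taxonomy."""
    family, _, name = identifier.partition(".")
    return name in TAXONOMY.get(family, set())


print(is_known_capability("network.http_client"))  # True
print(is_known_capability("network.telepathy"))    # False
```

A check like this makes vocabulary drift visible early: a label that is not in the table is either a typo or a signal that the taxonomy needs an explicit extension.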

Counterfactual Reasoning Framework

For each code change, systematically enumerate:

1. Capability Delta

•Before: List capabilities present in previous version
•After: List capabilities present in new version
•Added: Capabilities in After but not in Before (focus here)
•Removed: Capabilities in Before but not in After

Attribution: Link capabilities to specific hunks/lines where possible.
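The delta is plain set arithmetic over taxonomy identifiers. The following sketch assumes each version has already been reduced to a set of capability strings; the function name capability_delta and the sample sets are illustrative only.

```python
def capability_delta(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Split capabilities into added, removed, and preexisting (background) lists."""
    return {
        "added": sorted(after - before),        # focus of the analysis
        "removed": sorted(before - after),
        "preexisting": sorted(before & after),  # background context, not delta
    }


# Illustrative inputs, not taken from a real package.
old_version = {"filesystem.read_generic", "encoding.base64"}
new_version = {"filesystem.read_generic", "network.http_client", "environment.read_wholesale"}

delta = capability_delta(old_version, new_version)
print(delta["added"])    # ['environment.read_wholesale', 'network.http_client']
print(delta["removed"])  # ['encoding.base64']
```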

2. Affordance Questions

For each added capability, ask:

•Reach: What data can this capability access?
•Transform: How can that data be modified?
•Transmit: Where can that data be sent?
•Persist: Can effects outlive the process?
•Trigger: Under what conditions does this activate?
•Phase: When does this execute (install/import/build/runtime)?

3. Composition Analysis

For capability combinations, describe:

•Data flow: A → B → C (e.g., env_read → encode → network)
•Control flow: IF condition THEN capability (e.g., if env.CI then network.http)
•Timing: Sequential, parallel, or conditional chains
•Phase interaction: Does install-time code enable runtime behavior?
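A composition can be written down as a small record so that data flow, gating conditions, and timing stay together. This is a sketch only: the Composition class and its field names are assumptions made for illustration, not a structure the framework defines.

```python
from dataclasses import dataclass, field


@dataclass
class Composition:
    """One possible chain of capabilities plus the conditions that gate it."""

    data_flow: list[str]
    conditions: list[str] = field(default_factory=list)
    timing: str = "sequential"  # "sequential", "parallel", or "conditional"

    def describe(self) -> str:
        path = " → ".join(self.data_flow)
        gate = " AND ".join(self.conditions)
        prefix = f"IF {gate} THEN " if gate else ""
        # The wording stays hedged: static analysis shows possible paths only.
        return f"{prefix}{path} ({self.timing}; possible, not confirmed)"


chain = Composition(
    data_flow=["environment.read_wholesale", "encoding.base64", "network.http_client"],
    conditions=["env.CI"],
    timing="conditional",
)
print(chain.describe())
```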

4. Intent Alignment Assessment

Compare observed capabilities with stated package purpose:

•Stated purpose: From package description, README, documentation
•Implied capabilities: What capabilities does purpose require?
•Observed capabilities: What capabilities exist in code?
•Alignment gap: Capabilities present but not implied by purpose
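Once the implied capabilities have been written down (a human judgment based on the stated purpose), the alignment gap is the set of observed capabilities that the purpose does not imply. The sketch below assumes both sides are expressed as taxonomy identifiers; alignment_gap is a hypothetical helper name.

```python
def alignment_gap(implied: set[str], observed: set[str]) -> list[str]:
    """Capabilities observed in the code but not implied by the stated purpose."""
    return sorted(observed - implied)


# Assumption for illustration: a "simple date formatting utility" implies
# no network, environment, filesystem, or process capabilities at all.
implied = set()
observed = {"network.http_client", "environment.read_wholesale"}

print(alignment_gap(implied, observed))
# ['environment.read_wholesale', 'network.http_client']
```

The result is a description of a mismatch, not a label: a gap says what the purpose does not explain, and the remaining steps decide how much weight it carries.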

5. Uncertainty Qualification

Observation Confidence:

•HIGH: Capability is explicit (imports + callsite visible in code)
•MEDIUM: Capability inferred (wrapper function, indirect call, dynamic import)
•LOW: Capability speculative (requires runtime resolution, obfuscated)

Dynamic Resolution Flag:

•requires_dynamic_resolution: true - Cannot determine statically (eval, computed imports)
•requires_dynamic_resolution: false - Statically observable

Context Budget Policy

To prevent hidden overfitting and ensure reproducible evaluation:

Default context (always provide):

•Changed files only (diffs)
•Minimal package metadata (name, version, 1-sentence description)

Escalation context (optional, must log):

•Full file context (not just diffs)
•Complete README
•Dependency tree
•Maintainer history

Logging requirement: If escalating beyond default context, document what additional context was used and why.

This ensures methods sections can accurately describe information available to the model.
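The logging requirement can be enforced mechanically by refusing any escalation that lacks a reason. The following is a minimal sketch; the ContextLog class and the context item names are assumptions chosen to mirror the lists above.

```python
from dataclasses import dataclass, field

# Item names are illustrative shorthand for the bullets above.
DEFAULT_CONTEXT = {"diff", "package_metadata"}
ESCALATION_CONTEXT = {"full_file", "readme", "dependency_tree", "maintainer_history"}


@dataclass
class ContextLog:
    """Tracks which context was used and records every escalation with its reason."""

    used: set[str] = field(default_factory=lambda: set(DEFAULT_CONTEXT))
    escalations: list[dict[str, str]] = field(default_factory=list)

    def escalate(self, item: str, reason: str) -> None:
        if item not in ESCALATION_CONTEXT:
            raise ValueError(f"unknown context item: {item!r}")
        if not reason.strip():
            raise ValueError("escalation requires a documented reason")
        self.used.add(item)
        self.escalations.append({"item": item, "reason": reason})


log = ContextLog()
log.escalate("readme", "stated purpose in metadata was too short to assess alignment")
print(log.escalations)
```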

Available Tools

Note: Tools are executable scripts in the tools/ directory. Call them via bash when needed.

1. extract_capabilities (REQUIRED)

Extracts security-relevant capabilities from code with diff-aware attribution.

Purpose: Build factual inventory of what code can do

When to use: Always, as first step in analysis

Returns: List of capabilities with:

•capability - Taxonomy identifier
•phase - Execution phase (if detectable)
•evidence_span - {file, hunk_id, start_line, end_line}
•origin - "added" | "removed" | "preexisting"
•confidence_obs - "HIGH" | "MEDIUM" | "LOW"
•requires_dynamic_resolution - true | false
•context - Code snippet showing capability

Example:

python

extract_capabilities(
    old_code="...",
    new_code="import requests\nif os.environ.get('CI'): requests.get(...)",
    language="python"
)
# Returns: [
#   {
#     capability: "network.http_client",
#     phase: "import_time",
#     evidence_span: {file: "main.py", hunk_id: 1, start_line: 1, end_line: 1},
#     origin: "added",
#     confidence_obs: "HIGH",
#     requires_dynamic_resolution: false,
#     context: "import requests"
#   },
#   {
#     capability: "conditional.environment_gated",
#     phase: "runtime",
#     evidence_span: {file: "main.py", hunk_id: 2, start_line: 2, end_line: 2},
#     origin: "added",
#     confidence_obs: "HIGH",
#     requires_dynamic_resolution: false,
#     context: "if os.environ.get('CI')"
#   }
# ]
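When consuming this output, it can help to load each entry into a typed record that rejects unexpected values. The sketch below mirrors the field list under "Returns"; the class names, the validation, and the added_only helper are assumptions for illustration, not part of the tool itself.

```python
from dataclasses import dataclass
from typing import Optional

ORIGINS = {"added", "removed", "preexisting"}
CONFIDENCE_LEVELS = {"HIGH", "MEDIUM", "LOW"}


@dataclass(frozen=True)
class EvidenceSpan:
    file: str
    hunk_id: int
    start_line: int
    end_line: int


@dataclass(frozen=True)
class CapabilityRecord:
    capability: str
    phase: Optional[str]  # None when the phase is not detectable
    evidence_span: EvidenceSpan
    origin: str
    confidence_obs: str
    requires_dynamic_resolution: bool
    context: str

    def __post_init__(self) -> None:
        # Reject values outside the documented enumerations.
        if self.origin not in ORIGINS:
            raise ValueError(f"unexpected origin: {self.origin!r}")
        if self.confidence_obs not in CONFIDENCE_LEVELS:
            raise ValueError(f"unexpected confidence: {self.confidence_obs!r}")


def added_only(records: list[CapabilityRecord]) -> list[CapabilityRecord]:
    """Keep only the records that belong to the capability delta's 'added' side."""
    return [r for r in records if r.origin == "added"]


records = [
    CapabilityRecord("network.http_client", "import_time",
                     EvidenceSpan("main.py", 1, 1, 1), "added", "HIGH", False, "import requests"),
    CapabilityRecord("filesystem.read_generic", "runtime",
                     EvidenceSpan("main.py", 1, 5, 5), "preexisting", "HIGH", False, "open(path)"),
]
print([r.capability for r in added_only(records)])  # ['network.http_client']
```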

2. analyze_execution_paths (OPTIONAL - Confirmatory Only)

Surfaces potential execution paths through code.

Purpose: Understand how capabilities might compose

When to use: When you need to trace data/control flow

NOT for: Determining reachability or confirmed behavior

Returns:

•possible_paths - Sequences of capability nodes
•conditions - Normalized triggers
•note - Always includes "possible, not confirmed"
•Never returns "reachable: true" or definitive flow

Example:

python

analyze_execution_paths(
    code="...",
    language="javascript"
)
# Returns: {
#   possible_paths: ["env_read → encode → network", "env_read → filesystem"],
#   conditions: ["process.env.CI", "process.platform === 'linux'"],
#   note: "These are possible paths based on static analysis, not confirmed execution"
# }

3. search_capability_examples (OPTIONAL - Explanatory Only)

Finds historical examples where capability overlap exists.

Purpose: Provide context, not classification

When to use: To explain or provide evidence for hypothesis

NOT for: Pattern matching, similarity scoring, or labeling

Returns (sanitized schema):

•example_name - Identifier only
•capabilities_overlap - List of overlapping capabilities
•why_relevant - One sentence explanation
•caution - Always included disclaimer

NO similarity scores. NO "this matches X" language.

Example:

python

search_capability_examples(
    capabilities=["environment.read_wholesale", "network.http_client", "phase.install_time"]
)
# Returns: [
#   {
#     example_name: "ctx-2022",
#     capabilities_overlap: ["environment.read_wholesale", "network.http_client"],
#     why_relevant: "Historical example of wholesale env access + network transmission",
#     caution: "Overlap exists for context. Does not indicate malicious intent."
#   }
# ]

Capability Risk Composition Matrix

This describes potential security implications of capability combinations, not verdicts.

Capabilities	Potential Implication	Why Notable
environment.read_wholesale + network.http_client	Data exfiltration channel	All env vars accessible + transmission capability
process.exec + network.http_client	Remote command execution channel	External input could control commands
filesystem.read_sensitive + encoding.base64 + network.http_client	Credential theft channel	Sensitive data + obfuscation + transmission
conditional.environment_gated + network.http_client	Selective activation	Behavior varies by environment (CI vs local)
phase.install_time + network.http_client	Pre-review execution	Runs before code review, in high-privilege context
phase.build_time + filesystem.write	Build-time injection	Can modify artifacts not in source control
encoding.base64 + process.eval	Obfuscated code execution	Hidden logic execution

Note: These describe possibilities, not probabilities or intentions.
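The matrix can be read as a subset lookup: a row applies when every capability in its combination is present in the observed set. The sketch below encodes the rows above under that reading; COMPOSITION_MATRIX and potential_implications are hypothetical names, and the output is a list of descriptions, not scores.

```python
# Rows copied from the matrix above, in the same order.
COMPOSITION_MATRIX = [
    (frozenset({"environment.read_wholesale", "network.http_client"}), "Data exfiltration channel"),
    (frozenset({"process.exec", "network.http_client"}), "Remote command execution channel"),
    (frozenset({"filesystem.read_sensitive", "encoding.base64", "network.http_client"}),
     "Credential theft channel"),
    (frozenset({"conditional.environment_gated", "network.http_client"}), "Selective activation"),
    (frozenset({"phase.install_time", "network.http_client"}), "Pre-review execution"),
    (frozenset({"phase.build_time", "filesystem.write"}), "Build-time injection"),
    (frozenset({"encoding.base64", "process.eval"}), "Obfuscated code execution"),
]


def potential_implications(capabilities: set[str]) -> list[str]:
    """Return the implication of every matrix row fully contained in `capabilities`."""
    return [implication for combo, implication in COMPOSITION_MATRIX if combo <= capabilities]


observed = {"environment.read_wholesale", "network.http_client", "phase.install_time"}
print(potential_implications(observed))
# ['Data exfiltration channel', 'Pre-review execution']
```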

Historical Capability Pattern Examples

These are post-hoc explanations, not detection rules.

Example: event-stream (2018)

Capabilities observed:

•environment.read_single (npm_package_description)
•conditional.environment_gated
•crypto.decrypt
•phase.runtime

Use of this example: Illustrates that environment-gated execution can enable targeted attacks. Does NOT mean all env-gated code is malicious.

Example: ua-parser-js (2021)

Capabilities observed:

•conditional.platform_gated (process.platform)
•process.spawn
•phase.install_time

Use of this example: Shows install-time + platform-gating pattern. Does NOT mean install hooks indicate compromise.

Example: ctx/phpass (2022)

Capabilities observed:

•environment.read_wholesale (os.environ)
•encoding.base64
•network.http_client
•phase.install_time (setup.py)

Use of this example: Demonstrates wholesale env + encoding + network pattern. Does NOT make this combination automatically suspicious.

Example: XZ Utils (CVE-2024-3094, 2024)

Capabilities observed:

•phase.build_time (injection in release tarball, not git)
•conditional.environment_gated (SSH + systemd context)
•filesystem.write (binary blobs)
•Context, not a capability: long-term social engineering to gain maintainer access

Use of this example: Illustrates build-time vs source-time capability divergence. Does NOT mean all build scripts are suspect.

False Positive Awareness

Benign code often has security-relevant capabilities:

Telemetry/Analytics

•Capabilities: network.http_client + conditional.environment_gated
•Benign when: Documented, opt-out available, analytics domain matches package
•Check: Is DISABLE_ANALYTICS respected? Is domain in README?

Update Checks

•Capabilities: network.http_client
•Benign when: Checking version only, not sending user data
•Check: Is request to package registry? Is response only version info?

License Validation

•Capabilities: network.http_client + environment.read_single
•Benign when: Commercial package, license endpoint documented
•Check: Is package commercial? Is validation endpoint disclosed?

Handling Obfuscated Code

Malicious code is often heavily obfuscated to evade analysis. This framework includes strategies for analyzing obfuscated code.

Obfuscation Indicators

•Hex-encoded function names (_0x4e9bf4, _0x112fa8)
•Large arrays of encoded strings
•Self-modifying code patterns
•Computed property access (window[_0x4e9bf4(0x174)])
•Nested function calls with numeric offsets
•Unusual arithmetic expressions as array indices

De-Obfuscation Strategy

When encountering obfuscated code:

1. Identify String Arrays: Look for large arrays containing encoded strings
   •Often named _0xNNNN or similar patterns
   •Usually defined at module/function scope
2. Find Decoder Functions: Locate functions that map indices to strings
   •Pattern: function _0xNNNN(index) { return array[index - offset]; }
   •May include string transformations (base64, rot13, etc.)
3. Trace High-Value API Calls: Focus on capability-relevant APIs even if obfuscated
   •Look for patterns like window[...] (DOM access)
   •Network APIs: fetch, XMLHttpRequest, .get, .post, .send
   •Crypto APIs: wallet-related strings in arrays
   •Environment: process, env, global object access
4. Extract String Literals: Analyze string array contents
   •Cryptocurrency addresses (bc1, 0x, etc.)
   •Domain names and URLs
   •API endpoint patterns
   •Wallet-related terms (ethereum, solana, bitcoin)
5. Infer Capabilities from Context: Even without full de-obfuscation
   •window[encoded](encoded_method) → likely DOM/browser API
   •Conditional checks + network → environment-gated behavior
   •Large encoded arrays + network → likely data exfiltration

Obfuscated Code Analysis Workflow

code

1. Identify obfuscation pattern (array + decoder function)
   ↓
2. Extract string array contents (literal strings)
   ↓
3. Search for security-relevant keywords:
   - wallet, ethereum, solana, bitcoin, crypto
   - fetch, XMLHttpRequest, request, http
   - window, document, navigator
   - process.env, os.environ
   ↓
4. Map API patterns to capabilities:
   - window.ethereum → credential_access (wallet interaction)
   - fetch/XHR → network.http_client
   - Conditionals → conditional.environment_gated
   ↓
5. Describe capabilities with:
   - confidence: LOW/MEDIUM (due to obfuscation)
   - requires_dynamic_resolution: true
   - evidence: String literals found in array

Example: Obfuscated Wallet Stealer

javascript

const _0x112fa8=_0x180f;
function _0x180f(_0x240418,_0xdfe6b8){
    const _0x3b4f1d=_0x550a();
    return _0x3b4f1d[_0x240418-0x100];
}
function _0x550a(){
    return ['ethereum','solana','bitcoin','fetch','send'];
}
typeof window[_0x112fa8(0x100)]!='undefined'?checkWallet():skip();

Capabilities identified (even without full de-obfuscation):

•network.http_client (confidence: MEDIUM) - 'fetch', 'send' in string array
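•credential_access (confidence: MEDIUM) - 'ethereum', 'solana', 'bitcoin' + window access (this label is not in the Capability Taxonomy above; it is used here as shorthand for wallet interaction)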
•conditional.environment_gated (confidence: HIGH) - typeof check on window[_0x112fa8(0x100)], which resolves to window['ethereum']
•requires_dynamic_resolution: true - Obfuscated control flow

Evidence: Lines where string array contains wallet-related terms, lines where window[encoded] pattern appears
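The decoder in this sample is simple enough to replay by hand: it subtracts 0x100 from the index and reads the string array. The sketch below reproduces that arithmetic in Python so that an encoded call site can be resolved without executing the JavaScript; the names STRING_ARRAY, OFFSET, and resolve are illustrative.

```python
# Values copied from the sample above.
STRING_ARRAY = ["ethereum", "solana", "bitcoin", "fetch", "send"]
OFFSET = 0x100


def resolve(index: int) -> str:
    """Replay the decoder: return the string the obfuscated index refers to."""
    position = index - OFFSET
    # JavaScript would yield undefined for an out-of-range index; raise instead
    # so that a wrong offset is noticed rather than silently wrapped.
    if not 0 <= position < len(STRING_ARRAY):
        raise IndexError(f"index {index:#x} is outside the string array")
    return STRING_ARRAY[position]


print(resolve(0x100))  # ethereum
print(resolve(0x103))  # fetch
```

Under this reading, window[_0x112fa8(0x100)] is window['ethereum'], so the typeof guard tests for the presence of an injected wallet provider.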

Confidence Levels for Obfuscated Code

•HIGH confidence: When string literals directly indicate capabilities (e.g., "https://evil.com" in array)
•MEDIUM confidence: When API patterns are recognizable despite obfuscation
•LOW confidence: When only structural patterns suggest capabilities

Always mark: requires_dynamic_resolution: true for heavily obfuscated code

Analysis Workflow

1. Extract capabilities (use extract_capabilities tool)
   •Get diff-attributed inventory
   •Note phase, origin, confidence for each
2. Compute capability delta
   •Focus on origin: "added"
   •Background context: origin: "preexisting"
3. Describe affordances (use counterfactual framework)
   •What becomes possible that wasn't before?
   •How do capabilities compose?
   •What phase do they execute in?
4. Assess intent alignment (compare to package purpose)
   •Do capabilities match stated purpose?
   •Is there an alignment gap?
5. (Optional) Check execution paths (use analyze_execution_paths)
   •How might capabilities connect?
   •What data flows are possible?
6. (Optional) Find examples (use search_capability_examples)
   •Has overlap occurred before?
   •What context do historical cases provide?
7. Render analysis (describe, don't classify)
   •Enumerate capabilities with evidence
   •Describe potential implications
   •State confidence and uncertainty
   •Provide context

Output Format

Your analysis should describe what is observed and what it makes possible, not deliver a verdict:

✅ Good Output Format

code

Capability Delta:
- Added: network.http_client (line 15, hunk 2, confidence: HIGH, phase: import_time)
- Added: environment.read_wholesale (line 12, hunk 2, confidence: HIGH, phase: runtime)
- Added: conditional.environment_gated (line 11, hunk 2, confidence: HIGH, condition: process.env.CI)

Evidence Spans:
- File: index.js, hunk 2, lines 11-15 (new code added in this version)

Composition:
- Observed path: env_read → network (lines 12-15)
- Conditional: Only when process.env.CI is truthy
- Phase: import_time network setup, runtime execution

Affordance Description:
- This combination creates a channel for environment variable transmission
- Activation is selective (CI environments only)
- All environment variables are accessible (wholesale access)
- Executes when package is imported (phase.import_time for network import)

Intent Alignment:
- Package purpose: "Simple date formatting utility"
- Implied capabilities: String manipulation, date parsing
- Observed capabilities: Network transmission, environment access
- Gap: Network and environment capabilities not implied by "date formatting"

Uncertainty:
- Confidence (observation): HIGH - Direct evidence in added hunks
- Requires dynamic resolution: false - All capabilities statically observable
- Alternative interpretation: None identified

Historical Context (from search_capability_examples):
- Overlap exists with ctx-2022: env.read_wholesale + network.http_client
- Caution: Overlap provides context only, not classification

❌ Bad Output Format

code

VERDICT: CRITICAL RISK - Malicious credential theft detected
CONFIDENCE: 95%
Pattern match: ctx attack (similarity: 0.95)
BLOCK DEPLOYMENT

Key Constraints

•No autonomous conclusions: Tools surface data, YOU reason
•No risk scoring: Describe implications, don't score them
•No classification: Enumerate capabilities, don't label "malicious/benign"
•Pattern examples are explanatory: Historical overlap provides context, not verdicts
•Confidence is about observation: How certain are you about what code does, not what it "is"
•Diff-scoped attribution: Link capabilities to specific hunks where possible
•Phase-aware analysis: Always note when code executes (install/import/build/runtime)
•Context budget: Log any escalation beyond default context
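Several of these constraints can be spot-checked on a finished report before it is submitted. The sketch below flags the verdict-style phrasing seen in the bad output example; the pattern list is an assumption, it is far from exhaustive, and a clean result does not show that a report follows the constraints.

```python
import re

# Hypothetical patterns derived from the bad output example above.
FORBIDDEN_PATTERNS = {
    "verdict": re.compile(r"\bverdict\b", re.IGNORECASE),
    "numeric_confidence": re.compile(r"\bconfidence\s*:\s*\d+\s*%", re.IGNORECASE),
    "similarity_score": re.compile(r"\bsimilarity\s*:\s*\d", re.IGNORECASE),
    "deployment_directive": re.compile(r"\bblock deployment\b", re.IGNORECASE),
}


def constraint_violations(report: str) -> list[str]:
    """Return the names of the forbidden patterns that appear in `report`."""
    return [name for name, pattern in FORBIDDEN_PATTERNS.items() if pattern.search(report)]


bad_report = (
    "VERDICT: CRITICAL RISK - Malicious credential theft detected\n"
    "CONFIDENCE: 95%\n"
    "Pattern match: ctx attack (similarity: 0.95)\n"
    "BLOCK DEPLOYMENT"
)
good_line = "- Added: network.http_client (line 15, hunk 2, confidence: HIGH, phase: import_time)"

print(constraint_violations(bad_report))
print(constraint_violations(good_line))  # []
```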

This Framework Defines Your Dataset Labels

Direct mapping to annotation schema:

•capability_delta[] - List of added/removed capabilities
•trigger_surface[] - Conditional execution patterns
•phase_delta[] - Changes in execution phase
•alignment_gap - Qualitative intent mismatch description
•confidence_obs - HIGH/MEDIUM/LOW per capability
•evidence_span - Localization for each capability
•requires_dynamic_resolution - Static/dynamic analysis boundary
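As an illustration of how these fields might fit together, the sketch below assembles one annotation from a list of capability records and serializes it to JSON. The nesting is an assumption: confidence_obs, evidence_span, and requires_dynamic_resolution are stored per capability, trigger_surface is taken to be the changed conditional.* capabilities, and phase_delta is taken to be the set of phases seen on changed records. The build_annotation name is hypothetical.

```python
import json


def build_annotation(records: list[dict], alignment_gap: str) -> dict:
    """Assemble one annotation from capability records (assumed layout, see lead-in)."""
    changed = [r for r in records if r["origin"] in ("added", "removed")]
    return {
        "capability_delta": [
            {
                "capability": r["capability"],
                "origin": r["origin"],
                "confidence_obs": r["confidence_obs"],
                "evidence_span": r["evidence_span"],
                "requires_dynamic_resolution": r["requires_dynamic_resolution"],
            }
            for r in changed
        ],
        "trigger_surface": [
            r["capability"] for r in changed if r["capability"].startswith("conditional.")
        ],
        "phase_delta": sorted({r["phase"] for r in changed if r.get("phase")}),
        "alignment_gap": alignment_gap,
    }


def _record(capability: str, phase: str, origin: str, line: int) -> dict:
    # Small factory for the illustrative records below.
    return {
        "capability": capability, "phase": phase, "origin": origin,
        "confidence_obs": "HIGH", "requires_dynamic_resolution": False,
        "evidence_span": {"file": "main.py", "hunk_id": 1, "start_line": line, "end_line": line},
    }


records = [
    _record("network.http_client", "import_time", "added", 1),
    _record("conditional.environment_gated", "runtime", "added", 2),
    _record("filesystem.read_generic", "runtime", "preexisting", 9),
]
annotation = build_annotation(records, "Network access not implied by 'date formatting'")
print(json.dumps(annotation, indent=2))
```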

This Framework Is

✅ A capability vocabulary
✅ A reasoning scaffold
✅ An annotation ontology
✅ A dataset labeling schema
✅ A reviewer-legible explanation layer

This Framework Is NOT

❌ A malware detector
❌ A rules engine
❌ A source of truth
❌ A substitute for reasoning
❌ A pattern matching system