Binary Triage (Phase 1)
Purpose
Quick fingerprinting to establish baseline facts before deeper analysis. Runs in seconds, not minutes.
When to Use
- First contact with an unknown binary
- Need architecture/ABI info for tool selection
- Quick capability assessment
- Before committing to expensive analysis
Key Principle
Gather facts fast, defer analysis.
This phase identifies WHAT the binary is, not HOW it works.
Triage Sequence
Step 1: File Identification
bash
# Basic identification
file binary
# Expected output patterns:
# ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3
# ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1
Extract:
- Architecture (ARM, ARM64, x86_64, MIPS)
- Bit width (32/64)
- Endianness (LSB/MSB)
- Link type (static/dynamic)
- Interpreter path (libc indicator)
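The extraction above can be sketched as a small shell function. This is a minimal sketch, not part of any tool: `parse_file_output` is a hypothetical helper name, and the patterns only cover the `file` output shapes shown in the samples above, so other formats will fall through to `unknown`.

```shell
# parse_file_output: hypothetical helper, not part of any tool.
# Takes one line of `file` output and prints the Step 1 facts as key=value pairs.
parse_file_output() (
  desc="$1"
  arch="unknown"; endian="unknown"; link="unknown"

  # Bit width and interpreter path are pulled out with sed capture groups.
  bits=$(printf '%s\n' "$desc" | sed -n 's/.*ELF \([0-9][0-9]*\)-bit.*/\1/p')
  interp=$(printf '%s\n' "$desc" | sed -n 's/.*interpreter \([^,]*\).*/\1/p')

  case "$desc" in
    *" LSB "*) endian="little" ;;
    *" MSB "*) endian="big" ;;
  esac
  case "$desc" in
    *"statically linked"*)  link="static" ;;
    *"dynamically linked"*) link="dynamic" ;;
  esac
  # aarch64 is checked first because its description also contains "ARM".
  case "$desc" in
    *aarch64*)  arch="arm64" ;;
    *", ARM,"*) arch="arm" ;;
    *x86-64*)   arch="x86_64" ;;
    *MIPS*)     arch="mips" ;;
  esac

  printf 'arch=%s bits=%s endian=%s link=%s interp=%s\n' \
    "$arch" "${bits:-unknown}" "$endian" "$link" "${interp:-none}"
)
```

A typical call would be `parse_file_output "$(file -b binary)"`. The function body runs in a subshell, so its variables do not leak into the calling script.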
Step 2: Structured Metadata (rabin2)
bash
# All metadata as JSON
rabin2 -q -j -I binary | jq .
# Key fields (in many rabin2 versions these sit under a top-level .info object, e.g. .info.arch):
# .arch - "arm", "x86", "mips"
# .bits - 32 or 64
# .endian - "little" or "big"
# .os - "linux", "none"
# .machine - "ARM", "AARCH64"
# .stripped - true/false
# .static - true/false
Step 3: ABI Detection
bash
# Interpreter detection
readelf -p .interp binary 2>/dev/null
# Or via rabin2 (the field is labeled "intrp" in rabin2 -I output)
rabin2 -I binary | grep -i intrp
# ARM-specific: float ABI
readelf -A binary | grep "Tag_ABI_VFP_args"
# hard-float: "VFP registers"
# soft-float: missing or "compatible"
Interpreter → Libc mapping:
| Interpreter | Libc | Notes |
|---|---|---|
| /lib/ld-linux-armhf.so.3 | glibc | ARM hard-float |
| /lib/ld-linux.so.3 | glibc | ARM soft-float |
| /lib/ld-musl-arm.so.1 | musl | ARM 32-bit |
| /lib/ld-musl-aarch64.so.1 | musl | ARM 64-bit |
| /lib/ld-uClibc.so.0 | uClibc | Embedded |
| /lib64/ld-linux-x86-64.so.2 | glibc | x86_64 |
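The mapping table and the float ABI check can be expressed as two small functions. This is a sketch under assumptions: `libc_from_interp` and `float_abi_from_attrs` are hypothetical names, the patterns cover only the interpreter families listed in the table, and the float ABI result is only meaningful for 32-bit ARM binaries.

```shell
# libc_from_interp: maps an ELF interpreter path to a libc family,
# following the table above. Unlisted paths are reported as "unknown".
libc_from_interp() {
  case "$1" in
    */ld-musl-*)  echo "musl" ;;
    */ld-uClibc*) echo "uclibc" ;;
    */ld-linux*)  echo "glibc" ;;
    *)            echo "unknown" ;;
  esac
}

# float_abi_from_attrs: reads `readelf -A` output on stdin and reports
# "hard" when Tag_ABI_VFP_args says "VFP registers", "soft" otherwise.
float_abi_from_attrs() {
  if grep -q 'Tag_ABI_VFP_args: VFP registers'; then
    echo "hard"
  else
    echo "soft"
  fi
}
```

Possible usage: `libc_from_interp "$(readelf -p .interp binary 2>/dev/null | awk '/ld-/ {print $NF}')"` and `readelf -A binary | float_abi_from_attrs`. The `awk` filter is an assumption about how the `.interp` dump is laid out and may need adjusting.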
Step 4: Dependencies
bash
# Library dependencies
rabin2 -q -j -l binary | jq '.libs[]'
# Common patterns:
# libcurl.so.* → HTTP client
# libssl.so.* → TLS/crypto
# libpthread.so.* → Threading
# libz.so.* → Compression
# libsqlite3.so.* → Local database
Step 5: Entry Points & Exports
bash
# Entry points
rabin2 -q -j -e binary | jq .
# Exports (for shared libraries)
rabin2 -q -j -E binary | jq '.exports[] | {name, vaddr}'
Step 6: Quick String Scan
bash
# All strings with metadata
rabin2 -q -j -zz binary | jq '.strings | length' # Count first
# Filter interesting strings (URLs, paths, errors)
rabin2 -q -j -zz binary | jq '
  .strings[] |
  select(.length > 8) |
  select(.string | test("http|ftp|/etc|/var|error|fail|pass|key|token"; "i"))
'
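Once interesting strings are filtered, each one can be tagged with a coarse type for later recording. The following is a minimal sketch: `classify_strings` is a hypothetical helper, and the three categories (`url`, `path`, `other`) are an assumption chosen for illustration rather than a fixed taxonomy.

```shell
# classify_strings: reads one string per line on stdin and prints
# "<type><TAB><string>", where type is url, path, or other.
classify_strings() {
  while IFS= read -r s; do
    case "$s" in
      http://*|https://*|ftp://*) kind="url" ;;
      /*)                         kind="path" ;;
      *)                          kind="other" ;;
    esac
    printf '%s\t%s\n' "$kind" "$s"
  done
}
```

It could be fed from the scan above with `rabin2 -q -j -zz binary | jq -r '.strings[].string' | classify_strings`. Strings that embed a URL or path in the middle of other text fall into `other`, so this is a first-pass sort, not a parser.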
Step 7: Import Analysis
bash
# All imports
rabin2 -q -j -i binary | jq '.imports[] | {name, lib}'
# Group by library (the lib field may be empty for ELF, where imports are not bound to a specific library)
rabin2 -q -j -i binary | jq '
  .imports | group_by(.lib) |
  map({lib: .[0].lib, functions: [.[].name]})
'
Capability Mapping
| Import Pattern | Capability |
|---|---|
| socket, connect, send | Network client |
| bind, listen, accept | Network server |
| open, read, write | File I/O |
| fork, exec*, system | Process spawning |
| pthread_* | Multi-threading |
| SSL_*, EVP_* | Cryptography |
| dlopen, dlsym | Dynamic loading |
| mmap, mprotect | Memory manipulation |
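The table can be applied mechanically to a list of import names. This sketch assumes one import name per line on stdin; `capabilities_from_imports` is a hypothetical helper, and the snake_case capability labels are illustrative. Note that it tags each function on its own, so a single `send` is enough to emit `network_client`, which is weaker evidence than seeing the whole group.

```shell
# capabilities_from_imports: reads import names (one per line) on stdin
# and prints the unique set of capabilities they suggest, sorted.
capabilities_from_imports() {
  while IFS= read -r name; do
    case "$name" in
      socket|connect|send) echo "network_client" ;;
      bind|listen|accept)  echo "network_server" ;;
      open|read|write)     echo "file_io" ;;
      fork|exec*|system)   echo "process_spawning" ;;
      pthread_*)           echo "multi_threading" ;;
      SSL_*|EVP_*)         echo "cryptography" ;;
      dlopen|dlsym)        echo "dynamic_loading" ;;
      mmap|mprotect)       echo "memory_manipulation" ;;
    esac
  done | sort -u
}
```

A possible pipeline is `rabin2 -q -j -i binary | jq -r '.imports[].name' | capabilities_from_imports`. Imports that match no pattern are silently skipped.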
Output Format
After triage, record structured facts:
json
{
  "artifact": {
    "path": "/path/to/binary",
    "sha256": "abc123...",
    "size_bytes": 245760
  },
  "identification": {
    "arch": "arm",
    "bits": 32,
    "endian": "little",
    "os": "linux",
    "stripped": true,
    "static": false
  },
  "abi": {
    "interpreter": "/lib/ld-musl-arm.so.1",
    "libc": "musl",
    "float_abi": "soft"
  },
  "dependencies": [
    "libcurl.so.4",
    "libssl.so.1.1",
    "libz.so.1"
  ],
  "capabilities_inferred": [
    "network_client",
    "tls_encryption",
    "compression"
  ],
  "strings_of_interest": [
    {"value": "https://api.vendor.com/telemetry", "type": "url"},
    {"value": "/etc/config.json", "type": "path"}
  ],
  "complexity_estimate": {
    "functions": "unknown (stripped)",
    "strings": 847,
    "imports": 156
  }
}
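The `artifact` fields are the only ones not produced by the triage steps themselves. One way to collect them is sketched below; `artifact_facts` is a hypothetical helper, it falls back to `shasum` where `sha256sum` is not installed, and it does no JSON escaping, so it assumes the path contains no quotes or backslashes.

```shell
# artifact_facts: prints the "artifact" object for a file as one JSON line.
# Assumes the path needs no JSON escaping.
artifact_facts() (
  path="$1"
  if command -v sha256sum >/dev/null 2>&1; then
    hash=$(sha256sum "$path" | awk '{print $1}')
  else
    hash=$(shasum -a 256 "$path" | awk '{print $1}')
  fi
  # wc -c pads its output on some platforms, so strip the spaces.
  size=$(wc -c < "$path" | tr -d ' ')
  printf '{"path": "%s", "sha256": "%s", "size_bytes": %s}\n' \
    "$path" "$hash" "$size"
)
```

Recording the hash first means every later fact in the record can be tied to one exact file, even if the binary is later renamed or re-extracted.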
Knowledge Journaling
After triage completes, record findings for episodic memory:
code
[BINARY-RE:triage] {filename} (sha256: {hash})
Identification:
Architecture: {arch} {bits}-bit {endian}
Libc: {glibc|musl|uclibc} ({interpreter_path})
Stripped: {yes|no}
Size: {bytes}
FACT: Links against {library} (source: rabin2 -l)
FACT: Contains {N} strings of interest (source: rabin2 -zz)
FACT: Imports {function} from {library} (source: rabin2 -i)
Capabilities inferred:
- {capability_1} (evidence: {import/string})
- {capability_2} (evidence: {import/string})
HYPOTHESIS: {what binary likely does} (confidence: {0.0-1.0})
QUESTION: {open unknown that needs investigation}
Next phase: {static-analysis|dynamic-analysis}
Sysroot needed: {path or "extract from device"}
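Filling the template by hand is error-prone, so the fixed parts can be generated. This is a minimal sketch with hypothetical helper names (`journal_header`, `journal_fact`); the argument order is an assumption made for illustration.

```shell
# journal_header: prints the header and Identification block of a triage entry.
# Args: filename sha256 arch bits endian libc interpreter stripped size
journal_header() {
  cat <<EOF
[BINARY-RE:triage] $1 (sha256: $2)
Identification:
Architecture: $3 $4-bit $5
Libc: $6 ($7)
Stripped: $8
Size: $9
EOF
}

# journal_fact: prints one FACT line with its source command.
# Args: statement source
journal_fact() {
  printf 'FACT: %s (source: %s)\n' "$1" "$2"
}
```

For example, `journal_fact "Links against libcurl.so.4" "rabin2 -l"` produces a line in the same shape as the template's FACT entries.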
Example Journal Entry
code
[BINARY-RE:triage] thermostat_daemon (sha256: a1b2c3d4...)
Identification:
Architecture: ARM 32-bit LE
Libc: musl (/lib/ld-musl-arm.so.1)
Stripped: yes
Size: 153,600 bytes
FACT: Links against libcurl.so.4 (source: rabin2 -l)
FACT: Links against libssl.so.1.1 (source: rabin2 -l)
FACT: Contains string "api.thermco.com" (source: rabin2 -zz)
FACT: Imports curl_easy_perform (source: rabin2 -i)
Capabilities inferred:
- HTTP client (evidence: libcurl import)
- TLS encryption (evidence: libssl import)
- Network communication (evidence: URL string)
HYPOTHESIS: Telemetry client that reports to api.thermco.com (confidence: 0.6)
QUESTION: What data does it collect and transmit?
Next phase: static-analysis
Sysroot needed: musl ARM (extract from device or Alpine)
Decision Points
After triage, determine:
- Sysroot selection - Based on arch + libc
- Analysis tool chain - r2 vs Ghidra vs both
- Dynamic analysis feasibility - QEMU viability based on arch
- Initial hypotheses - What does this binary likely do?
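For the QEMU feasibility question, the triage facts already determine which user-mode emulator would be needed. The sketch below uses a hypothetical helper, `qemu_user_for`, and assumes `arch`, `bits`, and `endian` values in the form recorded in the output format above (for example `arm`, `32`, `little`); it covers only a handful of common targets.

```shell
# qemu_user_for: maps arch/bits/endian to a qemu-user binary name.
# Prints "unsupported" for combinations this sketch does not cover.
qemu_user_for() {
  case "$1/$2/$3" in
    arm/32/little)  echo "qemu-arm" ;;
    arm/32/big)     echo "qemu-armeb" ;;
    arm/64/little)  echo "qemu-aarch64" ;;
    x86/32/*)       echo "qemu-i386" ;;
    x86/64/*)       echo "qemu-x86_64" ;;
    mips/32/big)    echo "qemu-mips" ;;
    mips/32/little) echo "qemu-mipsel" ;;
    *)              echo "unsupported" ;;
  esac
}
```

A follow-up check such as `command -v "$(qemu_user_for arm 32 little)"` shows whether that emulator is installed. Whether the binary actually runs under it still depends on having a matching sysroot.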
Next Steps
→ Proceed to binary-re-static-analysis for function enumeration
→ Or binary-re-dynamic-analysis if behavior observation is priority