Benchmark Resource Usage

Measure CPU, memory, and disk I/O costs of scanning operations to evaluate polling strategies and optimization approaches.

Quick Start

bash

# Benchmark a function at 1Hz for 30 seconds
python scripts/benchmark_template.py 30 1.0

# Run with detailed resource metrics (-l is the macOS/BSD flag; GNU time on Linux uses -v)
/usr/bin/time -l python scripts/benchmark_template.py 30 1.0

Workflow

1. Extract Function to Benchmark

Create a standalone script with your scanning logic:

python

def discover_targets():
    # Your actual scanning logic here
    # E.g., ps aux, grep through logs, list files
    results = []  # Collect whatever the scan finds
    return results

2. Add Benchmark Loop

Copy scripts/benchmark_template.py and replace the discovery function. The template provides:

•Timed iterations with interval control
•Progress reporting every 10 iterations
•Summary statistics (avg time, theoretical max rate)
•CPU percentage estimates
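The features listed above can be approximated by a loop like the following. This is a minimal sketch, not the contents of scripts/benchmark_template.py; the name `run_benchmark`, its parameters, and the keys of the returned dictionary are assumptions made for illustration.

```python
import time


def run_benchmark(func, duration_s, interval_s, report_every=10):
    """Call func once per interval_s for roughly duration_s seconds.

    Hypothetical helper: a sketch of a timed benchmark loop.
    """
    timings_ms = []
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        t0 = time.perf_counter()
        func()
        elapsed = time.perf_counter() - t0
        timings_ms.append(elapsed * 1000)

        # Progress report every report_every iterations.
        if len(timings_ms) % report_every == 0:
            print(f"{len(timings_ms)} iterations, last took {timings_ms[-1]:.1f}ms")

        # Sleep for whatever is left of the interval, if anything.
        remaining = interval_s - elapsed
        if remaining > 0:
            time.sleep(remaining)

    if not timings_ms:
        raise ValueError("duration_s too short: no iterations were run")

    avg_ms = sum(timings_ms) / len(timings_ms)
    return {
        "iterations": len(timings_ms),
        "avg_ms": avg_ms,
        # Calls per second if func ran back to back with no sleep.
        "max_rate_hz": 1000 / avg_ms if avg_ms > 0 else float("inf"),
        # Share of each interval spent inside func (wall time, not CPU time).
        "busy_percent": avg_ms / (interval_s * 1000) * 100,
    }
```

Called as `run_benchmark(discover_targets, 30, 1.0)`, this would mirror the `30 1.0` arguments used in the Quick Start. The percentage it reports is based on wall time, so it is only an estimate of CPU cost.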

3. Run with Resource Tracking

bash

/usr/bin/time -l python your_benchmark.py <duration_s> <interval_s>

Example:

bash

# Run for 30 seconds at 1Hz
/usr/bin/time -l python benchmark_discovery.py 30 1.0

4. Interpret Results

See references/interpreting_time_output.md for detailed guidance.

Key formulas:

python

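# avg_time_ms is wall time per iteration, so this is the busy share of each 1-second interval
CPU_percent_at_1Hz = (avg_time_ms / 1000) * 100
# wall_time is the "real" value reported by /usr/bin/time
User_CPU_percent = (user_time / wall_time) * 100
System_CPU_percent = (sys_time / wall_time) * 100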

Example output:

code

Avg time/iteration: 362.8ms
       30.12 real        12.59 user        11.73 sys

Interpretation:

•CPU at 1Hz: 36.3% (362.8ms / 1000ms)
•User CPU: 41.8% (12.59s / 30.12s)
•System CPU: 38.9% (11.73s / 30.12s)
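The arithmetic in this interpretation can be reproduced with a few lines of Python. The function name `cpu_breakdown` and its dictionary keys are hypothetical. The `total` entry (user plus system) is an extra figure added here for convenience and is not part of the interpretation above.

```python
def cpu_breakdown(avg_time_ms, real_s, user_s, sys_s, interval_s=1.0):
    """Return the step 4 percentages, rounded to one decimal place."""
    return {
        # Wall-time share of each polling interval spent scanning.
        "busy_at_interval": round(avg_time_ms / (interval_s * 1000) * 100, 1),
        "user": round(user_s / real_s * 100, 1),
        "system": round(sys_s / real_s * 100, 1),
        # Assumption: total CPU is taken as user + system over wall time.
        "total": round((user_s + sys_s) / real_s * 100, 1),
    }


stats = cpu_breakdown(avg_time_ms=362.8, real_s=30.12, user_s=12.59, sys_s=11.73)
print(stats)
# {'busy_at_interval': 36.3, 'user': 41.8, 'system': 38.9, 'total': 80.7}
```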

5. Make Recommendations

CPU usage guidelines (at 1Hz):

Range	Verdict	Action
<5%	Negligible	Can poll aggressively
5-15%	Moderate	Acceptable for monitoring
15-30%	High	Consider slower polling or optimization
>30%	Very high	Requires optimization or an event-driven approach

Common Optimizations

Targeted Scanning

Instead of scanning all items, target known IDs:

bash

ps aux              # Scan 500 processes → ~360ms
ps -p PID1,PID2,... # Check 10 PIDs → ~300ms (17% faster)

Use when: You have a source of known IDs (logs, config, state file)
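From Python, the same choice between a full scan and a targeted check might look like the sketch below. The helper names `build_ps_command` and `scan` are hypothetical, and the `-o pid= -o comm=` output columns are an assumption about what the caller wants back, not something the commands above specify.

```python
import subprocess


def build_ps_command(pids=None):
    """Return the argv for a full scan, or for a targeted check if pids is given."""
    if not pids:
        return ["ps", "aux"]
    pid_list = ",".join(str(int(pid)) for pid in pids)
    # Separate -o flags with a trailing "=" suppress each column header.
    return ["ps", "-p", pid_list, "-o", "pid=", "-o", "comm="]


def scan(pids=None):
    """Run ps and return its output lines.

    check=False because ps exits non-zero when none of the PIDs exist.
    """
    result = subprocess.run(
        build_ps_command(pids), capture_output=True, text=True, check=False
    )
    return result.stdout.splitlines()
```

For example, `scan([os.getpid()])` (after `import os`) would check only the current process, while `scan()` would fall back to the full `ps aux` listing.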

Caching

Cache expensive syscall results:

python

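cache = {}

def refresh(current):
    # expensive_call stands in for your costly lookup
    new_items = set(current) - set(cache)  # Items not seen before
    for item in new_items:
        cache[item] = expensive_call(item)
    return cache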

Expected: the first call is slow; subsequent calls for already-cached items take <10ms
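A slightly fuller version of this pattern is sketched below. It also evicts entries for items that have disappeared, which is an addition not described above, and `make_cached_scanner` is a hypothetical name.

```python
def make_cached_scanner(expensive_call):
    """Return a scan function that calls expensive_call once per new item."""
    cache = {}

    def scan(current_items):
        current = set(current_items)
        # Only items that are not cached yet pay the expensive cost.
        for item in current.difference(cache):
            cache[item] = expensive_call(item)
        # Assumption: drop vanished items so the cache cannot grow without bound.
        for item in set(cache) - current:
            del cache[item]
        return {item: cache[item] for item in current}

    return scan


calls = []


def slow_lookup(item):
    """Stand-in for a costly syscall; records each invocation."""
    calls.append(item)
    return item * 2


scan = make_cached_scanner(slow_lookup)
first = scan([1, 2])   # Looks up both items
second = scan([2, 3])  # Looks up only item 3; item 2 comes from the cache
```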

Slower Polling

Reduce frequency for non-critical data:

code

1Hz → 0.1Hz: 10× less CPU
Critical at 1Hz, non-critical at 0.1Hz
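One way to run both tiers from a single 1Hz loop is to give each task a tick divisor, as in the sketch below. The task names and the `due_tasks` helper are hypothetical.

```python
# Base tick is 1Hz; a divisor of 10 means "every 10th tick", i.e. 0.1Hz.
TASKS = {"critical": 1, "non_critical": 10}


def due_tasks(tick, tasks=TASKS):
    """Return the names of the tasks that should run on this tick."""
    return [name for name, every in tasks.items() if tick % every == 0]


# Count how often each task would run over 30 ticks (30 seconds at 1Hz).
run_counts = {name: 0 for name in TASKS}
for tick in range(30):
    for name in due_tasks(tick):
        run_counts[name] += 1

print(run_counts)  # {'critical': 30, 'non_critical': 3}
```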

Event-Driven

Scan only on user action (such as a manual refresh) instead of polling continuously.

Expected: 0% CPU when idle
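A minimal sketch of this idea uses a worker thread that blocks on a queue until a refresh is requested. The function name, the queue-based design, and the use of `None` as a shutdown signal are all assumptions chosen for illustration.

```python
import queue
import threading


def refresh_worker(scan, refresh_requests, results):
    """Run scan once per refresh request; sleep otherwise."""
    while True:
        request = refresh_requests.get()  # Blocks while idle instead of polling
        if request is None:               # Assumed shutdown signal
            break
        results.put(scan())


refresh_requests = queue.Queue()
results = queue.Queue()
worker = threading.Thread(
    target=refresh_worker,
    args=(lambda: ["target-a"], refresh_requests, results),
    daemon=True,
)
worker.start()

refresh_requests.put("refresh")      # Simulates the user pressing "refresh"
latest = results.get(timeout=5)      # Result of the single scan that was triggered

refresh_requests.put(None)           # Ask the worker to exit
worker.join(timeout=5)
```

Between requests the worker is parked inside `Queue.get()`, so no scans run until the next refresh is asked for.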

Comparing Approaches

When comparing optimizations, create variants of the function:

python

def approach_a():  # Full scan
    return scan_all()

def approach_b():  # Targeted
    return scan_pids(known_pids)

# Benchmark both
benchmark(approach_a, duration, interval)
benchmark(approach_b, duration, interval)

Present results in a comparison table (see references/comparison_report_template.md for a template).
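As a rough illustration of turning two measurements into such a table, the sketch below computes the relative speedup against a baseline. The column layout is an assumption and is not taken from the reference template; the helper names are hypothetical, and the sample numbers are the approximate `ps` timings quoted under Targeted Scanning.

```python
def percent_faster(baseline_ms, candidate_ms):
    """Relative time saved versus the baseline; positive means faster."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100


def comparison_table(results, baseline):
    """Build a tab-separated table from a {name: avg_ms} mapping."""
    base_ms = results[baseline]
    lines = ["Approach\tAvg time\tvs. baseline"]
    for name, avg_ms in results.items():
        delta = percent_faster(base_ms, avg_ms)
        lines.append(f"{name}\t{avg_ms:.1f}ms\t{delta:+.0f}%")
    return "\n".join(lines)


table = comparison_table({"Full scan": 360.0, "Targeted": 300.0}, baseline="Full scan")
print(table)
```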

Scripts

•scripts/benchmark_template.py - Template for benchmarking any function at configurable frequency
•scripts/compare_approaches.py - Run two approaches side-by-side and generate comparison report

References

•references/interpreting_time_output.md - Complete guide to /usr/bin/time -l metrics
•references/comparison_report_template.md - Template for presenting optimization comparisons