Optimizing R

This skill covers profiling, benchmarking, parallelization, and performance best practices for R.

Core Principle

Profile before optimizing - Use profvis and bench to identify real bottlenecks. Write readable code first, optimize only when necessary.

Profiling Tools Decision Matrix

Tool	Use When	Don't Use When	What It Shows
`profvis`	Complex code, unknown bottlenecks	Simple functions, known issues	Time per line, call stack
`bench::mark()`	Comparing alternatives	Single approach	Relative performance, memory
`system.time()`	Quick checks	Detailed analysis	Total runtime only
`Rprof()`	Base-R-only environments	When profvis is available	Raw profiling data
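
A minimal sketch of the two base R rows in the table, assuming a made-up workload. `slow_col_means()` and the matrix `m` are hypothetical and only for illustration; `system.time()` reports total runtime, and `Rprof()` with `summaryRprof()` exposes the raw samples that profvis would otherwise visualize.

```r
# Hypothetical workload: a deliberately slow column-wise mean (illustration only).
slow_col_means <- function(m) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    out[j] <- mean(m[, j])
  }
  out
}

set.seed(1)
m <- matrix(rnorm(2e6), ncol = 200)

# Quick check: total runtime only.
timing <- system.time(res <- slow_col_means(m))
print(timing[["elapsed"]])

# Base R sampling profiler: write raw samples to a file, then summarise them.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
for (i in 1:50) res <- slow_col_means(m)
Rprof(NULL)

# summaryRprof() errors if no samples were collected, so guard the call.
prof_summary <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof_summary)) print(head(prof_summary$by.self))
```

With profvis installed, wrapping the same loop in `profvis::profvis({ ... })` would give the per-line view described in the first row.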

Performance Workflow

•Profile first - Find the actual bottlenecks
•Focus on the slowest parts - 80/20 rule
•Benchmark alternatives - For hot spots only
•Consider tool trade-offs - Based on bottleneck type

See profiling-workflow.md for the complete workflow.
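
The "benchmark alternatives" step might look like the sketch below. `cumsum_loop()` is a hypothetical hot spot invented for this example; the bench package is assumed to be installed, with a coarse `system.time()` fallback when it is not.

```r
# Hypothetical hot spot: a hand-written running total (illustration only).
cumsum_loop <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total
  }
  out
}

set.seed(1)
x <- runif(1e4)

if (requireNamespace("bench", quietly = TRUE)) {
  # By default bench::mark() also checks that the expressions return equal results.
  results <- bench::mark(
    loop = cumsum_loop(x),
    builtin = cumsum(x)
  )
  print(results[, c("expression", "median", "mem_alloc")])
} else {
  # Fallback: total runtime over repeated calls, no memory information.
  print(system.time(for (i in 1:100) cumsum_loop(x)))
  print(system.time(for (i in 1:100) cumsum(x)))
}
```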

When Each Tool Helps vs Hurts

Parallel Processing (`in_parallel()`)

Helps when:

•CPU-intensive computations
•Embarrassingly parallel problems
•Large datasets with independent operations
•I/O-bound operations where waiting dominates (file reading, API calls)

Hurts when:

•Simple, fast operations (overhead > benefit)
•Memory-intensive operations (may cause thrashing)
•Operations requiring shared state
•Small datasets

See parallel-examples.md for decision points.

Data Backend Selection

Backend	Use When
data.table	Very large datasets (>1GB), complex grouping, performance-critical code
dplyr	Readability priority, complex joins/window functions, moderate data (<100MB)
base R	No dependencies allowed, simple operations, teaching/learning

See backend-selection.md for guidance.
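
To make the trade-off concrete, here is the same grouped mean written for each backend. This is a sketch on a small simulated data frame (`df` is invented for the example); the dplyr and data.table versions only run if those packages are installed.

```r
set.seed(42)
df <- data.frame(
  group = sample(letters[1:5], 1e4, replace = TRUE),
  value = rnorm(1e4)
)

# base R: no dependencies.
base_res <- aggregate(value ~ group, data = df, FUN = mean)
print(base_res)

# dplyr: reads as a pipeline of verbs.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_res <- df |>
    dplyr::group_by(group) |>
    dplyr::summarise(value = mean(value))
  print(dplyr_res)
}

# data.table: compact syntax, designed for large tables.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  dt_res <- dt[, .(value = mean(value)), by = group]
  print(dt_res[order(group)])
}
```

On data this small the three versions should perform similarly; the differences in the table are expected to matter only as the data grows.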

Profiling Best Practices

•Profile realistic data sizes - Not toy examples
•Profile multiple runs - For stability
•Check memory usage too - Not just time
•Profile realistic usage patterns - Not isolated calls

See profiling-best-practices.md for examples.
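
A base R sketch of these practices, assuming a hypothetical function `sort_and_dedupe()`: it times several input sizes, repeats each measurement and takes the median, and then looks at memory alongside time.

```r
# Hypothetical function under test (illustration only).
sort_and_dedupe <- function(x) unique(sort(x))

set.seed(1)
sizes <- c(1e3, 1e5, 1e6)

# Several input sizes, five runs each, median elapsed time per size.
timings <- sapply(sizes, function(n) {
  x <- sample.int(n, n, replace = TRUE)
  runs <- replicate(5, system.time(sort_and_dedupe(x))[["elapsed"]])
  median(runs)
})
names(timings) <- format(sizes, scientific = TRUE)
print(timings)

# Memory: size of the input, then peak usage around one call.
x <- sample.int(1e6, 1e6, replace = TRUE)
print(object.size(x), units = "MB")
invisible(gc(reset = TRUE))   # reset the "max used" counters
invisible(sort_and_dedupe(x))
print(gc())                   # the "max used" columns show the peak since the reset
```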

Performance Anti-Patterns to Avoid

•Don't optimize without measuring - Profile first
•Don't over-engineer - Avoid complex optimizations for 1% gains
•Don't assume - "for loops are always slow" is a myth
•Don't ignore readability costs - Prefer readable code with targeted optimizations

See performance-anti-patterns.md for examples.
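
The "for loops are always slow" myth can be checked directly. In the sketch below (the function names are made up for the example), the slow variant is the one that grows its result on every iteration; a loop over a preallocated vector is usually much closer to the vectorized version.

```r
n <- 2e4

# Anti-pattern: growing the result copies it on every iteration.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# A plain for loop over a preallocated vector.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorized equivalent.
vectorised <- function(n) seq_len(n)^2

print(system.time(r_grow <- grow(n)))
print(system.time(r_prealloc <- prealloc(n)))
print(system.time(r_vec <- vectorised(n)))
```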

Modern purrr Patterns

Data Frame Binding (purrr 1.0+)

Superseded	Modern Replacement
`map_dfr(x, f)`	`map(x, f) |> list_rbind()`
`map_dfc(x, f)`	`map(x, f) |> list_cbind()`
`map2_dfr(x, y, f)`	`map2(x, y, f) |> list_rbind()`
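
As a sketch of the first row, using a hypothetical `make_row()` helper: the purrr branch assumes purrr 1.0.0 or later is installed, and the base R fallback is included only so the example runs without it.

```r
# Hypothetical helper that returns a one-row data frame.
make_row <- function(i) data.frame(id = i, square = i^2)

if (requireNamespace("purrr", quietly = TRUE) &&
    utils::packageVersion("purrr") >= "1.0.0") {
  # Modern replacement for map_dfr(1:3, make_row).
  combined <- purrr::map(1:3, make_row) |> purrr::list_rbind()
} else {
  # Base R equivalent, used when purrr >= 1.0.0 is not available.
  combined <- do.call(rbind, lapply(1:3, make_row))
}
print(combined)
```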

Side Effects with `walk()`

Use walk() and walk2() for side effects (file writing, plotting).
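
For example, writing several data frames to disk might look like this. The output directory and file names are invented for the sketch, and a base R fallback is included for when purrr is not installed.

```r
# Hypothetical output location and inputs (illustration only).
out_dir <- file.path(tempdir(), "walk-demo")
dir.create(out_dir, showWarnings = FALSE)
datasets <- list(cars = head(mtcars), flowers = head(iris))
paths <- file.path(out_dir, paste0(names(datasets), ".csv"))

write_one <- function(df, path) write.csv(df, path, row.names = FALSE)

if (requireNamespace("purrr", quietly = TRUE)) {
  # walk2() calls write_one() for its side effect and returns its input invisibly.
  purrr::walk2(datasets, paths, write_one)
} else {
  invisible(Map(write_one, datasets, paths))
}
print(list.files(out_dir))
```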

Parallel Processing (purrr 1.1.0+)

Use in_parallel() with mirai for scaling across cores.
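
A minimal sketch, assuming purrr 1.1.0 or later together with the mirai and carrier packages, and a made-up `slow_square()` function. The function passed to `in_parallel()` has to be self-contained, so anything it needs is supplied as a named argument; without those packages the example falls back to a sequential `lapply()`.

```r
# Hypothetical slow task (illustration only).
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

can_parallel <- requireNamespace("purrr", quietly = TRUE) &&
  requireNamespace("mirai", quietly = TRUE) &&
  requireNamespace("carrier", quietly = TRUE) &&
  utils::packageVersion("purrr") >= "1.1.0"

res <- if (can_parallel) {
  mirai::daemons(2)  # start two background worker processes
  tryCatch(
    purrr::map(
      1:4,
      # slow_square is passed explicitly so the workers can find it.
      purrr::in_parallel(\(x) slow_square(x), slow_square = slow_square)
    ),
    finally = mirai::daemons(0)  # always shut the workers down
  )
} else {
  lapply(1:4, slow_square)
}
print(unlist(res))
```

With only four short tasks the worker start-up cost may outweigh the gain, which is the overhead trade-off listed under "Hurts when" above.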

See purrr-patterns.md for all patterns.

Backend Tools for Performance

When speed is critical, consider:

•vctrs - Type-stable vector operations
•rlang - Metaprogramming
•data.table - Large data operations

Profile to identify whether these tools will help your specific bottleneck.

source: Sarah Johnson's gist https://gist.github.com/sj-io/3828d64d0969f2a0f05297e59e6c15ad