R Code Patterns for Packages
Overview
Writing code for R packages differs significantly from writing R scripts. This skill covers the critical rules, forbidden functions, and best practices that prevent common package development errors.
CRITICAL: Forbidden Functions in R/
These functions should NEVER appear in your package's R/ code:
NEVER Use library() or require()
Why: These modify the user's search path and create dependencies on load order.
# WRONG - DO NOT DO THIS:
my_function <- function(x) {
library(dplyr)
x %>% filter(value > 0)
}
# RIGHT - Use explicit namespace:
my_function <- function(x) {
dplyr::filter(x, value > 0)
}
# ALSO RIGHT - Import in NAMESPACE via roxygen2:
#' @importFrom dplyr filter
my_function <- function(x) {
filter(x, value > 0)
}
Exception: require() is acceptable for Suggests packages, but use requireNamespace() instead:
# ACCEPTABLE but not ideal:
if (require("ggplot2")) {
# code using ggplot2
}
# BETTER:
if (requireNamespace("ggplot2", quietly = TRUE)) {
# code using ggplot2
}
# BEST - with helpful error message:
check_suggested <- function(pkg) {
if (!requireNamespace(pkg, quietly = TRUE)) {
stop(
"Package '", pkg, "' required but not installed.\n",
"Install with: install.packages('", pkg, "')",
call. = FALSE
)
}
}
NEVER Use source()
Why: Package code is loaded via the namespace mechanism, not sourcing.
# WRONG:
source("R/utils.R")
# RIGHT:
# Just define functions in R/ files. They're automatically loaded.
# If you need execution order, use alphabetical naming or .onLoad()
NEVER Use ::: (Triple Colon)
Why: Accessing internal functions from other packages is fragile and violates encapsulation.
# WRONG: result <- dplyr:::some_internal_function(x) # RIGHT: # Either: # 1. The other package should export it # 2. Copy the code (with attribution) if license permits # 3. Use a different approach # 4. Ask the maintainer to export it
Exception: Acceptable in tests to test your own internal functions:
# In tests/testthat/test-internal.R:
test_that("internal function works", {
expect_equal(mypackage:::.internal_function(5), 10)
})
Namespace Management
Default Pattern: pkg::fun()
For packages listed in Imports:, use explicit namespace references:
#' @param x A data frame
my_function <- function(x) {
# Explicit namespace - always works
dplyr::mutate(x, new_col = value * 2)
}
Advantages:
- •Clear where functions come from
- •No NAMESPACE imports needed for simple usage
- •Easier to understand code provenance
Import Pattern: @importFrom
For heavily-used functions, import to your namespace:
#' @importFrom dplyr filter mutate select
#' @importFrom rlang .data
my_function <- function(x) {
x %>%
filter(.data$value > 0) %>%
mutate(doubled = .data$value * 2) %>%
select(.data$id, .data$doubled)
}
When to import:
- •Functions used in many places (reduces typing)
- •Infix operators (%>%, %||%, :=)
- •Functions from your package's core dependencies
- •Performance-critical code (tiny speedup from avoiding :::)
Full Import: @import (Rare)
Import an entire package namespace (almost never recommended):
#' @import rlang
Only use for:
- •rlang (if building a tidyverse-style package)
- •Your own package's internal utilities package
Why avoid:
- •Imports ALL exported functions (namespace pollution)
- •Risk of function name conflicts
- •Makes code provenance unclear
- •Can break if imported package adds conflicting exports
R Code Restrictions
Use TRUE/FALSE, Never T/F
Why: T and F are variables that can be reassigned; TRUE and FALSE are reserved words.
# WRONG:
if (x == T) {
return(F)
}
# RIGHT:
if (x == TRUE) {
return(FALSE)
}
# Why it matters:
T <- FALSE # User can do this!
x == T # Now broken
ASCII-Only in .R Files
Why: R CMD check enforces ASCII in package R code for portability.
# WRONG:
message("Processing data → results")
city <- "São Paulo"
# RIGHT:
message("Processing data -> results") # ASCII arrow
city <- "S\u00e3o Paulo" # Unicode escape
# For documentation (roxygen2 comments), UTF-8 is fine:
#' @param city City name (e.g., "São Paulo")
Handling Unicode:
# Use Unicode escapes:
"\u00e9" # é
"\u2192" # →
"\u2264" # ≤
# Or use iconv():
iconv("São Paulo", to = "ASCII//TRANSLIT")
Global State and Options
CRITICAL: Never modify global state without restoration.
# WRONG - leaves user's session modified:
my_function <- function() {
options(stringsAsFactors = FALSE)
# ... code ...
}
# RIGHT - restore on exit:
my_function <- function() {
old_opts <- options(stringsAsFactors = FALSE)
on.exit(options(old_opts), add = TRUE)
# ... code ...
}
# BETTER - use withr:
my_function <- function() {
withr::local_options(stringsAsFactors = FALSE)
# ... code ...
}
Common global state to be careful with:
- •
options() - •
par()(graphics parameters) - •Working directory (
setwd()) - •Random seed (
set.seed()) - •Environment variables
# Always restore:
plot_something <- function() {
old_par <- par(mfrow = c(2, 2))
on.exit(par(old_par), add = TRUE)
# ... plotting code ...
}
# Or use withr:
plot_something <- function() {
withr::local_par(mfrow = c(2, 2))
# ... plotting code ...
}
Build-Time vs Load-Time Code
Understanding Execution Timing
CRITICAL: Top-level code in R/ files executes during R CMD build, not when users load your package!
# This runs at BUILD time (on your machine):
message("Building package") # Users never see this!
current_year <- as.integer(format(Sys.Date(), "%Y")) # Frozen at build time!
# This runs at LOAD time (on user's machine):
.onLoad <- function(libname, pkgname) {
message("Loading package") # Users see this
}
my_function <- function() {
year <- as.integer(format(Sys.Date(), "%Y")) # Computed each call
# ...
}
What Belongs Where
# TOP-LEVEL (Build time):
# - Function definitions (always)
# - Package constants
# - Nothing that queries the system!
# Examples:
PACKAGE_VERSION <- "1.0.0" # OK - constant
API_ENDPOINT <- "https://api.example.com" # OK - constant
# WRONG - these run at build time:
USER_HOME <- Sys.getenv("HOME") # Gets YOUR home, not user's!
N_CORES <- parallel::detectCores() # Gets YOUR cores!
IS_WINDOWS <- .Platform$OS.type == "windows" # Gets YOUR OS!
# RIGHT - query at runtime:
get_user_home <- function() {
Sys.getenv("HOME")
}
get_n_cores <- function() {
parallel::detectCores()
}
is_windows <- function() {
.Platform$OS.type == "windows"
}
Package State Management
Using Environments for State
Packages should not use global variables. Use environments instead:
# Create package environment (at build time):
pkg_env <- new.env(parent = emptyenv())
# Initialize state (at load time):
.onLoad <- function(libname, pkgname) {
pkg_env$cache <- list()
pkg_env$config <- list(
api_key = NULL,
verbose = FALSE
)
}
# Access in functions:
get_config <- function(key) {
pkg_env$config[[key]]
}
set_config <- function(key, value) {
pkg_env$config[[key]] <- value
invisible(value)
}
cache_get <- function(key) {
pkg_env$cache[[key]]
}
cache_set <- function(key, value) {
pkg_env$cache[[key]] <- value
invisible(value)
}
Why parent = emptyenv():
- •Prevents accidental variable lookup in global environment
- •Keeps package environment isolated
- •Makes dependencies explicit
.onLoad() and .onAttach()
Place these in R/zzz.R (alphabetically last, so other code is already sourced):
# R/zzz.R
.onLoad <- function(libname, pkgname) {
# Run when namespace is loaded
# Setup package state, register S3 methods, etc.
# NO messages to user here!
# Initialize package environment:
pkg_env$cache <- new.env(parent = emptyenv())
# Register S3 methods dynamically:
# s3_register("dplyr::dplyr_reconstruct", "my_class")
# Set options (with package prefix):
op <- options()
op_mypackage <- list(
mypackage.verbose = FALSE,
mypackage.api_url = "https://api.example.com"
)
toset <- !(names(op_mypackage) %in% names(op))
if (any(toset)) options(op_mypackage[toset])
invisible()
}
.onAttach <- function(libname, pkgname) {
# Run when package is attached (library() call)
# OK to show startup messages here
packageStartupMessage(
"Welcome to mypackage version ",
utils::packageVersion("mypackage")
)
# Check for API key:
if (is.null(get_api_key())) {
packageStartupMessage(
"No API key found. Set one with set_api_key()"
)
}
invisible()
}
.onUnload <- function(libpath) {
# Cleanup when package is unloaded
# Close connections, free resources, etc.
library.dynam.unload("mypackage", libpath)
}
Key differences:
- •
.onLoad(): Always runs (even withloadNamespace()), no messages - •
.onAttach(): Only withlibrary(), messages OK - •Use
.onLoad()for setup,.onAttach()for user communication
File Organization
Anti-Pattern: One Function Per File
Problem: Creates too many small files, hard to navigate.
# AVOID (unless functions are very large): R/ ├── add.R ├── subtract.R ├── multiply.R ├── divide.R ├── mean.R ├── median.R └── mode.R
Anti-Pattern: All Code in One File
Problem: Creates one huge file, hard to maintain.
# AVOID: R/ └── functions.R # 5000 lines!
Good Pattern: Logical Grouping
R/ ├── mypackage-package.R # Package docs and imports ├── arithmetic.R # add(), subtract(), multiply(), divide() ├── statistics.R # mean(), median(), mode(), sd() ├── data.R # Data documentation ├── utils.R # Internal utilities └── zzz.R # .onLoad, .onAttach
Grouping strategies:
- •By feature/domain
- •By class (if using S3/R6)
- •By workflow step
- •utils.R for shared internal functions
File Naming Conventions
R/ ├── aaa-imports.R # Loaded first (package-level imports) ├── class-model.R # Class definition ├── model-methods.R # Methods for model class ├── model-utils.R # Utilities for model class ├── generics.R # Generic function definitions ├── import-standalone-*.R # Standalone utilities ├── utils.R # General utilities ├── utils-assertions.R # Assertion utilities ├── data.R # Data documentation ├── mypackage-package.R # Package documentation └── zzz.R # .onLoad, .onAttach (loaded last)
Handling Non-Standard Evaluation
data.table and dplyr Column References
Problem: R CMD check warns about "no visible binding for global variable".
# This triggers warnings:
my_function <- function(data) {
data %>%
dplyr::filter(value > 0) %>%
dplyr::mutate(doubled = value * 2)
}
# Note: value, doubled are column names, not variables
Solution 1: Use .data pronoun (tidyverse)
#' @importFrom rlang .data
my_function <- function(data) {
data %>%
dplyr::filter(.data$value > 0) %>%
dplyr::mutate(doubled = .data$value * 2)
}
Solution 2: Use globalVariables()
# In R/globals.R or top of file:
utils::globalVariables(c("value", "doubled"))
my_function <- function(data) {
data %>%
dplyr::filter(value > 0) %>%
dplyr::mutate(doubled = value * 2)
}
Solution 3: Use .SD (data.table)
my_function <- function(dt) {
# Instead of:
# dt[, .(doubled = value * 2)]
# Use .SD:
dt[, .(doubled = .SD$value * 2)]
}
Declaring Global Variables
# R/globals.R utils::globalVariables(c( # data.table/dplyr column references: "value", "doubled", "id", "name", # Other NSE variables: ".", # data.table's . placeholder "where" # tidyselect ))
Performance Considerations
Vectorization
# SLOW:
sum_slow <- function(x) {
total <- 0
for (i in seq_along(x)) {
total <- total + x[i]
}
total
}
# FAST:
sum_fast <- function(x) {
sum(x)
}
Pre-allocation
# SLOW - grows vector:
result <- c()
for (i in 1:10000) {
result <- c(result, i^2)
}
# FAST - pre-allocated:
result <- numeric(10000)
for (i in 1:10000) {
result[i] <- i^2
}
# FASTEST - vectorized:
result <- (1:10000)^2
Avoid Repeated Function Calls
# SLOW:
for (i in 1:length(x)) { # length() called each iteration
# ...
}
# FAST:
n <- length(x)
for (i in 1:n) {
# ...
}
# BETTER - seq_along handles empty vectors:
for (i in seq_along(x)) {
# ...
}
Common Pitfalls
1. Using library() in Package Code
Problem: Most common mistake by newcomers.
# WRONG:
my_plot <- function(data) {
library(ggplot2) # NO!
ggplot(data, aes(x, y)) + geom_point()
}
# RIGHT:
#' @import ggplot2
my_plot <- function(data) {
ggplot(data, aes(x, y)) + geom_point()
}
2. Forgetting to Restore Options
Problem: User's session left in modified state.
# WRONG:
my_function <- function() {
options(warn = -1)
# risky operation
}
# RIGHT:
my_function <- function() {
withr::local_options(warn = -1)
# risky operation
}
3. System Queries at Top Level
Problem: Captures build-time values, not runtime.
# WRONG:
N_CORES <- parallel::detectCores() # Build time!
my_parallel <- function() {
cl <- parallel::makeCluster(N_CORES) # User gets YOUR core count!
# ...
}
# RIGHT:
my_parallel <- function(n_cores = parallel::detectCores()) {
cl <- parallel::makeCluster(n_cores)
# ...
}
4. Using T/F Instead of TRUE/FALSE
# WRONG:
if (verbose == T) { # Can break!
message("Details...")
}
# RIGHT:
if (verbose == TRUE) {
message("Details...")
}
# BETTER - isTRUE() is safer:
if (isTRUE(verbose)) {
message("Details...")
}
5. Not Using parent = emptyenv()
Problem: Package environment can accidentally access globals.
# WRONG: cache <- new.env() # parent = parent.frame() # RIGHT: cache <- new.env(parent = emptyenv())
6. Direct UTF-8 in Code
Problem: R CMD check fails on ASCII-enforcing systems.
# WRONG:
cities <- c("São Paulo", "Zürich")
# RIGHT:
cities <- c("S\u00e3o Paulo", "Z\u00fcrich")
7. Using ::: for Other Packages
# WRONG: result <- otherpkg:::internal_function(x) # RIGHT: # Contact maintainer to export it, or find alternative
8. Namespace Pollution with @import
# WRONG (unless rlang): #' @import dplyr #' @import tidyr #' @import stringr # RIGHT: #' @importFrom dplyr filter mutate select #' @importFrom tidyr pivot_longer #' @importFrom stringr str_detect str_replace
9. Missing globalVariables() Declaration
Problem: R CMD check warnings about NSE variables.
# Triggers warnings:
summarize_data <- function(df) {
df %>%
group_by(category) %>%
summarize(mean_value = mean(value))
}
# Fix:
utils::globalVariables(c("category", "value", "mean_value"))
10. Using source() for Code Organization
# WRONG:
# In R/main.R:
source("R/utils.R")
# RIGHT:
# Just define functions in both files
# R/ files are all loaded automatically
Quick Reference
Package Code Checklist
# ✓ DO: - Use pkg::fun() for Imports - Use requireNamespace() for Suggests - Use TRUE/FALSE (never T/F) - Use ASCII in .R files (\uXXXX for Unicode) - Restore global state with on.exit() or withr - Use new.env(parent = emptyenv()) for state - Put .onLoad() in R/zzz.R - Use utils::globalVariables() for NSE # ✗ DON'T: - library() or require() (except requireNamespace() for Suggests) - source() - ::: to access other packages' internals - T/F instead of TRUE/FALSE - UTF-8 directly in .R files - Modify options/par without restoration - System queries at top level - One function per file (usually) - All code in one file
Template: R/zzz.R
# Package environment
pkg_env <- new.env(parent = emptyenv())
.onLoad <- function(libname, pkgname) {
# Initialize package state
pkg_env$cache <- new.env(parent = emptyenv())
# Set package options
op <- options()
op_mypackage <- list(
mypackage.verbose = FALSE
)
toset <- !(names(op_mypackage) %in% names(op))
if (any(toset)) options(op_mypackage[toset])
invisible()
}
.onAttach <- function(libname, pkgname) {
packageStartupMessage("Welcome to mypackage")
invisible()
}