R Targets Package Skill
This skill helps you effectively use the targets R package for building reproducible, scalable data analysis pipelines.
Core Concepts
What is targets?
targets is a Make-like pipeline tool for R that:
- Skips costly runtime for tasks already up to date
- Orchestrates computation with implicit parallel computing
- Abstracts files as R objects
- Tracks dependencies automatically through static code analysis
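To illustrate what "static code analysis" means in the last point, the sketch below uses codetools::findGlobals() (codetools ships with R) to list the global symbols a function mentions without ever calling it. This only illustrates the idea; inside a targets project, tar_deps() is the function to use for inspecting detected dependencies. The helpers clean_data and fit_model are hypothetical.

```r
library(codetools)

# Hypothetical helper functions, for illustration only.
clean_data <- function(raw) raw[!is.na(raw$value), ]
fit_model <- function(raw) {
  cleaned <- clean_data(raw)
  lm(value ~ x, data = cleaned)
}

# List the global symbols fit_model() refers to, without running it.
globals <- findGlobals(fit_model, merge = FALSE)

# The function names found include clean_data, so a change to clean_data()
# can be recognised as a change upstream of fit_model().
globals$functions
```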
Key Files
- _targets.R: The target script file that defines your pipeline. Must return a list of target objects.
- _targets/: Data store containing:
  - _targets/meta/meta: Target metadata (text file)
  - _targets/objects/: Target output data
  - _targets/workspaces/: Debug workspaces for errored targets
Quick Start
Basic Pipeline Structure
r
# _targets.R
library(targets)
library(tarchetypes)
tar_source() # Sources R/ directory
tar_option_set(packages = c("readr", "dplyr", "ggplot2")) # readr provides read_csv()
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, read_csv(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, create_plot(model, data))
)
Essential Commands
r
# Run the pipeline
tar_make()
# Check what would run
tar_outdated()
# Visualize dependencies
tar_visnetwork()
# List all targets with commands
tar_manifest()
# Read target results
tar_read(target_name) # Returns the value
tar_load(target_name) # Loads into environment
# Clean up
tar_destroy() # Remove entire _targets/ directory
tar_delete(target_name) # Delete stored values of specific targets (metadata is kept)
tar_invalidate(target_name) # Remove metadata only, which forces a rerun
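To see the commands above working together, here is a minimal end-to-end sketch that builds a throwaway pipeline in a temporary directory. The targets x and y are hypothetical, and the temporary directory is used only so that nothing is written to your project.

```r
library(targets)

# Work in a scratch directory so the demo leaves the current project untouched.
demo_dir <- tempfile("targets_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a small _targets.R with two hypothetical targets.
tar_script({
  list(
    tar_target(x, 1 + 1),
    tar_target(y, x * 2)
  )
}, ask = FALSE)

tar_make()                        # Builds x, then y
still_outdated <- tar_outdated()  # Nothing should be left to run
y_value <- tar_read(y)            # Value stored for target y

setwd(old_wd)
```

Calling tar_make() a second time in the same directory would skip both targets, because neither their commands nor their upstream values have changed.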
Best Practices
Target Design
A good target should:
- Create a dataset, analyze a dataset, or summarize an analysis
- Be large enough to save meaningful time when skipped
- Be small enough that some targets can skip while others run
- Have no side effects (except file targets with format = "file")
- Return a single, meaningful, saveable value
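As a sketch of the side-effect exception above: a command that writes a file can be declared with format = "file", provided it returns the path it wrote. The helper write_summary and the file name summary.csv are hypothetical, and in a real project the helper would live in the R/ directory.

```r
library(targets)

# Hypothetical helper: writes a CSV as a side effect and returns its path
# so that targets can track the file on disk.
write_summary <- function(data, path) {
  write.csv(data, path, row.names = FALSE)
  path
}

# Target list as it would appear at the end of _targets.R.
list(
  tar_target(data, data.frame(x = 1:3, y = c(2, 4, 6))),
  tar_target(summary_file, write_summary(data, "summary.csv"), format = "file")
)
```

Because the command returns the path, a downstream target that depends on summary_file would rerun if the file's contents change.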
Function-Oriented Workflows
Define functions in the R/ directory rather than inline in _targets.R:
r
# R/functions.R
get_data <- function(file) {
  read_csv(file) %>%
    filter(!is.na(value))
}
fit_model <- function(data) {
  lm(outcome ~ predictor, data = data)
}
r
# _targets.R
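library(targets)
tar_source()
# Packages used by the functions in R/ (read_csv() and filter() with %>%)
tar_option_set(packages = c("readr", "dplyr"))
list(
  tar_target(data, get_data("data.csv")),
  tar_target(model, fit_model(data))
)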
Storage Formats
Choose appropriate formats for your data:
| Format | Best For | Requirements |
|---|---|---|
| "rds" (default) | General R objects | base R |
| "qs" | Large/general objects | qs2 package |
| "feather" | Data frames | arrow package |
| "parquet" | Large data frames | arrow package |
| "file" | External files | Command must return the file path |
r
tar_option_set(format = "qs") # Global setting
# OR
tar_target(data, get_data(), format = "qs") # Per-target
Dynamic Branching
Dynamic branching creates targets at runtime based on data:
r
list(
  tar_target(samples, c("A", "B", "C")),
  tar_target(
    analysis,
    analyze_sample(samples),
    pattern = map(samples) # Creates 3 branches
  ),
  tar_target(
    combined,
    combine_results(analysis) # Auto-aggregates branches
  )
)
Pattern Types
- map(x, y): One branch per tuple of elements
- cross(x, y): One branch per combination
- slice(x, index = c(1, 3)): Branch over specific indices
- head(x, n = 5): First n elements
- tail(x, n = 5): Last n elements
- sample(x, n = 5): Random sample
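The pattern types above can be explored without running a pipeline. The sketch below uses tar_pattern(), where each named argument gives the number of elements (or branches) of a hypothetical upstream target, and the result has one row per branch that would be created.

```r
library(targets)

# x and y are hypothetical upstream targets; the numbers are their lengths.
mapped  <- tar_pattern(map(x, y), x = 3, y = 3)      # Pairs elements up
crossed <- tar_pattern(cross(x, y), x = 3, y = 2)    # Every combination
first2  <- tar_pattern(head(x, n = 2), x = 5)        # First two elements only

nrow(mapped)   # 3 branches
nrow(crossed)  # 6 branches
nrow(first2)   # 2 branches
```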
Iteration Modes
- "vector" (default): Uses vctrs::vec_slice() and vctrs::vec_c()
- "list": Uses [[ for slicing and list() for aggregation
- "group": Branch over dplyr::group_by() row groups (use with tar_group())
Static Branching with tarchetypes
Static branching creates targets before the pipeline runs using metaprogramming:
r
library(tarchetypes)
values <- tibble::tibble(
  method = rlang::syms(c("method1", "method2")), # Symbols: substituted as function names
  dataset = c("data1", "data2") # Strings: substituted as character values
)
tar_map(
  values = values,
  tar_target(analysis, method(dataset)),
  tar_target(summary, summarize(analysis))
)
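Static branches are often aggregated afterwards. One way to do that, sketched here under the assumption that each summary is a data frame, is tar_combine() from tarchetypes. The names method1, method2, and all_summaries are hypothetical.

```r
library(targets)
library(tarchetypes)

values <- tibble::tibble(
  method = rlang::syms(c("method1", "method2")),
  dataset = c("data1", "data2")
)

# Keep the mapped targets in a variable so they can be referenced below.
mapped <- tar_map(
  values = values,
  tar_target(analysis, method(dataset)),
  tar_target(summary, summarize(analysis))
)

# Combine every generated summary target into a single data frame.
combined <- tar_combine(
  all_summaries,
  mapped[["summary"]],
  command = dplyr::bind_rows(!!!.x)
)

# Target list as it would appear at the end of _targets.R.
list(mapped, combined)
```

Running tar_manifest() on a pipeline like this is a quick way to check the generated target names and commands before building anything.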
Debugging Workflow
Step 1: Check Error Details
r
tar_meta(fields = error, complete_only = TRUE)
Step 2: Reproduce Error Locally
r
tar_load_globals() # Load functions and packages
tar_load(target_name) # Load dependencies
# Run the errored function
Step 3: Interactive Debugging (if needed)
r
# Add browser() to your function
tar_make(callr_function = NULL, use_crew = FALSE)
See references/TROUBLESHOOTING.md for detailed error solutions.
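The _targets/workspaces/ folder listed under Key Files offers another route: a saved workspace lets you reload the exact inputs of a failed target. The sketch below builds a deliberately failing pipeline in a temporary directory; the targets input and broken are hypothetical.

```r
library(targets)

# Work in a scratch directory so the demo leaves the current project untouched.
demo_dir <- tempfile("targets_ws_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

tar_script({
  # Save a workspace file whenever a target errors.
  tar_option_set(workspace_on_error = TRUE)
  list(
    tar_target(input, 1:3),
    tar_target(broken, stop("demo failure for input of length ", length(input)))
  )
}, ask = FALSE)

# The pipeline stops at the failing target, so wrap the call in try().
try(tar_make(), silent = TRUE)

failed <- tar_workspaces()                               # Targets with a saved workspace
errors <- tar_meta(fields = error, complete_only = TRUE) # Recorded error messages

setwd(old_wd)
```

In an interactive session inside the project, tar_workspace(broken) would then load the dependencies of the failed target into your environment so the command can be rerun by hand.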
Advanced Topics (See References)
- Troubleshooting: references/TROUBLESHOOTING.md - Solutions by error message
- Patterns: references/PATTERNS.md - Common workflow recipes
- Advanced Features: references/ADVANCED.md - Custom formats, CAS, metadata
- HPC Integration: references/HPC_INTEGRATION.md - Parallel computing with crew
- Package Development: references/PACKAGE_DEVELOPMENT.md - targets in R packages
- Function Reference: references/FUNCTION_CATEGORIES.md - Organized API reference
- Migration: references/MIGRATION.md - From drake to targets
Useful Utilities
r
# Check dependencies
tar_deps(your_function)
# Test branching patterns
tar_pattern(map(x, y), x = 3, y = 3) # map() needs inputs of equal length
# Get target metadata
tar_meta(targets_only = TRUE)
tar_meta(fields = c(name, time, seconds, error)) # Run status is reported by tar_progress()
# Monitor progress
tar_poll() # Continuous refresh
tar_progress() # Current status
tar_watch() # Shiny app
# Validate pipeline
tar_validate() # Check for errors
tar_glimpse() # Quick dependency graph of targets only