---
name: r-targets
description: Build and maintain reproducible data analysis pipelines in R using the targets package. Use when working with _targets.R files, creating computational workflows, managing dependencies between R analysis steps, debugging pipeline errors, optimizing performance, or implementing dynamic/static branching for large-scale analyses.
---

R Targets Package Skill

This skill helps you use the targets R package effectively to build reproducible, scalable data analysis pipelines.

Core Concepts

What is targets?

targets is a Make-like pipeline tool for R that:

  • Skips costly runtime for tasks already up to date
  • Orchestrates computation with implicit parallel computing
  • Abstracts files as R objects
  • Tracks dependencies automatically through static code analysis
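To give a rough feel for what "static code analysis" means here, the sketch below uses codetools::findGlobals(), which ships with R, to list the global symbols a function refers to without running it. The two functions are hypothetical, and this is an illustration of the idea rather than a copy of what targets does internally; inside a project, tar_deps() reports the dependencies targets itself detects.

```r
# Hypothetical analysis functions, defined only for this illustration.
clean_data <- function(raw) {
  raw[!is.na(raw$value), , drop = FALSE]
}

summarize_values <- function(raw) {
  cleaned <- clean_data(raw)  # call to another user-defined function
  mean(cleaned$value)
}

# Inspect the function body without executing it.
globals <- codetools::findGlobals(summarize_values, merge = FALSE)

# The user-defined helper shows up among the detected function calls,
# while local names such as `cleaned` and `raw` do not count as globals.
"clean_data" %in% globals$functions
"cleaned" %in% globals$variables
```

Because clean_data() is detected as a dependency of summarize_values(), editing clean_data() is the kind of change that would mark downstream targets as outdated.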

Key Files

  • _targets.R: The target script file that defines your pipeline. Must return a list of target objects.
  • _targets/: Data store containing:
    • _targets/meta/meta: Target metadata (text file)
    • _targets/objects/: Target output data
    • _targets/workspaces/: Debug workspaces for errored targets
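As a minimal sketch of how these files come into being, the example below writes a tiny throwaway pipeline into a temporary directory with tar_dir() and tar_script(), runs it, and then lists what landed in the data store. The target names x and total are made up for the example.

```r
library(targets)

result <- tar_dir({  # work in a temporary directory so nothing is left behind
  # Write a small _targets.R file (hypothetical targets x and total).
  tar_script({
    list(
      tar_target(x, 1:3),
      tar_target(total, sum(x))
    )
  }, ask = FALSE)

  tar_make()  # builds x and total, then records metadata

  list(
    files = list.files("_targets", recursive = TRUE),  # contents of the store
    outdated = tar_outdated(),                         # nothing left to run
    total = tar_read(total)                            # 6
  )
})

result$files
result$outdated
result$total
```

The listing should include meta/meta and one file per target under objects/. Running tar_make() a second time in the same directory would skip both targets because they are already up to date.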

Quick Start

Basic Pipeline Structure

r
# _targets.R
library(targets)
library(tarchetypes)

tar_source()  # Sources R/ directory
tar_option_set(packages = c("readr", "dplyr", "ggplot2"))  # readr provides read_csv()

list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, read_csv(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, create_plot(model, data))
)

Essential Commands

r
# Run the pipeline
tar_make()

# Check what would run
tar_outdated()

# Visualize dependencies
tar_visnetwork()

# List all targets with commands
tar_manifest()

# Read target results
tar_read(target_name)
tar_load(target_name)  # Loads into environment

# Clean up
tar_destroy()  # Remove entire _targets/ directory
tar_delete(target_name)  # Delete stored values of specific targets (metadata is kept)
tar_invalidate(target_name)  # Remove metadata only

Best Practices

Target Design

A good target should:

  1. Create a dataset, analyze a dataset, or summarize an analysis
  2. Be large enough to save meaningful time when skipped
  3. Be small enough that some targets can skip while others run
  4. Have no side effects (except file targets with format = "file")
  5. Return a single, meaningful, saveable value
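Points 4 and 5 can be illustrated with a file-producing step. In the sketch below, write_summary() is a hypothetical helper: its only side effect is writing one file, and it returns the file path so that a target declared with format = "file" can track the file's contents.

```r
# Hypothetical helper: writes a CSV and returns its path.
write_summary <- function(data, path) {
  write.csv(data, path, row.names = FALSE)
  path  # returning the path is what lets a format = "file" target track the file
}

# Stand-alone check outside any pipeline, using a temporary file.
summary_path <- write_summary(
  data.frame(group = c("a", "b"), mean = c(1.5, 2.5)),
  tempfile(fileext = ".csv")
)
file.exists(summary_path)

# Inside _targets.R this could be declared as (sketch, names are assumptions):
# tar_target(summary_file, write_summary(results, "summary.csv"), format = "file")
```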

Function-Oriented Workflows

Define functions in the R/ directory rather than inline in _targets.R:

r
# R/functions.R
get_data <- function(file) {
  read_csv(file) %>%
    filter(!is.na(value))
}

fit_model <- function(data) {
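  lm(outcome ~ predictor, data = data)
}

r
# _targets.R
library(targets)
tar_source()  # Loads get_data() and fit_model() from R/
tar_option_set(packages = c("readr", "dplyr"))  # Needed by get_data()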

list(
  tar_target(data, get_data("data.csv")),
  tar_target(model, fit_model(data))
)

Storage Formats

Choose appropriate formats for your data:

| Format | Best For | Requirements |
| --- | --- | --- |
| "rds" (default) | General R objects | base R |
| "qs" | Large/general objects | qs2 package |
| "feather" | Data frames | arrow package |
| "parquet" | Large data frames | arrow package |
| "file" | External files | Command returns the file path |

r
tar_option_set(format = "qs")  # Global setting
# OR
tar_target(data, get_data(), format = "qs")  # Per-target

Dynamic Branching

Dynamic branching creates targets at runtime based on data:

r
list(
  tar_target(samples, c("A", "B", "C")),
  tar_target(
    analysis,
    analyze_sample(samples),
    pattern = map(samples)  # Creates 3 branches
  ),
  tar_target(
    combined,
    combine_results(analysis)  # Auto-aggregates branches
  )
)

Pattern Types

  • map(x, y): One branch per tuple of elements
  • cross(x, y): One branch per combination
  • slice(x, index = c(1, 3)): Branch over specific indices
  • head(x, n = 5): First n elements
  • tail(x, n = 5): Last n elements
  • sample(x, n = 5): Random sample
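These patterns can be previewed without running a pipeline. The sketch below uses tar_pattern(), where the numbers passed for x and y are hypothetical branch counts of the upstream targets, and each row of the returned data frame corresponds to one branch.

```r
library(targets)

# x and y here are stand-ins for upstream targets; the numbers are
# assumed lengths, not real data.
mapped  <- tar_pattern(map(x, y), x = 2, y = 2)    # pairs elements position by position
crossed <- tar_pattern(cross(x, y), x = 2, y = 3)  # every combination of x and y
first   <- tar_pattern(head(x, n = 1), x = 3)      # only the first element of x

nrow(mapped)   # 2
nrow(crossed)  # 6
nrow(first)    # 1
```

Patterns can also be nested, for example cross(x, map(y, z)), and previewing them this way is a cheap check before committing to a long run.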

Iteration Modes

  • "vector" (default): Uses vctrs::vec_slice() and vctrs::vec_c()
  • "list": Uses [[ for slicing and list() for aggregation
  • "group": Branch over dplyr::group_by() row groups (use with tar_group())
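The difference between the modes is easiest to see on a list-valued target. The following sketch, run in a temporary directory, declares a hypothetical target groups with iteration = "list" so that each downstream branch receives one list element rather than a length-one list.

```r
library(targets)

sizes <- tar_dir({
  tar_script({
    list(
      # Upstream target: a list with two elements of different lengths.
      tar_target(groups, list(a = 1:2, b = 3:5), iteration = "list"),
      # One branch per list element; each branch sees the element itself.
      tar_target(sizes, length(groups), pattern = map(groups))
    )
  }, ask = FALSE)
  tar_make()
  tar_read(sizes)  # branches are combined into one vector
})

unname(sizes)
```

With the default "vector" mode the same list would be sliced into length-one lists, so each branch would report a length of 1 instead of the element's own length.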

Static Branching with tarchetypes

Static branching creates targets before the pipeline runs using metaprogramming:

r
library(tarchetypes)

values <- tibble::tibble(
  method = rlang::syms(c("method1", "method2")),  # Symbols are substituted as function names
  dataset = c("data1", "data2")
)

tar_map(
  values = values,
  tar_target(analysis, method(dataset)),
  tar_target(summary, summarize(analysis))
)
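Because static branches exist before the pipeline runs, they can be inspected with tar_manifest(). The sketch below uses a deliberately small, hypothetical parameter grid (a single column n) in a temporary directory and shows one generated target per row of values.

```r
library(targets)

manifest <- tar_dir({
  tar_script({
    library(tarchetypes)
    values <- data.frame(n = c(10, 20))  # hypothetical parameter grid
    list(
      tar_map(
        values = values,
        tar_target(draws, rnorm(n))  # n is replaced by each value at definition time
      )
    )
  }, ask = FALSE)
  tar_manifest(fields = command)  # target names and commands, nothing is run
})

manifest
```

The manifest should list one draws target per value of n, with the value pasted into both the target name and the command, which makes static branches easy to read in tar_visnetwork() as well.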

Debugging Workflow

Step 1: Check Error Details

r
tar_meta(fields = error, complete_only = TRUE)

Step 2: Reproduce Error Locally

r
tar_load_globals()  # Load functions and packages
tar_load(target_name)  # Load dependencies
# Run the errored function
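
An alternative to loading dependencies by hand is to let targets save a workspace when a target fails and restore it afterwards. The sketch below is one way this might look, using a deliberately failing hypothetical target named bad in a temporary directory; workspace_on_error is set explicitly rather than relying on a default.

```r
library(targets)

debug_info <- tar_dir({
  tar_script({
    tar_option_set(workspace_on_error = TRUE)  # save a workspace when a target errors
    list(
      tar_target(x, 1:3),
      tar_target(bad, stop("boom: ", sum(x)))  # hypothetical failing target
    )
  }, ask = FALSE)

  try(tar_make(), silent = TRUE)  # the pipeline stops at the errored target

  # Restore the failed target's dependencies into a separate environment.
  ws <- new.env()
  tar_workspace(bad, envir = ws)

  list(
    error = tar_meta(fields = error, complete_only = TRUE)$error,
    x = get("x", envir = ws)  # upstream value as the failing command saw it
  )
})

debug_info$error
debug_info$x
```

With the dependencies restored, the failing command can be rerun line by line in the console.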

Step 3: Interactive Debugging (if needed)

r
# Add browser() to your function, then run the pipeline in the current R session
tar_make(callr_function = NULL, use_crew = FALSE)

See references/TROUBLESHOOTING.md for detailed error solutions.

Advanced Topics (See References)

Useful Utilities

r
# Check dependencies
tar_deps(your_function)

# Test branching patterns (map() needs inputs of equal length)
tar_pattern(map(x, y), x = 3, y = 3)

# Get target metadata
tar_meta(targets_only = TRUE)
tar_meta(fields = c(name, time, seconds, error))  # Status lives in tar_progress(), not tar_meta()

# Monitor progress
tar_poll()           # Continuous refresh
tar_progress()       # Current status
tar_watch()          # Shiny app

# Validate pipeline
tar_validate()       # Check for errors
tar_glimpse()        # Quick dependency graph of targets only