
R Package Structure

An in-depth look at R package directory layout, package states, required and optional files, and build configuration patterns.

SKILL.md
--- frontmatter
name: R Package Structure
description: Understanding R package directory layout, package states, required and optional files, and build configuration patterns

R Package Structure

Overview

This skill covers the fundamental structure of R packages, from directory layout to the five distinct package states. Understanding package structure is critical for proper package development.

The Five Package States

R packages exist in five different states throughout their lifecycle:

  1. Source: The development state - what you work on. A directory with DESCRIPTION, R/, etc.
  2. Bundled: A compressed .tar.gz file created by R CMD build. Single file for distribution.
  3. Binary: Platform-specific compiled package (.tgz on macOS, .zip on Windows). No source code.
  4. Installed: Decompressed into a library directory. What library() loads from.
  5. In-memory: Loaded into R's namespace system via library() or loadNamespace().
r
# State transitions:
# Source -> Bundled:     R CMD build / devtools::build()
# Bundled -> Binary:     R CMD INSTALL --build
# Source/Bundled -> Installed: R CMD INSTALL / install.packages()
# Installed -> In-memory: library() / loadNamespace()

Understanding these states helps you know:

  • What files belong in source but not in bundles (.Rbuildignore)
  • Why some files exist in installed packages but not source
  • When code executes (build time vs load time)
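
The installed and in-memory states can be inspected from a running session. The following is a minimal base-R sketch; it uses the bundled tools package purely as a stand-in for any installed package.

```r
# Installed state: the package is a directory inside a library on disk.
pkg_dir <- find.package("tools")
installed_files <- list.files(pkg_dir)

# In-memory state: the namespace is loaded into the current session.
loadNamespace("tools")
in_memory <- isNamespaceLoaded("tools")

# Attaching with library() additionally puts the package on the search path.
attached <- "package:tools" %in% search()

print(installed_files)
print(c(in_memory = in_memory, attached = attached))
```

The listing typically includes directories such as Meta/ and help/ that have no counterpart in the source tree, because they are generated at install time.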

Required Files and Directories

DESCRIPTION

The package metadata file. Every package MUST have this.

dcf
Package: mypackage
Title: What the Package Does (One Line, Title Case)
Version: 0.1.0
Authors@R:
    person("First", "Last", , "email@example.com", role = c("aut", "cre"),
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Imports:
    dplyr (>= 1.0.0),
    rlang (>= 1.0.0)
Suggests:
    testthat (>= 3.0.0),
    knitr,
    rmarkdown
Config/testthat/edition: 3
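
Because DESCRIPTION uses the Debian control file (DCF) format, base R can read it with read.dcf(). A small sketch, assuming a throwaway file that holds only a subset of the fields above:

```r
# Write a reduced DESCRIPTION to a temporary location (illustrative only).
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: mypackage",
  "Version: 0.1.0",
  "Imports:",
  "    dplyr (>= 1.0.0),",
  "    rlang (>= 1.0.0)"
), desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)
pkg_version <- package_version(desc[[1, "Version"]])

# Split the Imports field on commas, then strip the version constraints.
imports <- trimws(strsplit(desc[[1, "Imports"]], ",")[[1]])
import_names <- sub("\\s*\\(.*\\)$", "", imports)

print(pkg_version)
print(import_names)
```

The same approach works for an installed package via packageDescription("pkg"), which returns the parsed fields as a list.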

NAMESPACE

The namespace file. Controls what your package exports and imports.

CRITICAL: Never edit this file by hand! Use roxygen2 to generate it.

r
# Generated by roxygen2: do not edit by hand

export(my_function)
exportPattern("^[^\\.]")
importFrom(dplyr,filter)
importFrom(rlang,"%||%")
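
To see the effect of export directives on a real package, the sketch below compares exported and internal objects. It uses the base tools package as an example; any installed package would do.

```r
# Names the package makes available to users (driven by its NAMESPACE file).
exports <- getNamespaceExports("tools")

# Everything defined inside the namespace, exported or not.
all_objects <- ls(asNamespace("tools"), all.names = TRUE)

# Internal objects are only reachable with the ::: operator.
internal_names <- setdiff(all_objects, exports)

print("file_ext" %in% exports)
print(length(internal_names))
```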

R/

Directory containing all your R code files. Required if your package has any functions.

Rules for R/ directory:

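  • Use the .R extension (R also accepts .r, .S, .s, and .q, but .R is the convention, and file names are case-sensitive on some platforms)
  • Files are collated in alphabetical order (C locale) at install time, unless DESCRIPTION has a Collate field
  • No subdirectories for organizing code (only the OS-specific unix/ and windows/ subdirectories are recognized)
  • File names should describe their contents
  • Conventional file: R/zzz.R for .onLoad() and .onAttach()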
code
R/
├── data.R          # Data documentation
├── import-standalone.R  # Standalone imported utilities
├── mypackage-package.R  # Package-level documentation
├── utils.R         # Utility functions
├── main-feature.R  # Main functionality
└── zzz.R          # .onLoad and .onAttach hooks
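
As an illustration of what R/zzz.R usually holds, here is a minimal sketch of the two hooks. The option name mypackage.verbose and the startup message are hypothetical.

```r
# R/zzz.R (sketch)

.onLoad <- function(libname, pkgname) {
  defaults <- list(mypackage.verbose = FALSE)
  # Only set options the user has not already defined.
  unset <- !(names(defaults) %in% names(options()))
  if (any(unset)) options(defaults[unset])
  invisible(NULL)
}

.onAttach <- function(libname, pkgname) {
  # Startup messages must use packageStartupMessage() so users can suppress them.
  packageStartupMessage("Welcome to mypackage")
}
```

R calls .onLoad() whenever the namespace is loaded and .onAttach() only when the package is attached with library(), which is why startup messages belong in the latter.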

man/

Directory containing documentation files (.Rd format). Required for packages with documentation.

CRITICAL: Never edit .Rd files by hand! Use roxygen2 to generate them.

code
man/
├── mypackage-package.Rd  # Package overview
├── my_function.Rd        # Function documentation
└── my_data.Rd           # Data documentation

Optional Directories

data/

Contains exported data objects (.rda or .RData files).

r
# Create with:
usethis::use_data(my_dataset, overwrite = TRUE)

Important details:

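  • usethis::use_data() writes binary .rda files (compress = "xz" often gives the smallest file; tools::resaveRdaFiles() can pick the best method)
  • Accessed by users with data(my_dataset) or direct reference
  • Requires documentation in R/data.R
  • LazyData: true in DESCRIPTION means data loads without data() call
  • CRAN size guidance: neither data nor documentation should exceed 5 MB; R CMD check raises a NOTE when the installed size exceeds 5 MB and lists subdirectories of 1 MB or more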
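
The compression setting can be verified with base R alone. A minimal sketch, assuming a made-up data frame and a temporary data/ directory rather than a real package:

```r
# Hypothetical dataset, just large enough for compression to matter.
my_dataset <- data.frame(id = 1:1000, value = rep(c("a", "b"), 500))

out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE)
rda_path <- file.path(out_dir, "my_dataset.rda")

# Equivalent in spirit to usethis::use_data(my_dataset, compress = "xz").
save(my_dataset, file = rda_path, compress = "xz")

# Report the size and compression type of the saved file.
info <- tools::checkRdaFiles(rda_path)
print(info)
```

tools::checkRdaFiles() is the same helper R CMD check relies on when it suggests better compression for files in data/.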

tests/

Contains all package tests.

code
tests/
├── testthat/
│   ├── helper-data.R      # Test helpers (loaded before tests)
│   ├── setup.R            # Setup run before tests
│   ├── test-feature1.R    # Test files (must start with "test-")
│   └── test-feature2.R
└── testthat.R             # Entry point (loads testthat and runs tests)

Setup with:

r
usethis::use_testthat(3)  # 3rd edition

vignettes/

Long-form documentation and tutorials.

code
vignettes/
├── articles/              # Articles (not installed with package)
│   └── supplementary.Rmd
└── introduction.Rmd       # Vignettes (installed with package)
r
# Create with:
usethis::use_vignette("introduction")
usethis::use_article("supplementary")

Key differences:

  • Vignettes: Installed with package, in CRAN bundle
  • Articles: pkgdown only, not in bundle (saves size)

inst/

Files to be installed as-is. The contents of inst/ are copied recursively, unmodified, to the top level of the installed package directory.

code
inst/
├── CITATION              # How to cite the package
├── extdata/             # Example/raw data files
│   ├── example.csv
│   └── sample.json
├── scripts/             # Helper scripts
│   └── setup.R
└── templates/           # Template files
    └── report.Rmd

Access inst/ files with:

r
system.file("extdata", "example.csv", package = "mypackage")
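
Note that system.file() returns an empty string rather than failing when a file is missing. The sketch below shows that behavior and the mustWork argument, using the base stats package as a stand-in because mypackage is hypothetical:

```r
# A file that exists in every installed package.
found <- system.file("DESCRIPTION", package = "stats")

# A file that does not exist: the result is "", not an error.
missing_path <- system.file("extdata", "no-such-file.csv", package = "stats")

# mustWork = TRUE turns the silent "" into an error that is easier to debug.
result <- tryCatch(
  system.file("extdata", "no-such-file.csv", package = "stats", mustWork = TRUE),
  error = function(e) "error"
)

print(nzchar(found))
print(missing_path)
print(result)
```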

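CRITICAL: Do NOT put a DESCRIPTION or NAMESPACE file in inst/, and avoid subdirectory names that R itself uses in installed packages (such as R, data, help, html, libs, man, Meta). These clash with installed files and can break your package!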

data-raw/

Scripts for creating package data objects. Excluded from the bundle because usethis::use_data_raw() adds data-raw to .Rbuildignore.

code
data-raw/
├── DATASET.R            # Script to create data/DATASET.rda
└── prepare_examples.R   # Script for inst/extdata/ files
r
# Setup with:
usethis::use_data_raw("DATASET")

Pattern:

r
# data-raw/DATASET.R
library(tidyverse)

# clean_names() comes from the janitor package, which tidyverse does not attach
DATASET <- read_csv("source.csv") %>%
  janitor::clean_names() %>%
  filter(year >= 2020)

usethis::use_data(DATASET, overwrite = TRUE)

src/

Compiled code (C, C++, Fortran).

code
src/
├── Makevars             # Unix build configuration
├── Makevars.win         # Windows build configuration
├── mycode.cpp           # Source files
└── RcppExports.cpp      # Auto-generated (Rcpp)

Build Control Files

.Rbuildignore

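Files/directories to exclude from package bundle. Each line is a Perl-style regular expression (NOT a glob pattern) matched against paths relative to the package root. The trailing # annotations in the listing are explanatory only: the file has no comment syntax, so a real .Rbuildignore must contain just the patterns.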

code
^.*\.Rproj$              # RStudio project files
^\.Rproj\.user$          # RStudio user files
^data-raw$               # Data preparation scripts
^LICENSE\.md$            # Full license (keep LICENSE)
^README\.Rmd$            # Source (keep README.md)
^\.github$               # GitHub-specific files
^_pkgdown\.yml$          # pkgdown config
^docs$                   # pkgdown output
^pkgdown$                # pkgdown extras
^\.httr-oauth$           # OAuth credentials
^\.secrets$              # Secrets directory
^\.env$                  # Environment files
^cran-comments\.md$      # CRAN submission notes
^revdep$                 # Reverse dependency checks
^\.lintr$                # Linter configuration
^\.pre-commit-config\.yaml$  # Pre-commit hooks

Pattern rules:

  • Regex, not glob: use ^ for start, $ for end
  • Escape dots: \. not .
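  • Matching is case-insensitive
  • Add entries with usethis::use_build_ignore("file"), which escapes the name into an anchored regex (pass escape = FALSE to supply a raw pattern)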
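
Patterns can be tried out before committing them. The helper below is a hypothetical approximation of the matching R CMD build performs (Perl regex, case-insensitive, against paths relative to the package root); it is a sketch, not the actual implementation.

```r
patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^data-raw$",
              "^README\\.Rmd$", "^\\.github$")
paths <- c("mypackage.Rproj", "data-raw", "README.Rmd",
           "README.md", "R/utils.R", ".github")

# TRUE when any pattern matches the path (is_ignored is an illustrative name).
is_ignored <- function(path, patterns) {
  any(vapply(patterns, function(p) {
    grepl(p, path, perl = TRUE, ignore.case = TRUE)
  }, logical(1)))
}

ignored <- vapply(paths, is_ignored, logical(1), patterns = patterns)
print(ignored)

# Pitfall: an unescaped dot matches any character.
unescaped_hit <- grepl("^README.md$", "READMEXmd", perl = TRUE)
print(unescaped_hit)
```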

.gitignore

Files to exclude from version control.

code
# R specific
.Rproj.user
.Rhistory
.RData
.Ruserdata

# Build artifacts
/*.tar.gz
/*.zip
/check/
/revdep/

# Documentation
/docs/
/Meta/
/doc/

# Package specific
.httr-oauth
.secrets/
.env

# OS specific
.DS_Store
Thumbs.db

.github/

GitHub-specific files (excluded from bundle via .Rbuildignore).

code
.github/
├── workflows/
│   ├── R-CMD-check.yaml    # CI checks
│   ├── test-coverage.yaml  # Code coverage
│   └── pkgdown.yaml        # Deploy docs
├── CONTRIBUTING.md
├── ISSUE_TEMPLATE/
└── PULL_REQUEST_TEMPLATE.md

What Goes Where?

inst/ vs Root Directory

Common confusion: LICENSE, README, NEWS files

code
# Correct structure:
LICENSE              # Machine-readable (required at root when License says "+ file LICENSE")
LICENSE.md           # Human-readable (at root, in .Rbuildignore)
inst/CITATION        # Citation info (needs to be installed)

README.md            # User-facing (at root, included in bundle)
README.Rmd           # Source (at root, in .Rbuildignore)

NEWS.md              # At root (automatically used by pkgdown)

Internal vs External Data

code
data/                # Exported data (users can load)
  └── dataset.rda

R/sysdata.rda        # Internal data (your functions use, users cannot load)

inst/extdata/        # Raw data files (users access via system.file())
  └── example.csv

Package-level Files

code
mypackage/
├── DESCRIPTION         # Package metadata (required)
├── NAMESPACE          # Auto-generated by roxygen2 (required)
├── LICENSE            # License file (required for "+ file LICENSE" licenses such as MIT)
├── README.md          # Package overview (highly recommended)
├── NEWS.md            # Change log (recommended)
├── .Rbuildignore      # Build exclusions
├── .gitignore         # Git exclusions
├── mypackage.Rproj    # RStudio project (in .Rbuildignore)
├── R/                 # R code (required)
├── man/               # Documentation (auto-generated)
├── tests/             # Tests (highly recommended)
├── vignettes/         # Long-form docs (recommended)
├── data/              # Data (if needed)
├── data-raw/          # Data preparation (in .Rbuildignore)
├── inst/              # Installed files (if needed)
├── src/               # Compiled code (if needed)
└── .github/           # GitHub files (in .Rbuildignore)

File Organization Patterns

Small Package (<10 functions)

code
R/
├── mypackage-package.R  # Package docs
├── main.R              # Main functions
├── utils.R             # Utilities
└── zzz.R              # .onLoad if needed

Medium Package (10-50 functions)

code
R/
├── mypackage-package.R
├── feature1.R          # Grouped by feature
├── feature2.R
├── feature3.R
├── utils.R
├── utils-feature1.R    # Feature-specific utils
├── data.R
└── zzz.R

Large Package (50+ functions)

code
R/
├── mypackage-package.R
├── aaa-imports.R       # Package-level imports (aaa = loaded first)
├── class-feature1.R    # S3/R6 class definitions
├── feature1-methods.R  # Methods for feature1
├── feature1-utils.R    # Utilities for feature1
├── feature2-core.R
├── feature2-helpers.R
├── generics.R          # Generic function definitions
├── import-standalone-*.R  # Standalone imports
├── utils.R
├── data.R
└── zzz.R
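
The aaa- and zzz prefixes work because, absent a Collate field in DESCRIPTION, R collates files alphabetically in the C locale. A quick sketch of that ordering with hypothetical file names (method = "radix" makes sort() use C-locale ordering regardless of the session locale):

```r
files <- c("zzz.R", "utils.R", "aaa-imports.R", "Main.R")

# C-locale ordering: uppercase letters sort before lowercase ones.
load_order <- sort(files, method = "radix")
print(load_order)
```

The position of Main.R shows why mixed-case file names can produce a surprising load order.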

Common Pitfalls

1. Editing NAMESPACE or .Rd Files Manually

Problem: These are auto-generated by roxygen2.

Solution: Always use roxygen2 comments in R files.

r
# WRONG: Editing man/my_function.Rd directly
# RIGHT: Add roxygen2 comments in R/my_function.R and run devtools::document()

#' My function title
#'
#' @param x Input data
#' @returns Processed output
#' @export
my_function <- function(x) {
  # implementation
}

2. Using Subdirectories in R/

Problem: R/ does not support subdirectories.

Solution: Use file naming conventions instead.

code
# WRONG:
R/
└── feature1/
    ├── core.R
    └── utils.R

# RIGHT:
R/
├── feature1-core.R
└── feature1-utils.R

3. Forgetting .Rbuildignore for Development Files

Problem: Development files included in package bundle, inflating size.

Solution: Add patterns to .Rbuildignore.

r
usethis::use_build_ignore("data-raw")
usethis::use_build_ignore(".github")

4. Putting Data in inst/ Instead of data/

Problem: Data in inst/extdata/ is for raw files, not R objects.

Solution: Use correct location for data type.

r
# For R data objects (users can load):
usethis::use_data(my_dataset)  # Creates data/my_dataset.rda

# For raw files (users read):
# Put in inst/extdata/example.csv
# Access with system.file("extdata", "example.csv", package = "pkg")

5. Wrong File Extension Case

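Problem: Inconsistent extension case, such as script.r instead of script.R. R accepts both, but file names are case-sensitive on Linux, so a reference that does not match the name on disk (for example in a Collate field) fails there even if it works on macOS or Windows.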

Solution: Always use uppercase .R for R code files.

code
# WRONG: R/utils.r
# RIGHT: R/utils.R

6. Exceeding CRAN Size Limits

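Problem: Package too large for CRAN. R CMD check raises a NOTE when the installed size exceeds 5 MB (listing subdirectories of 1 MB or more), and CRAN policy asks that neither data nor documentation exceed 5 MB.

Solution: Compress data, trim bulky files in inst/extdata/, and turn heavy vignettes into pkgdown articles.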

r
# Compress data maximally:
usethis::use_data(dataset, compress = "xz", overwrite = TRUE)

# Check package size:
devtools::build()  # Look at .tar.gz size

7. Including Credentials or Secrets

Problem: API keys, OAuth tokens in package code.

Solution: Use .Rbuildignore and .gitignore, document user-side credential management.

code
# .gitignore (glob patterns):
.httr-oauth
.secrets/
.env
inst/secrets/

# .Rbuildignore (regex patterns):
^\.httr-oauth$
^\.secrets$
^\.env$
^inst/secrets$
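
For the user-side credential management mentioned above, one common approach is to read secrets from environment variables at call time. A minimal sketch, in which the function name get_api_key and the variable name MYPACKAGE_API_KEY are hypothetical:

```r
# Look up an API key from the environment instead of storing it in the package.
get_api_key <- function(var = "MYPACKAGE_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable (for example in ~/.Renviron).",
         call. = FALSE)
  }
  key
}
```

Users would then add a line such as MYPACKAGE_API_KEY=... to their own ~/.Renviron, so the secret never enters the repository or the bundle.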

8. Incorrect inst/ Usage

Problem: Misunderstanding what inst/ is for.

Solution: Remember: inst/ contents are copied to package root at installation.

code
# During development:
inst/extdata/file.csv

# After installation:
extdata/file.csv  # (inst/ is stripped)

# Access:
system.file("extdata", "file.csv", package = "mypackage")

9. Missing LazyData Declaration

Problem: Data in data/ but users must call data() to load it.

Solution: Add to DESCRIPTION.

dcf
LazyData: true

10. Mixing up README Files

Problem: Having both README.md and README.Rmd without .Rbuildignore.

Solution: Keep only README.md in bundle.

code
# Correct setup:
README.Rmd           # Source (in .Rbuildignore)
README.md            # Generated (included in bundle)

# .Rbuildignore:
^README\.Rmd$

Quick Reference

Creating New Package

r
# Create package structure:
usethis::create_package("~/mypackage")

# Setup development tools:
usethis::use_git()
usethis::use_github()
usethis::use_testthat(3)
usethis::use_mit_license()
usethis::use_roxygen_md()

# Configure build:
usethis::use_build_ignore(c("data-raw", ".github"))

Common usethis Helpers

r
usethis::use_data(dataset)              # Add exported data
usethis::use_data_raw("dataset")        # Create data-raw/ script
usethis::use_r("function")              # Create R/function.R
usethis::use_test("function")           # Create test file
usethis::use_vignette("intro")          # Add vignette
usethis::use_package("dplyr")           # Add dependency
usethis::use_build_ignore("file")       # Add to .Rbuildignore

Checking Package Structure

r
# Check overall structure:
devtools::check()

# Build package:
devtools::build()

# Install locally:
devtools::install()

# Check what's in bundle:
pkgbuild::build(path = ".", dest_path = tempdir())
# Then examine .tar.gz contents
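
Examining the tarball needs only base R. The sketch below builds a throwaway archive in place of a real bundle so that it runs anywhere; with an actual package, pass the path returned by the build step to untar() instead.

```r
# Create a tiny stand-in "package" directory (illustrative only).
work <- tempfile("bundle-demo-")
dir.create(file.path(work, "mypackage", "R"), recursive = TRUE)
writeLines("Package: mypackage", file.path(work, "mypackage", "DESCRIPTION"))
writeLines("f <- function() 1", file.path(work, "mypackage", "R", "f.R"))

# Archive it the way a bundle is laid out: one top-level package directory.
tarball <- file.path(work, "mypackage_0.1.0.tar.gz")
old_wd <- setwd(work)
tar(tarball, files = "mypackage", compression = "gzip", tar = "internal")
setwd(old_wd)

# List the archive contents without extracting anything.
contents <- untar(tarball, list = TRUE, tar = "internal")
print(contents)
```

Anything that appears in the listing but should not ship (for example data-raw/ or .github/) needs an entry in .Rbuildignore.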

Resources