Set Up GxP R Project
Create an R project structure that meets GxP regulatory requirements for validated computing.
When to Use
- •Starting an R analysis project in a regulated environment (pharma, biotech, medical devices)
- •Setting up R for use in clinical trial analysis
- •Creating a validated computing environment for regulatory submissions
- •Implementing 21 CFR Part 11 or EU Annex 11 requirements
Inputs
- •Required: Project scope and regulatory framework (FDA, EMA, or both)
- •Required: R version and package versions to validate
- •Required: Validation strategy (risk-based approach)
- •Optional: Existing SOPs for computerized systems
- •Optional: Quality management system integration requirements
Procedure
Step 1: Create Validated Project Structure
code
gxp-project/ ├── R/ # Analysis scripts │ ├── 01_data_import.R │ ├── 02_data_processing.R │ └── 03_analysis.R ├── validation/ # Validation documentation │ ├── validation_plan.md # VP: scope, strategy, roles │ ├── risk_assessment.md # Risk categorization │ ├── iq/ # Installation Qualification │ │ ├── iq_protocol.md │ │ └── iq_report.md │ ├── oq/ # Operational Qualification │ │ ├── oq_protocol.md │ │ └── oq_report.md │ ├── pq/ # Performance Qualification │ │ ├── pq_protocol.md │ │ └── pq_report.md │ └── traceability_matrix.md # Requirements to tests mapping ├── tests/ # Automated test suite │ ├── testthat.R │ └── testthat/ │ ├── test-data_import.R │ └── test-analysis.R ├── data/ # Input data (controlled) │ ├── raw/ # Immutable raw data │ └── derived/ # Processed datasets ├── output/ # Analysis outputs ├── docs/ # Supporting documentation │ ├── sop_references.md # Links to relevant SOPs │ └── change_log.md # Manual change documentation ├── renv.lock # Locked dependencies ├── DESCRIPTION # Project metadata ├── .Rprofile # Session configuration └── CLAUDE.md # AI assistant instructions
Step 2: Create Validation Plan
Create validation/validation_plan.md:
markdown
# Validation Plan ## 1. Purpose This plan defines the validation strategy for [Project Name] using R [version]. ## 2. Scope - R version: 4.5.0 - Packages: [list with versions] - Analysis: [description] - Regulatory framework: 21 CFR Part 11 / EU Annex 11 ## 3. Risk Assessment Approach Using GAMP 5 risk-based categories: - Category 3: Non-configured products (R base) - Category 4: Configured products (R packages with default settings) - Category 5: Custom applications (custom R scripts) ## 4. Validation Activities | Activity | Category 3 | Category 4 | Category 5 | |----------|-----------|-----------|-----------| | IQ | Required | Required | Required | | OQ | Reduced | Standard | Enhanced | | PQ | N/A | Standard | Enhanced | ## 5. Roles and Responsibilities - Validation Lead: [Name] - Developer: [Name] - QA Reviewer: [Name] - Approver: [Name] ## 6. Acceptance Criteria All tests must pass with documented evidence.
Step 3: Lock Dependencies with renv
r
# Initialize renv with exact versions
renv::init()
# Install specific validated versions
renv::install("dplyr@1.1.4")
renv::install("ggplot2@3.5.0")
# Snapshot
renv::snapshot()
The renv.lock file serves as the controlled package inventory.
Step 4: Implement Version Control
bash
git init git add . git commit -m "Initial validated project structure" # Use signed commits for traceability git config user.signingkey YOUR_GPG_KEY git config commit.gpgsign true
Step 5: Create IQ Protocol
validation/iq/iq_protocol.md:
markdown
# Installation Qualification Protocol
## Objective
Verify that R and required packages are correctly installed.
## Test Cases
### IQ-001: R Version Verification
- **Requirement**: R 4.5.0 installed
- **Procedure**: Execute `R.version.string`
- **Expected**: "R version 4.5.0 (date)"
- **Result**: [ PASS / FAIL ]
### IQ-002: Package Installation Verification
- **Requirement**: All packages in renv.lock installed
- **Procedure**: Execute `renv::status()`
- **Expected**: "No issues found"
- **Result**: [ PASS / FAIL ]
### IQ-003: Package Version Verification
- **Procedure**: Execute `installed.packages()[, c("Package", "Version")]`
- **Expected**: Versions match renv.lock exactly
- **Result**: [ PASS / FAIL ]
Step 6: Write Automated OQ/PQ Tests
r
# tests/testthat/test-analysis.R
test_that("primary analysis produces validated results", {
# Known input -> known output (double programming validation)
test_data <- read.csv(test_path("fixtures", "validation_dataset.csv"))
result <- primary_analysis(test_data)
# Compare against independently calculated expected values
expect_equal(result$estimate, 2.345, tolerance = 1e-3)
expect_equal(result$p_value, 0.012, tolerance = 1e-3)
expect_equal(result$ci_lower, 1.234, tolerance = 1e-3)
})
Step 7: Create Traceability Matrix
markdown
# Traceability Matrix | Req ID | Requirement | Test ID | Test Description | Status | |--------|-------------|---------|------------------|--------| | REQ-001 | Import CSV data correctly | OQ-001 | Verify data dimensions and types | PASS | | REQ-002 | Calculate primary endpoint | PQ-001 | Compare against reference results | PASS | | REQ-003 | Generate report output | PQ-002 | Verify report contains all sections | PASS |
Validation
- • Project structure follows documented template
- • renv.lock contains all dependencies with exact versions
- • Validation plan is complete and approved
- • IQ protocol executes successfully
- • OQ test cases cover all configured functionality
- • PQ tests validate against independently computed results
- • Traceability matrix links requirements to tests
- • Change control process is documented
Common Pitfalls
- •Using
install.packages()without version pinning: Always use renv with locked versions - •Missing audit trail: Every change must be documented. Use git signed commits.
- •Over-validating: Apply risk-based approach. Not every CRAN package needs Category 5 validation.
- •Forgetting system-level qualification: The OS and R installation need IQ too
- •No independent verification: PQ should compare against results computed independently (SAS, manual calculation)
Related Skills
- •
write-validation-documentation- detailed validation document creation - •
implement-audit-trail- electronic records and audit trails - •
validate-statistical-output- double programming and output validation - •
manage-renv-dependencies- dependency locking for validated environments