Group Sequential Methods

When to Use This Skill

•Designing group sequential trials with interim analyses
•Implementing alpha spending functions
•Setting futility stopping rules
•Calculating information fractions
•Using sim_gs_n() for GS simulations
•Integrating with gsDesign2 package

Fundamental Concepts

Group Sequential Design

A group sequential design allows for:

•Early stopping for efficacy: If treatment effect is larger than expected
•Early stopping for futility: If treatment effect is unlikely to reach significance
•Reduced expected sample size: When treatment effect is present

Information Fraction

Information fraction at analysis k:

code

I_k / I_K = (events at analysis k) / (total planned events)

For time-to-event trials, information ≈ number of events.

Type I Error Spending

The key constraint: Σ α_k ≤ α (overall Type I error)

Spending functions distribute alpha across analyses.

Alpha Spending Functions

O'Brien-Fleming (OBF)

Properties:

•Conservative at early analyses
•Nearly full alpha at final analysis
•Difficult to stop early
•Maintains nominal Type I error

Formula:

code

α*(t) = 2 - 2Φ(z_{α/2} / √t)

When to Use:

•Want maximum power at final analysis
•Early efficacy stopping unlikely
•Regulatory preference for conservative early bounds

Pocock

Properties:

•Equal spending at each analysis
•Easier to stop early
•Inflated final alpha
•Lower power at final analysis

Formula:

code

α*(t) = α × log(1 + (e-1)t)

When to Use:

•Early stopping is a priority
•Treatment effect expected to be large
•Willing to sacrifice final analysis power

Hwang-Shih-DeCani (HSD)

Properties:

•Flexible family indexed by γ
•γ = -4: Similar to OBF
•γ = 1: Similar to Pocock
•γ = 0: Linear (Pocock-like)

Formula:

code

α*(t) = α × (1 - e^{-γt}) / (1 - e^{-γ})

When to Use:

•Want flexibility between OBF and Pocock
•Customized spending pattern needed

Spending Function Comparison

Function	Early Spending	Final Power	Early Stopping
OBF	Low	High	Difficult
Pocock	High	Lower	Easier
HSD(γ=-4)	Low	High	Difficult
HSD(γ=1)	High	Lower	Easier

Futility Boundaries

Binding Futility

•If futility boundary crossed, trial MUST stop
•Affects Type I error calculation
•More powerful than non-binding

Non-Binding Futility

•Crossing futility boundary is advisory
•Trial can continue at investigator discretion
•Conservative: assumes no early stopping for futility in Type I error

Beta-Spending for Futility

Similar to alpha-spending, but for Type II error:

code

β*(t) = spending function × β

simtrial GS Implementation

create_cut() - Define Analysis Timing

# Interim Analysis 1
ia1_cut <- create_cut(
  planned_calendar_time = 20,        # Minimum 20 months
  target_event_overall = 100,        # Target 100 events
  max_extension_for_target_event = 24,  # Wait up to 24 months for events
  min_n_overall = 200,               # At least 200 enrolled
  min_followup = 12                  # 12 months minimum follow-up
)

# Interim Analysis 2
ia2_cut <- create_cut(
  planned_calendar_time = 32,
  target_event_overall = 200,
  max_extension_for_target_event = 34,
  min_time_after_previous_analysis = 10  # At least 10 months after IA1
)

# Final Analysis
fa_cut <- create_cut(
  planned_calendar_time = 45,
  target_event_overall = 350
)

sim_gs_n() - Run GS Simulations

library(simtrial)
library(gsDesign2)

# Define enrollment
enroll_rate <- define_enroll_rate(
  duration = c(4, 12),
  rate = c(10, 30)
)

# Define failure rates
fail_rate <- define_fail_rate(
  duration = c(3, 100),
  fail_rate = log(2)/9,
  hr = c(1, 0.6),
  dropout_rate = 0.001
)

# Run simulation
results <- sim_gs_n(
  n_sim = 1000,
  sample_size = 400,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  test = wlr,
  cut = list(ia1 = ia1_cut, ia2 = ia2_cut, fa = fa_cut),
  weight = fh(rho = 0, gamma = 0)
)

Integration with gsDesign2

library(gsDesign2)

# Design with gsDesign2
design <- gs_design_ahr(
  enroll_rate = define_enroll_rate(duration = c(4, 12), rate = c(10, 30)),
  fail_rate = define_fail_rate(
    duration = c(3, 100),
    fail_rate = log(2)/9,
    hr = c(1, 0.6),
    dropout_rate = 0.001
  ),
  alpha = 0.025,
  beta = 0.1,
  analysis_time = c(24, 36, 48),
  upper = gs_spending_bound,
  upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
  lower = gs_spending_bound,
  lpar = list(sf = gsDesign::sfHSD, param = -4, total_spend = 0.1)
) |> to_integer()

# Simulate with design object
sim_results <- sim_gs_n(
  n_sim = 1000,
  sample_size = max(design$analysis$n),
  enroll_rate = design$enroll_rate,
  fail_rate = design$fail_rate,
  test = wlr,
  cut = NULL,  # Auto-generated from design
  original_design = design,
  weight = fh(rho = 0, gamma = 0)
)

Bound Updates with sim_gs_n()

When using original_design, sim_gs_n() can compute updated bounds:

# Results include planned and updated bounds
results <- sim_gs_n(
  # ... parameters ...
  original_design = design,
  ia_alpha_spending = "min_planned_actual",  # Conservative
  fa_alpha_spending = "full_alpha"           # Spend full alpha at FA
)

# Output includes:
# - planned_upper_bound, planned_lower_bound
# - updated_upper_bound, updated_lower_bound

Alpha Spending Options:

ia_alpha_spending	Description
"min_planned_actual"	Conservative: min of planned and actual
"actual"	Spend based on actual information

fa_alpha_spending	Description
"full_alpha"	Spend remaining alpha at final
"info_frac"	Spend based on information fraction

Different Tests Across Analyses

# Different tests at each analysis
ia1_test <- create_test(wlr, weight = fh(rho = 0, gamma = 0))
ia2_test <- create_test(wlr, weight = fh(rho = 0, gamma = 0.5))
fa_test <- create_test(wlr, weight = mb(delay = 6, w_max = Inf))

results <- sim_gs_n(
  n_sim = 1000,
  sample_size = 400,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  test = list(ia1 = ia1_test, ia2 = ia2_test, fa = fa_test),
  cut = list(ia1 = ia1_cut, ia2 = ia2_cut, fa = fa_cut)
)

Common GS Patterns

Two-Look Design (1 IA + FA)

# IA at 50% information, FA at 100%
ia_cut <- create_cut(target_event_overall = 150)  # 50%
fa_cut <- create_cut(target_event_overall = 300)  # 100%

sim_gs_n(
  n_sim = 1000,
  sample_size = 400,
  test = wlr,
  cut = list(ia = ia_cut, fa = fa_cut),
  weight = fh(0, 0)
)

Three-Look Design (2 IA + FA)

# Standard 33%, 67%, 100% information
ia1_cut <- create_cut(target_event_overall = 100)
ia2_cut <- create_cut(target_event_overall = 200)
fa_cut <- create_cut(target_event_overall = 300)

Event-Driven with Calendar Constraints

# Events-based but with minimum calendar time
ia_cut <- create_cut(
  target_event_overall = 150,
  planned_calendar_time = 18,  # At least 18 months
  max_extension_for_target_event = 24  # Max 24 months
)

Operating Characteristics

Key Metrics to Evaluate

•Power: P(reject H0 | H1 true)
•Type I Error: P(reject H0 | H0 true)
•Expected Sample Size: E[N] under H0 and H1
•Expected Events: E[events] at each analysis
•Stopping Probabilities: P(stop at analysis k)

Simulation Summary

# Summarize simulation results
results_summary <- results |>
  group_by(analysis) |>
  summarise(
    mean_events = mean(event),
    mean_z = mean(z),
    power = mean(z < qnorm(0.025)),  # One-sided
    .groups = "drop"
  )

Best Practices

•Information Fraction: Target evenly spaced (e.g., 50%, 100% or 33%, 67%, 100%)
•Alpha Spending: OBF is default for most regulatory submissions
•Futility: Use non-binding to preserve flexibility
•Validation: Compare simulated power to gsDesign analytical results
•Documentation: Record all boundary calculations for regulatory submission
•Parallelization: Use plan("multisession") for large simulations

Regulatory Considerations

•Pre-specify number and timing of interim analyses
•Pre-specify spending function and parameters
•Document stopping rules clearly in protocol
•Consider DSMB recommendations for unblinded reviews
•Maintain blinding for operational team