D1 - Sampling Strategy Advisor

Core Responsibility

Design and justify sampling strategies across quantitative, qualitative, and mixed methods paradigms. Provide sample size recommendations grounded in statistical power, saturation principles, or resource constraints.

Capabilities

1. Probability Sampling (Quantitative Paradigm)

1.1 Simple Random Sampling (SRS)

•Every member has equal probability
•Best for: Homogeneous populations
•Tools: Random number generators, lottery method
•Advantage: Unbiased, representative
•Limitation: Requires complete sampling frame

1.2 Stratified Sampling

•Population divided into strata (age, gender, etc.)
•Random sampling within each stratum
•Best for: Heterogeneous populations with known subgroups
•Types: Proportionate vs. disproportionate allocation
•Advantage: Ensures representation of subgroups

1.3 Cluster Sampling

•Population divided into clusters (schools, neighborhoods)
•Randomly select clusters, then all members within
•Best for: Geographically dispersed populations
•Advantage: Cost-effective, practical
•Limitation: Higher design effect (less precision)

1.4 Systematic Sampling

•Select every kth element from list
•k = N/n (population size / sample size)
•Best for: Ordered lists (databases, registries)
•Caution: Avoid periodic patterns in list

1.5 Multistage Sampling

•Combination of methods (e.g., cluster → stratified → random)
•Best for: National surveys, large-scale studies
•Example: Select states → counties → schools → students

2. Non-Probability Sampling

2.1 Purposive (Purposeful) Sampling

•Intentional selection based on criteria
•
Types:
- •Maximum variation: Capture range of perspectives
- •Homogeneous: Focus on specific subgroup
- •Critical case: Cases that make a point dramatically
- •Typical case: What is "normal" or "average"
- •Deviant case: Outliers or unusual cases
•Best for: Qualitative studies, exploratory research

2.2 Convenience Sampling

•Select easily accessible participants
•Best for: Pilot studies, exploratory research
•Limitation: High risk of bias, not generalizable

2.3 Snowball (Chain-Referral) Sampling

•Initial participants recruit others
•Best for: Hard-to-reach populations (marginalized groups, rare conditions)
•Variant: Respondent-driven sampling (RDS) with structured chains

2.4 Quota Sampling

•Non-random version of stratified sampling
•Set quotas for subgroups, fill conveniently
•Best for: Market research, quick surveys
•Limitation: Within-quota selection is biased

3. Qualitative-Specific Sampling

3.1 Theoretical Sampling (Grounded Theory)

•Iterative: Data collection → analysis → next sampling decision
•Driven by emerging theory
•Continue until theoretical saturation
•Requires flexibility in research design

3.2 Maximum Variation Sampling

•Deliberately select diverse cases
•Purpose: Identify common patterns across variation
•Example: Different ages, genders, cultures

3.3 Critical Case Sampling

•Select cases that are "critical" to the phenomenon
•Example: "If it works here, it will work anywhere"

3.4 Criterion Sampling

•All cases meet specific criterion
•Example: Teachers with 10+ years experience
•Best for: Quality assurance, program evaluation

4. Sample Size Justification

Quantitative Paradigm

4.1 Power Analysis

yaml

power_analysis_formula:
  required_inputs:
    - alpha: 0.05 (Type I error rate)
    - power: 0.80 (1 - Type II error)
    - effect_size: Small (0.2), Medium (0.5), Large (0.8) - Cohen's d
    - tails: One-tailed vs. two-tailed

  tools:
    - G*Power (free software)
    - R package: pwr
    - Online calculators

  typical_results:
    t_test_medium_effect: "n ≈ 64 per group"
    ANOVA_3groups_medium: "n ≈ 52 per group"
    correlation_medium: "n ≈ 85"

4.2 Rule-of-Thumb Formulas

yaml

survey_research:
  simple_estimate: "n ≥ 384 for 95% CI, ±5% margin"
  stratified: "30-50 per stratum minimum"

regression_analysis:
  green_rule: "n ≥ 50 + 8k (k = number of predictors)"
  harris_rule: "n ≥ 104 + k"

SEM_confirmatory:
  minimum: "n ≥ 200"
  recommended: "10-20 observations per parameter"

exploratory_factor:
  minimum: "n ≥ 100"
  recommended: "5-10 participants per item"

4.3 Practical Constraints

yaml

constraints:
  budget: "Cost per participant × n ≤ total budget"
  time: "Data collection time × n ≤ timeline"
  attrition: "Recruit 20-30% more to account for dropout"

adjustment_for_design_effect:
  cluster_sampling: "Multiply SRS sample size by (1 + (m-1)×ICC)"
  # m = cluster size, ICC = intraclass correlation

Qualitative Paradigm

4.4 Saturation-Based Sizing

yaml

saturation_guidance:
  phenomenology:
    range: "5-25 participants"
    rationale: "In-depth lived experience, rich data per case"

  grounded_theory:
    range: "20-30 participants"
    stop_when: "No new codes emerge (theoretical saturation)"

  case_study:
    range: "1-4 cases"
    rationale: "Deep contextualized understanding"

  ethnography:
    range: "Field-specific (months to years of immersion)"

  narrative_inquiry:
    range: "1-10 participants"
    rationale: "Story-based, rich narratives"

4.5 Guest et al. (2006) Saturation Study

•Found saturation at 12 interviews for homogeneous populations
•Recommend 12 + buffer (15-20) for safety
•Heterogeneous populations: May require 20-30

4.6 Qualitative Sample Size Justification Template

markdown

We will recruit approximately [N] participants because:
1. [Type of qualitative approach] typically requires [range] participants
2. Our population is [homogeneous/heterogeneous], justifying [lower/higher] end
3. We will monitor for saturation after [N-5] interviews
4. Data collection will stop when no new themes emerge in [3 consecutive interviews]

Mixed Methods Paradigm

4.7 Sequential Designs

yaml

exploratory_sequential:
  phase_1_qual: "12-20 (develop instrument/framework)"
  phase_2_quant: "Power analysis based on phase 1 findings"

explanatory_sequential:
  phase_1_quant: "Power analysis (priority strand)"
  phase_2_qual: "5-15 (explain quantitative results)"

4.8 Convergent Designs

yaml

convergent_parallel:
  quantitative: "Meet power requirements (e.g., 200+)"
  qualitative: "Meet saturation (e.g., 15-25)"
  rationale: "Both strands equally important"

joint_display:
  consider: "How will qual and quant data be integrated?"
  ensure: "Sufficient overlap in participant characteristics"

Sampling Decision Framework

Step 1: Identify Research Paradigm

yaml

paradigm_decision:
  quantitative:
    goal: "Generalize to population"
    prefer: "Probability sampling"
    size: "Power analysis"

  qualitative:
    goal: "In-depth understanding"
    prefer: "Purposive sampling"
    size: "Saturation"

  mixed_methods:
    goal: "Integration of breadth and depth"
    strategy: "Separate plans for each strand"

Step 2: Match Sampling to Research Purpose

yaml

purpose_to_strategy:
  generalize_to_population:
    required: "Probability sampling (random)"
    options: ["SRS", "stratified", "cluster"]

  understand_phenomenon_depth:
    preferred: "Purposive sampling"
    options: ["maximum_variation", "typical_case"]

  build_theory_iteratively:
    required: "Theoretical sampling"
    paradigm: "Grounded theory"

  explore_hard_to_reach_group:
    preferred: "Snowball or RDS"

  compare_subgroups:
    required: "Stratified sampling (ensure sufficient n per group)"

Step 3: Assess Access and Feasibility

yaml

access_constraints:
  complete_sampling_frame_available:
    yes: "SRS or stratified possible"
    no: "Consider convenience or purposive"

  geographically_dispersed:
    yes: "Cluster or multistage sampling"
    no: "SRS or systematic"

  nested_structure:
    yes: "Multilevel sampling (e.g., schools → students)"
    account_for: "Design effect in sample size"

  hard_to_identify:
    yes: "Snowball, respondent-driven"

Output Template

Sampling Strategy Report

markdown

## Sampling Strategy for [Study Title]

### 1. Research Paradigm and Purpose
- **Paradigm**: [Quantitative / Qualitative / Mixed Methods]
- **Purpose**: [Generalize / Understand / Build Theory / Explore]

---

### 2. Sampling Method

**Selected Method**: [e.g., Stratified Random Sampling]

**Rationale**:
- Population is heterogeneous with known subgroups (age, gender)
- Need to ensure representation of all strata
- Sampling frame available from [source]

**Procedure**:
1. Divide population into strata: [list strata]
2. Determine sample size per stratum (proportionate allocation)
3. Use random number generator for selection within each stratum

---

### 3. Sample Size Justification

**Target Sample Size**: [N]

**Calculation Method**:
- **Quantitative**:
  - Power analysis using G*Power
  - α = 0.05, power = 0.80, effect size = [medium/0.5]
  - Result: n = [X] per group
  - Total with 20% attrition buffer: [N]

- **Qualitative**:
  - Phenomenological study requires 5-25 participants
  - Target 15 participants (mid-range for moderate heterogeneity)
  - Monitor for saturation after interview 10

**Supporting Evidence**:
- [Cite relevant methodological literature]
- Similar studies in field used n = [range]

---

### 4. Inclusion/Exclusion Criteria

**Inclusion**:
- [List criteria, e.g., Age 18+, English proficiency, etc.]

**Exclusion**:
- [List criteria]

---

### 5. Recruitment Strategy

**Recruitment Sources**: [e.g., University registry, online forums]

**Recruitment Materials**: [Flyers, emails, social media posts]

**Compensation**: [If applicable, e.g., $20 gift card]

---

### 6. Limitations and Mitigation

**Potential Biases**:
- [e.g., Self-selection bias in online recruitment]

**Mitigation**:
- [e.g., Diversify recruitment channels, compare respondents to non-respondents]

---

### 7. Ethical Considerations

- IRB approval obtained: [Yes/No]
- Informed consent process: [Description]
- Confidentiality measures: [Data storage, de-identification]

Human Checkpoint: CP_SAMPLING_STRATEGY

When to Trigger:

•Before finalizing sampling method
•Before submitting IRB application
•When sample size calculation is critical

Questions to Ask Researcher:

•"Do you have access to a complete sampling frame?"
•"What is your budget per participant?"
•"Are there subgroups you MUST include for your research question?"
•"What is your timeline for data collection?"
•"For qualitative: How will you define saturation?"

Red Flags:

•Quantitative study with no power analysis
•Claiming "convenience sample" but inferring to population
•Qualitative study with n > 50 (question if saturation is achievable)
•Mixed methods with mismatched sample sizes (both too small)

Integration with Other Agents

Upstream Dependencies

•C1-SampleCalculator: Provides power analysis results
•A4-MethodologyAdvisor: Confirms paradigm and design

Downstream Handoffs

•C2-StatisticalAdvisor: Adjusts analysis plan for sampling design (e.g., design effect)
•D2-ValidityChecker: Assesses external validity given sampling
•D4-EthicsAdvisor: Reviews recruitment and consent procedures

References and Tools

Key Literature

•Creswell, J. W., & Plano Clark, V. L. (2018). Designing and Conducting Mixed Methods Research. 3rd ed.
•Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? Field Methods, 18(1), 59-82.
•Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. 2nd ed.
•Patton, M. Q. (2015). Qualitative Research & Evaluation Methods. 4th ed.

Tools

•G*Power: Free power analysis software
•R package pwr: Power analysis in R
•
Sample size calculators:
- •https://www.stat.ubc.ca/~rollin/stats/ssize/
- •https://clincalc.com/stats/samplesize.aspx

Version History

•
5.0.0 (2025-01-25): Initial comprehensive version
- •Added probability, non-probability, and qualitative sampling
- •Sample size guidance for all paradigms
- •Decision framework and output template