Data Analysis

Guidance for exploratory data analysis and statistical testing.

When to Use

•Exploring new datasets
•Selecting appropriate statistical tests
•Performing power analysis
•Reporting results in papers
•Validating experimental results

Exploratory Data Analysis (EDA)

EDA Workflow

•Load and Inspect: Basic data structure
•Summarize: Descriptive statistics
•Visualize: Distributions and relationships
•Identify Issues: Missing data, outliers
•Document Findings: Key insights

Initial Inspection

python

# Basic checks
df.shape          # Dimensions
df.dtypes         # Data types
df.head()         # First rows
df.describe()     # Summary stats
df.isnull().sum() # Missing values

Key Questions to Answer

Question	What to Check
What's the size?	Rows, columns, data types
Any missing data?	Null counts, patterns
What's the distribution?	Histograms, descriptive stats
Any outliers?	Box plots, z-scores
Any relationships?	Correlations, scatter plots
Any patterns?	Trends, clusters, groups

Visualization Guide

Data Type	Visualization
Single continuous	Histogram, density plot, box plot
Single categorical	Bar chart, pie chart
Two continuous	Scatter plot, line plot
Two categorical	Grouped bar chart, heatmap
Continuous + categorical	Box plot by group, violin plot
Time series	Line plot with time axis

Statistical Test Selection

Decision Tree

code

Question: What are you trying to do?
│
├─ Compare groups
│   │
│   ├─ How many groups?
│   │   ├─ 2 groups → See "Two Group Comparisons"
│   │   └─ 3+ groups → See "Multiple Group Comparisons"
│   │
│   └─ Related or independent?
│       ├─ Independent (different subjects)
│       └─ Related (same subjects, before/after)
│
├─ Examine relationships
│   ├─ Two variables → Correlation, regression
│   └─ Multiple variables → Multiple regression
│
└─ Test proportions
    └─ Chi-square test

Two Group Comparisons

Data Type	Independent Groups	Related Groups
Normal	Independent t-test	Paired t-test
Non-normal	Mann-Whitney U	Wilcoxon signed-rank

Multiple Group Comparisons

Data Type	Independent Groups	Related Groups
Normal	One-way ANOVA	Repeated measures ANOVA
Non-normal	Kruskal-Wallis	Friedman test

Checking Assumptions

Normality Tests:

•Shapiro-Wilk (n < 50)
•Kolmogorov-Smirnov (n ≥ 50)
•Visual: Q-Q plot

Homogeneity of Variance:

•Levene's test
•Visual: Box plots by group

Independence:

•By experimental design
•Durbin-Watson (for residuals)

When Assumptions Fail

Violation	Solution
Non-normality	Non-parametric test, transformation
Unequal variance	Welch's t-test, transformation
Non-independence	Mixed-effects model
Outliers	Robust methods, removal (with justification)

Effect Sizes

Why Effect Sizes Matter

•p-values tell you if effect exists, not how big
•Effect sizes quantify the magnitude
•Required for power analysis
•Better for meta-analysis

Common Effect Sizes

Measure	Context	Interpretation
Cohen's d	Two means	0.2=small, 0.5=medium, 0.8=large
Pearson's r	Correlation	0.1=small, 0.3=medium, 0.5=large
Eta-squared	ANOVA	0.01=small, 0.06=medium, 0.14=large
Odds ratio	Categorical	1.5=small, 2.5=medium, 4=large

Computing Effect Sizes

python

# Cohen's d for two groups
import numpy as np

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    return (np.mean(group1) - np.mean(group2)) / pooled_std

Power Analysis

Key Concepts

Term	Definition
Power	Probability of detecting true effect (1-β)
α (alpha)	False positive rate (typically 0.05)
β (beta)	False negative rate (typically 0.20)
Effect size	Magnitude of effect
Sample size	Number of observations

Power Analysis Uses

•A priori: Before study, determine needed sample size
•Post hoc: After study, calculate achieved power
•Sensitivity: Given n and power, what effect detectable?

Sample Size Calculation

python

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size for t-test
n = analysis.solve_power(
    effect_size=0.5,    # Cohen's d
    alpha=0.05,         # Significance level
    power=0.80,         # Desired power
    ratio=1.0,          # n2/n1
    alternative='two-sided'
)

Power Guidelines

Power	Interpretation
< 0.50	Inadequate
0.50-0.70	Low
0.70-0.80	Moderate
≥ 0.80	Adequate (standard target)
≥ 0.90	High

Multiple Comparisons

The Problem

•Each test has α chance of false positive
•Multiple tests inflate false positive rate
•Family-wise error rate: 1-(1-α)^n

Correction Methods

Method	When to Use	Strictness
Bonferroni	Few comparisons	Most conservative
Holm	Few comparisons	Less conservative
Benjamini-Hochberg	Many comparisons	Controls FDR
Tukey HSD	Post-hoc ANOVA	Common choice

Applying Corrections

python

from scipy import stats
import numpy as np

# Bonferroni
adjusted_alpha = 0.05 / num_tests

# Holm-Bonferroni
from statsmodels.stats.multitest import multipletests
reject, pvals_corrected, _, _ = multipletests(pvals, method='holm')

# Benjamini-Hochberg (FDR)
reject, pvals_corrected, _, _ = multipletests(pvals, method='fdr_bh')

Reporting Results

APA Format

t-test:

code

t(df) = X.XX, p = .XXX, d = X.XX

Example: t(45) = 2.34, p = .023, d = 0.68

ANOVA:

code

F(df1, df2) = X.XX, p = .XXX, η² = .XX

Example: F(2, 87) = 4.56, p = .013, η² = .095

Correlation:

code

r(df) = .XX, p = .XXX

Example: r(48) = .42, p = .003

Chi-square:

code

χ²(df, N = n) = X.XX, p = .XXX

Example: χ²(2, N = 150) = 6.78, p = .034

Reporting Guidelines

Do:

•Report exact p-values (not just < .05)
•Include effect sizes
•Report confidence intervals
•Describe what was tested

Don't:

•Say "proved" or "significant difference" alone
•Report only significant results
•Cherry-pick tests
•Over-interpret p = .049 vs p = .051

Results Table Format

code

Table 1: Comparison of Methods on Benchmark X

Method      Mean (SD)       95% CI          Effect Size
------------------------------------------------------
Baseline    75.2 (3.4)     [73.1, 77.3]    -
Method A    78.9 (2.8)*    [77.2, 80.6]    d = 0.52
Method B    81.4 (3.1)**   [79.5, 83.3]    d = 0.89

Note: * p < .05, ** p < .01 vs Baseline

Common Pitfalls

Pitfall	Problem	Solution
P-hacking	Running tests until significant	Pre-register analysis
HARKing	Hypothesizing after results known	State hypotheses before
Multiple comparisons	Inflated false positives	Apply correction
Pseudo-replication	Non-independent samples	Mixed-effects models
Ignoring effect sizes	Significant ≠ important	Report effect sizes

Quality Checklist

• Research questions clearly stated
• Appropriate test selected and justified
• Assumptions checked
• Sample size justified (power analysis)
• Multiple comparisons corrected
• Effect sizes reported
• Confidence intervals included
• Results correctly interpreted
• Limitations acknowledged

References

See references/ folder for:

•test_selection.md: Detailed test selection guide
•apa_reporting.md: Complete APA reporting templates