Grey Relational Analysis (GRA)
Analyze correlation between factors and outcomes using grey relational grade, especially for small samples.
When to Use
- •Small sample size: 4-20 data points (unlike correlation which needs >30)
- •Factor analysis: Which factors most influence the outcome?
- •Non-linear relationships: No assumption of linearity
- •Multi-factor ranking: Rank importance of multiple influencing factors
- •Incomplete information: "Grey" = between black (unknown) and white (known)
Key differences from the Pearson correlation coefficient:
- •No requirement for large samples
- •No assumption of linear relationship
- •No requirement for normal distribution
This is NOT forecasting: Use grey-forecaster for prediction, use grey-relation for correlation analysis.
Quick Start
Core Concepts
- •Reference sequence (母序列) $X_0$: The outcome/target variable
  - •Example: Student test scores
- •Comparison sequences (子序列) $X_i$: The influencing factors
  - •Example: Study hours, sleep hours, exercise hours
- •Grey relational coefficient $\xi_i(k)$: Similarity at each data point $$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}$$ where $\rho$ is the resolution coefficient (usually 0.5)
- •Grey relational grade $\gamma_i$: Overall correlation (average of the coefficients over all $n$ data points) $$\gamma_i = \frac{1}{n} \sum_{k=1}^n \xi_i(k)$$
- •Ranking: Higher $\gamma_i$ means stronger correlation with the outcome
Python Implementation
import numpy as np
# Example: Which factor most affects test scores?
# Columns: [Test Score, Study Hours, Sleep Hours, Exercise Hours]
data = np.array([
[55, 24, 10, 5],
[65, 38, 22, 8],
[75, 40, 18, 6],
[100, 50, 20, 10]
])
# Step 1: Normalization (dimensionless)
mean = np.mean(data, axis=0)
data_norm = data / mean
# Step 2: Reference sequence (outcome)
Y = data_norm[:, 0] # Test scores
# Step 3: Comparison sequences (factors)
X = data_norm[:, 1:] # Study, Sleep, Exercise
# Step 4: Calculate absolute differences |X0 - Xi|
abs_diff = np.abs(X - Y.reshape(-1, 1))
# Step 5: Find min and max differences
a = np.min(abs_diff) # Two-level minimum
b = np.max(abs_diff) # Two-level maximum
# Step 6: Resolution coefficient
rho = 0.5
# Step 7: Grey relational coefficient
gamma_matrix = (a + rho * b) / (abs_diff + rho * b)
# Step 8: Grey relational grade (average)
gamma = np.mean(gamma_matrix, axis=0)
# Step 9: Rank factors
factors = ['Study Hours', 'Sleep Hours', 'Exercise Hours']
ranking = sorted(zip(factors, gamma), key=lambda x: x[1], reverse=True)
print("Grey Relational Grade:")
for factor, grade in ranking:
    print(f"{factor}: {grade:.4f}")
print(f"\nMost important factor: {ranking[0][0]}")
MATLAB Implementation
% Data: [Test Score, Study Hours, Sleep Hours, Exercise Hours]
A = [55, 24, 10, 5;
65, 38, 22, 8;
75, 40, 18, 6;
100, 50, 20, 10];
% Step 1: Normalization
Mean = mean(A, 1);
A_norm = A ./ Mean;
% Step 2: Reference sequence
Y = A_norm(:, 1);
% Step 3: Comparison sequences
X = A_norm(:, 2:end);
% Step 4: Absolute differences
absX0_Xi = abs(X - repmat(Y, 1, size(X, 2)));
% Step 5: Min and max
a = min(absX0_Xi(:));
b = max(absX0_Xi(:));
% Step 6: Resolution coefficient
rho = 0.5;
% Step 7: Grey relational coefficient
gamma_matrix = (a + rho * b) ./ (absX0_Xi + rho * b);
% Step 8: Grey relational grade
gamma = mean(gamma_matrix, 1);
% Step 9: Display results
factors = {'Study Hours', 'Sleep Hours', 'Exercise Hours'};
fprintf('Grey Relational Grade:\n');
for i = 1:length(factors)
    fprintf('%s: %.4f\n', factors{i}, gamma(i));
end
[~, idx] = max(gamma);
fprintf('\nMost important factor: %s\n', factors{idx});
Standard Procedure
Step 1: Data Preparation
Organize data into matrix:
- •Rows: Samples (observations)
- •Columns: [Outcome, Factor1, Factor2, ..., FactorN]
Step 2: Normalization (Dimensionless)
Three common methods (sketched in code after this list):
1. Mean normalization (most common): $$x_i'(k) = \frac{x_i(k)}{\bar{x_i}}$$
2. Min-max normalization: $$x_i'(k) = \frac{x_i(k) - \min x_i}{\max x_i - \min x_i}$$
3. Initial value normalization: $$x_i'(k) = \frac{x_i(k)}{x_i(1)}$$
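As a minimal sketch (assuming the same data matrix as in the Quick Start, with rows = samples and columns = indicators), the three normalizations look like this in NumPy; mean normalization is the variant used throughout this skill:
import numpy as np
data = np.array([[55, 24, 10, 5],
                 [65, 38, 22, 8],
                 [75, 40, 18, 6],
                 [100, 50, 20, 10]], dtype=float)
# 1. Mean normalization: divide each column by its column mean
mean_norm = data / data.mean(axis=0)
# 2. Min-max normalization: rescale each column to [0, 1]
minmax_norm = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
# 3. Initial value normalization: divide each column by its first observation
init_norm = data / data[0, :]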
Step 3: Separate Reference and Comparison
- •Reference sequence $X_0$: The outcome variable
- •Comparison sequences $X_1, X_2, ..., X_m$: The factors
Step 4: Calculate Differences
$$\Delta_i(k) = |x_0(k) - x_i(k)|$$
Step 5: Find Two-Level Extrema
$$a = \min_i \min_k \Delta_i(k)$$ $$b = \max_i \max_k \Delta_i(k)$$
Step 6: Calculate Grey Relational Coefficient
$$\xi_i(k) = \frac{a + \rho b}{\Delta_i(k) + \rho b}$$
where $\rho \in [0, 1]$ is the resolution coefficient (usually 0.5)
Step 7: Calculate Grey Relational Grade
$$\gamma_i = \frac{1}{n} \sum_{k=1}^n \xi_i(k)$$
Step 8: Rank Factors
Sort factors by $\gamma_i$ in descending order.
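Steps 2-8 can be packaged as one small helper. This is a sketch assuming mean normalization and an outcome in the first column; the function name grey_relational_grade is an illustrative choice, not a standard API:
import numpy as np
def grey_relational_grade(data, rho=0.5):
    """Grey relational grade of each factor column against column 0 (the outcome)."""
    data = np.asarray(data, dtype=float)
    norm = data / data.mean(axis=0)            # Step 2: mean normalization
    y, x = norm[:, 0], norm[:, 1:]             # Step 3: reference vs. comparison sequences
    delta = np.abs(x - y[:, None])             # Step 4: absolute differences
    a, b = delta.min(), delta.max()            # Step 5: two-level extrema
    xi = (a + rho * b) / (delta + rho * b)     # Step 6: grey relational coefficients
    return xi.mean(axis=0)                     # Step 7: grey relational grades
Step 8 is then a sort, e.g. np.argsort(-grey_relational_grade(data)) gives factor indices from strongest to weakest.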
Examples in Skill
See examples/ folder:
- •code1.py / code1.m: Basic grey relational analysis with normalization
- •code2.py / code2.m: Alternative implementation
- •Inter2Max.m, Mid2Max.m, Min2Max.m: Data preprocessing functions (convert indicators to "larger is better")
- •Positivization.m: Positive transformation for indicators
Competition Tips
Problem Recognition (Day 1)
Grey relational analysis indicators:
- •"Which factors most influence..."
- •"Rank importance of factors"
- •Small sample size (< 30 observations)
- •No clear linear relationship
- •Multiple factors, need to find key drivers
Use GRA when:
- •Sample size is small (4-20 points)
- •Don't know if relationship is linear
- •Want to rank factor importance
- •Data doesn't meet assumptions for regression/correlation
Formulation Steps (Day 1-2)
- •Define outcome variable: What are you trying to explain?
- •Identify factors: What might influence the outcome?
- •Collect data: Organize into matrix
- •Normalize: Make data dimensionless
- •Compute GRA: Follow 8-step procedure
- •Interpret: Rank factors by grey relational grade
Implementation Template
import numpy as np
# 1. Prepare data (rows=samples, cols=[outcome, factor1, factor2, ...])
data = np.array([
[y1, x1_1, x1_2, ...],
[y2, x2_1, x2_2, ...],
# ...
])
# 2. Normalize
mean = np.mean(data, axis=0)
data_norm = data / mean
# 3. Separate reference and comparison
Y = data_norm[:, 0]
X = data_norm[:, 1:]
# 4. Calculate differences
abs_diff = np.abs(X - Y.reshape(-1, 1))
# 5. Two-level extrema
a = np.min(abs_diff)
b = np.max(abs_diff)
# 6. Grey relational coefficient
rho = 0.5
gamma_matrix = (a + rho * b) / (abs_diff + rho * b)
# 7. Grey relational grade
gamma = np.mean(gamma_matrix, axis=0)
# 8. Rank and display
factors = ['Factor1', 'Factor2', 'Factor3']
for factor, grade in sorted(zip(factors, gamma),
                            key=lambda x: x[1], reverse=True):
    print(f"{factor}: {grade:.4f}")
Validation (Day 3)
- •Check normalization: All sequences should be dimensionless
- •Verify ranking: Does it match intuition/domain knowledge?
- •Sensitivity to $\rho$: Try $\rho = 0.3, 0.5, 0.7$ - ranking should be stable
- •Compare with correlation: For validation, compute the Pearson correlation of each factor with the outcome (both checks are sketched below)
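Both checks can be scripted in a few lines; this sketch reuses the Quick Start data, loops over several ρ values, and uses np.corrcoef for the Pearson cross-check:
import numpy as np
data = np.array([[55, 24, 10, 5],
                 [65, 38, 22, 8],
                 [75, 40, 18, 6],
                 [100, 50, 20, 10]], dtype=float)
norm = data / data.mean(axis=0)
y, x = norm[:, 0], norm[:, 1:]
delta = np.abs(x - y[:, None])
a, b = delta.min(), delta.max()
# Sensitivity check: the factor ranking should stay stable as rho varies
for rho in (0.3, 0.5, 0.7):
    gamma = ((a + rho * b) / (delta + rho * b)).mean(axis=0)
    print(f"rho={rho}: grades={np.round(gamma, 4)}, ranking={np.argsort(-gamma)}")
# Cross-check: Pearson correlation of each factor with the outcome
pearson = [np.corrcoef(data[:, 0], data[:, j])[0, 1] for j in range(1, data.shape[1])]
print("Pearson correlations:", np.round(pearson, 3))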
Common Pitfalls
- •Confusing with grey forecasting: GRA is for correlation analysis, not prediction
  - •Fix: Use grey-forecaster for time series prediction
- •Sample too large: If n > 50, use standard correlation/regression
  - •GRA's advantage lies in small samples
- •Forgot normalization: Different units make the comparison meaningless
  - •Fix: Always normalize first
- •Wrong reference sequence: A factor was used as the reference instead of the outcome
  - •Fix: Reference = outcome variable, comparisons = factors
- •Over-interpreting: A high $\gamma_i$ does not imply causation
  - •GRA shows correlation, not causation
Advanced: Data Preprocessing
Indicator Direction Transformation
Convert all indicators to "larger is better":
1. Cost-type (smaller is better) → Benefit-type: $$x_i'(k) = \max x_i - x_i(k)$$
2. Interval-type (optimal range $[a, b]$): $$x_i'(k) = \begin{cases} 1 - \frac{a - x_i(k)}{M}, & x_i(k) < a \\ 1, & a \leq x_i(k) \leq b \\ 1 - \frac{x_i(k) - b}{M}, & x_i(k) > b \end{cases}$$
where $M = \max\{a - \min_k x_i(k),\ \max_k x_i(k) - b\}$ (both transformations are sketched in code below)
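A minimal Python sketch of both transformations; the names min2max and inter2max mirror the MATLAB helpers listed under Examples in Skill, but these versions are illustrative rather than part of the skill's files:
import numpy as np
def min2max(x):
    """Cost-type ("smaller is better") -> benefit-type."""
    x = np.asarray(x, dtype=float)
    return x.max() - x
def inter2max(x, a, b):
    """Interval-type indicator with optimal range [a, b] -> benefit-type."""
    x = np.asarray(x, dtype=float)
    M = max(a - x.min(), x.max() - b)   # scale factor M from the formula above
    out = np.ones_like(x)               # values inside [a, b] score 1
    out[x < a] = 1 - (a - x[x < a]) / M
    out[x > b] = 1 - (x[x > b] - b) / M
    return out
print(min2max([3, 7, 2, 9]))                       # [6. 2. 7. 0.]
print(inter2max([2.0, 5.0, 7.5, 9.0], 4.0, 6.0))   # approximately [0.333 1. 0.5 0.]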
Comparison with Other Methods
| Method | Sample Size | Assumptions | When to Use |
|---|---|---|---|
| Grey Relational Analysis | Small (4-20) | No distributional or linearity assumptions | Small sample, non-linear |
| Pearson Correlation | Large (>30) | Linear, normal | Large sample, linear |
| Spearman Correlation | Medium (>10) | Monotonic | Non-linear but monotonic |
| Regression | Large (>30) | Linear, normal | Prediction, causation |
References
- •references/5-灰色关联分析【微信公众号丨B站:数模加油站】.pdf: Complete tutorial (in Chinese)
- •Deng, J. (1982). "Control problems of grey systems." Systems & Control Letters, 1(5), 288-294.
Time Budget
- •Data preparation: 15-30 min
- •Normalization: 10-15 min
- •Implementation: 15-25 min
- •Validation: 20-30 min
- •Total: 1-1.5 hours
Related Skills
- •grey-forecaster: Grey prediction (GM(1,1)) for time series
- •pca-analyzer: Alternative for factor analysis (large samples)
- •entropy-weight-method: Objective weighting based on data variation
- •fuzzy-evaluation: For qualitative factor evaluation