--- frontmatter
name: elo-ratings-math
description: Explains the mathematical principles behind Elo rating systems, including expected score calculation, rating updates, and the K-factor. Use when implementing or understanding competitive rating systems.

Elo Ratings Mathematics

Overview

The Elo rating system is a method for calculating the relative skill levels of players in head-to-head games. Originally developed by Arpad Elo for chess, it is now used in many competitive contexts, including sports, video games, and online platforms.

Core Mathematical Principles

1. Expected Score Formula

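The expected score for a player is the number of points that player is expected to earn from one game, given the rating difference between the two players. It equals the win probability plus half the draw probability, so it matches the win probability only when draws are impossible.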

Formula:

code
E_A = 1 / (1 + 10^((R_B - R_A) / 400))

Where:

  • E_A = Expected score for player A (between 0 and 1)
  • R_A = Current rating of player A
  • R_B = Current rating of player B
  • 10^x = 10 raised to the power of x

Interpretation:

  • E_A close to 1 means player A is heavily favored (the formula approaches 1 as the rating gap grows but never reaches it)
  • E_A = 0.5 means both players are equally matched (equal ratings)
  • E_A close to 0 means player A is a heavy underdog (the formula approaches 0 but never reaches it)

Example: If player A has rating 1600 and player B has rating 1400:

code
E_A = 1 / (1 + 10^((1400 - 1600) / 400))
E_A = 1 / (1 + 10^(-200 / 400))
E_A = 1 / (1 + 10^(-0.5))
E_A = 1 / (1 + 0.316)
E_A ≈ 0.76

Player A is expected to score about 0.76 points per game on average (a 76% chance of winning if draws are not possible).
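
The calculation above can be sketched in Python as follows. The function name `expected_score` and the `scale` parameter are illustrative choices, not part of any standard library; `scale` simply exposes the constant 400 from the formula.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of player A against player B under the Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


e_a = expected_score(1600, 1400)
e_b = expected_score(1400, 1600)
print(round(e_a, 2))        # 0.76
print(round(e_a + e_b, 2))  # 1.0
```

The second print shows that the two players' expected scores sum to 1.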

2. Rating Update Formula

After a game, ratings are updated based on the actual outcome compared to the expected outcome.

Formula:

code
R'_A = R_A + K × (S_A - E_A)

Where:

  • R'_A = New rating for player A
  • R_A = Old rating for player A
  • K = K-factor (determines rating volatility)
  • S_A = Actual score (1 for win, 0.5 for draw, 0 for loss)
  • E_A = Expected score (from formula above)

The Update Difference:

code
ΔR_A = K × (S_A - E_A)

This difference represents:

  • Positive value: Player performed better than expected (rating increases)
  • Negative value: Player performed worse than expected (rating decreases)
  • Zero: Player performed exactly as expected (no rating change)
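
A minimal sketch of the update rule in Python, assuming a default K of 32. The names `update_rating` and `expected_score` are illustrative, and the validation of `actual_score` is an added safety check rather than part of the formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, opponent_rating: float,
                  actual_score: float, k: float = 32.0) -> float:
    """Return the new rating after one game (1 = win, 0.5 = draw, 0 = loss)."""
    if actual_score not in (0.0, 0.5, 1.0):
        raise ValueError("actual_score must be 0, 0.5, or 1")
    delta = k * (actual_score - expected_score(rating, opponent_rating))
    return rating + delta


# Two equally rated players: a win gains K/2, a draw changes nothing.
print(update_rating(1500, 1500, 1.0))  # 1516.0
print(update_rating(1500, 1500, 0.5))  # 1500.0
print(update_rating(1500, 1500, 0.0))  # 1484.0
```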

3. The K-Factor

The K-factor scales every update, so K is also the largest number of points a rating can gain or lose in a single game.

Common K-factor values:

  • K = 32: High volatility, used for beginners or provisional ratings
  • K = 24: Medium volatility, used for intermediate players
  • K = 16: Low volatility, used for established/expert players
  • K = 10: Very stable, used for top-level players

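Adaptive K-factor example (a simplified version of the FIDE chess rules, evaluated top to bottom; the full regulations include additional cases, such as a separate rule for junior players):

code
K = 40  if games_played < 30
K = 20  else if rating < 2400
K = 10  otherwise (rating >= 2400)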
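
The three rules above can be written as a small Python function. This is a sketch of the simplified rule set shown here only; `select_k_factor` is a hypothetical name, and the function does not attempt to cover the full FIDE regulations.

```python
def select_k_factor(games_played: int, rating: float) -> int:
    """Pick K from the simplified three-tier rule, checked in order."""
    if games_played < 30:   # new player: ratings move quickly
        return 40
    if rating < 2400:       # established player below 2400
        return 20
    return 10               # established player at 2400 or above


print(select_k_factor(games_played=5, rating=2500))    # 40
print(select_k_factor(games_played=100, rating=1800))  # 20
print(select_k_factor(games_played=100, rating=2450))  # 10
```

Checking the games-played condition first means a new player gets K = 40 regardless of rating, which is why the order of the rules matters.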

4. Rating Difference and Win Probability

The relationship between rating difference and expected win probability:

Rating Difference   Expected Score   Win Probability
0                   0.50             50%
50                  0.57             57%
100                 0.64             64%
200                 0.76             76%
300                 0.85             85%
400                 0.91             91%
500                 0.95             95%
600                 0.97             97%

Formula for any rating difference:

code
Win_Probability = 1 / (1 + 10^(-ΔR / 400))

Where ΔR = R_A - R_B
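
As a sketch, the table rows can be regenerated from the formula, and the formula can be inverted to find the rating gap that corresponds to a given expected score: ΔR = 400 × log10(p / (1 - p)). Both function names below are illustrative.

```python
import math


def expected_score_from_gap(gap: float) -> float:
    """Expected score for the player who is `gap` points higher rated."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


def gap_from_expected_score(p: float) -> float:
    """Inverse of the formula: rating gap that yields expected score p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


for gap in (0, 50, 100, 200, 300, 400, 500, 600):
    print(f"{gap:>4}  {expected_score_from_gap(gap):.2f}")

print(round(gap_from_expected_score(0.75)))  # 191
```

The last line says that an expected score of 0.75 corresponds to a gap of roughly 191 points.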

5. Two-Player Zero-Sum Property

In a two-player game where both players use the same K-factor, the rating changes are equal and opposite:

code
ΔR_A = -ΔR_B

This is because:

code
E_A + E_B = 1
S_A + S_B = 1 (for every result, including draws, where both scores are 0.5)

Therefore:

code
ΔR_A = K × (S_A - E_A)
ΔR_B = K × (S_B - E_B) = K × ((1 - S_A) - (1 - E_A)) = -K × (S_A - E_A) = -ΔR_A
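
The derivation uses a single K for both players. The sketch below checks the property numerically and shows what happens when the two players have different K-factors; `rating_deltas`, `k_a`, and `k_b` are hypothetical names introduced for this illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_deltas(rating_a: float, rating_b: float, score_a: float,
                  k_a: float, k_b: float) -> tuple[float, float]:
    """Return (delta_a, delta_b) for one game; score_b is 1 - score_a."""
    delta_a = k_a * (score_a - expected_score(rating_a, rating_b))
    delta_b = k_b * ((1.0 - score_a) - expected_score(rating_b, rating_a))
    return delta_a, delta_b


same_k = rating_deltas(1800, 1700, score_a=0.0, k_a=32, k_b=32)
mixed_k = rating_deltas(1800, 1700, score_a=0.0, k_a=40, k_b=10)
print(abs(sum(same_k)) < 1e-9)  # True: changes cancel out
print(round(sum(mixed_k), 2))   # -19.2: points leave the system
```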

Comprehensive Example

Scenario: Player A (rating 1800) plays Player B (rating 1700), K = 32

Step 1: Calculate Expected Scores

code
E_A = 1 / (1 + 10^((1700 - 1800) / 400))
E_A = 1 / (1 + 10^(-0.25))
E_A = 1 / (1 + 0.562)
E_A ≈ 0.64

E_B = 1 - E_A ≈ 0.36

Step 2: Actual Outcome - Player B Wins (upset!)

code
S_A = 0 (loss)
S_B = 1 (win)

Step 3: Calculate Rating Changes

code
ΔR_A = 32 × (0 - 0.64) = 32 × (-0.64) = -20.48 ≈ -20
ΔR_B = 32 × (1 - 0.36) = 32 × (0.64) = 20.48 ≈ +20

Step 4: New Ratings

code
R'_A = 1800 + (-20) = 1780
R'_B = 1700 + 20 = 1720

Player B gained 20 points for the upset victory, while player A lost 20 points.
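
The four steps can be reproduced with a short Python sketch. It assumes both players share the same K, so player B's change is the negative of player A's; `play_game` is an illustrative name. Unlike the hand calculation, it rounds only at the end, which gives the same result here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def play_game(rating_a: float, rating_b: float, score_a: float,
              k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one game with a shared K."""
    delta_a = k * (score_a - expected_score(rating_a, rating_b))
    # With a shared K, player B's change is the negative of player A's.
    return rating_a + delta_a, rating_b - delta_a


new_a, new_b = play_game(1800, 1700, score_a=0.0)  # player B wins
print(round(new_a), round(new_b))  # 1780 1720
```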

Multi-Player Extensions

For games with more than two players, the Elo system can be extended:

Pairwise Comparison Method: Each player's rating change is the sum of their changes against all opponents:

code
ΔR_i = K × Σ(S_ij - E_ij)

Where:

  • i = player being rated
  • j = each opponent
  • S_ij = actual score against opponent j
  • E_ij = expected score against opponent j
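
A sketch of the pairwise method in Python. It assumes that S_ij is derived from finishing positions (1 for placing ahead of opponent j, 0.5 for a tie, 0 for placing behind), which is one common convention but is not specified above; `multiplayer_deltas` and `placements` are illustrative names.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def multiplayer_deltas(ratings: list[float], placements: list[int],
                       k: float = 32.0) -> list[float]:
    """Rating change per player; placements[i] is i's finish (1 = best)."""
    deltas = []
    for i, (r_i, p_i) in enumerate(zip(ratings, placements)):
        total = 0.0
        for j, (r_j, p_j) in enumerate(zip(ratings, placements)):
            if i == j:
                continue  # a player is not compared with themselves
            if p_i < p_j:
                s_ij = 1.0   # finished ahead of opponent j
            elif p_i == p_j:
                s_ij = 0.5   # tied with opponent j
            else:
                s_ij = 0.0   # finished behind opponent j
            total += s_ij - expected_score(r_i, r_j)
        deltas.append(k * total)
    return deltas


print(multiplayer_deltas([1500, 1500, 1500], [1, 2, 3]))  # [32.0, 0.0, -32.0]
```

Because the sum runs over every opponent, a single result can move a rating by up to K × (N - 1) points, so implementations often use a smaller K for larger fields.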

Mathematical Properties

1. Conservation of Rating Points: In a closed system with only two-player games, the total number of rating points remains constant, provided both players in each game use the same K-factor and no rating floor, bonus, or decay is applied.

2. Logistic Distribution: The expected score formula is a logistic curve in the rating difference, so it changes smoothly and depends only on R_A - R_B, not on the absolute ratings.

3. Rating Scale Calibration: The choice of 400 in the formula means a 400-point difference corresponds to 10:1 odds (expected scores of about 0.91 vs 0.09).

4. Convergence: Over many games, ratings tend toward a level that reflects players' true skill. A larger K-factor gets there faster but makes the rating fluctuate more from game to game.
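
The effect of K on convergence speed can be illustrated with an idealized sketch. It assumes a player with a fixed "true" rating who always faces the same opponent, and it replaces each random game result with its average value (the true expected score), so the trajectory is deterministic. Real results are random, so real ratings keep fluctuating around the true level. All names and numbers below are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def games_to_converge(true_rating: float, start_rating: float,
                      opponent_rating: float, k: float,
                      tolerance: float = 10.0, max_games: int = 10_000) -> int:
    """Count idealized games until the rating is within `tolerance` of true_rating."""
    # Average score the player actually achieves, given their true strength.
    true_expected = expected_score(true_rating, opponent_rating)
    rating = start_rating
    for games in range(max_games + 1):
        if abs(rating - true_rating) <= tolerance:
            return games
        # Average-case update: the actual score is replaced by its mean.
        rating += k * (true_expected - expected_score(rating, opponent_rating))
    raise RuntimeError("did not converge within max_games")


slow = games_to_converge(1800, 1500, 1650, k=16)
fast = games_to_converge(1800, 1500, 1650, k=32)
print(fast < slow)  # True
```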

Implementation Considerations

When implementing Elo ratings:

  1. Initial Ratings: Typically start players at 1200, 1500, or 1600
  2. Minimum Ratings: Consider setting a floor (e.g., 100) to prevent negative ratings
  3. Rating Inflation/Deflation: Monitor average ratings over time
  4. Provisional Periods: Use higher K-factors for new players
  5. Inactivity Decay: Consider rating decay for inactive players
  6. Draw Handling: Use S = 0.5 for both players in draws
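
Several of these considerations can be combined in one small sketch: a starting rating, a rating floor, a higher K during a provisional period, and draws scored as 0.5. The constants (1500, 100, 30 games, K of 40 and 20) and the names `Player`, `k_for`, and `record_game` are assumptions chosen for illustration, not recommended settings.

```python
from dataclasses import dataclass

INITIAL_RATING = 1500.0   # assumed starting rating
RATING_FLOOR = 100.0      # assumed minimum rating
PROVISIONAL_GAMES = 30    # assumed length of the provisional period


@dataclass
class Player:
    rating: float = INITIAL_RATING
    games_played: int = 0


def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def k_for(player: Player) -> float:
    """Higher K while the player is still provisional."""
    return 40.0 if player.games_played < PROVISIONAL_GAMES else 20.0


def record_game(a: Player, b: Player, score_a: float) -> None:
    """Update both players in place; score_a is 1, 0.5 (draw), or 0."""
    e_a = expected_score(a.rating, b.rating)
    delta_a = k_for(a) * (score_a - e_a)
    delta_b = k_for(b) * ((1.0 - score_a) - (1.0 - e_a))
    a.rating = max(RATING_FLOOR, a.rating + delta_a)  # apply the floor
    b.rating = max(RATING_FLOOR, b.rating + delta_b)
    a.games_played += 1
    b.games_played += 1


newcomer, veteran = Player(), Player(rating=1500.0, games_played=100)
record_game(newcomer, veteran, score_a=1.0)
print(newcomer.rating, veteran.rating)  # 1520.0 1490.0
```

In the example the newcomer gains 20 points while the veteran loses only 10, which is one way rating inflation can enter a system that uses different K-factors.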

Extensions and Variants

Glicko and Glicko-2: Add a rating deviation (RD) to each rating to account for uncertainty; Glicko-2 also tracks a volatility parameter:

code
RD² = rating variance (higher = more uncertain)

TrueSkill: Microsoft's system using Bayesian inference with skill mean (μ) and skill standard deviation (σ).

Elo with Home Advantage: Add a constant to the home player's rating in expected score calculation:

code
E_home = 1 / (1 + 10^((R_away - (R_home + H)) / 400))

Where H is the home advantage (typically 30-100 points).
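
A minimal sketch of the home-advantage adjustment in Python. The default of 50 points is an arbitrary value chosen for illustration; in practice H would be fitted to the sport or league in question. The function name is illustrative.

```python
def expected_home_score(home_rating: float, away_rating: float,
                        home_advantage: float = 50.0) -> float:
    """Expected score of the home side after adding H to its rating."""
    boosted = home_rating + home_advantage
    return 1.0 / (1.0 + 10.0 ** ((away_rating - boosted) / 400.0))


# Two equally rated teams: the home side is favored once H > 0.
print(round(expected_home_score(1500, 1500, home_advantage=0.0), 2))    # 0.5
print(round(expected_home_score(1500, 1500, home_advantage=100.0), 2))  # 0.64
```

The bonus is applied only when computing the expected score; the stored ratings themselves are not changed by H.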

References

  • Elo, A. E. (1978). The Rating of Chessplayers, Past and Present
  • FIDE Handbook: Rating Regulations
  • Glickman, M. E. (1999). "Parameter estimation in large dynamic paired comparison experiments"