Model Calibration Analysis

This page analyzes the Power Rating Model's performance on historical NFL data (2024–2025 seasons) to ensure its predicted probabilities are well-calibrated.

What is Calibration?

A model is well-calibrated if its predicted probabilities match actual outcomes. For example, if we look at all the times the model predicted a 70% chance of winning, the teams in that group should have actually won about 70% of the time.

Many models are good at ranking but produce poorly-calibrated raw probabilities. Betting with uncalibrated probabilities means you are miscalculating your edge.
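The idea of checking calibration can be sketched directly: bucket predictions by predicted probability and compare the bucket's mean prediction to its actual win rate. The data below is hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical predicted win probabilities and actual outcomes (1 = win).
preds = np.array([0.72, 0.68, 0.71, 0.69, 0.73, 0.70, 0.67, 0.74, 0.71, 0.70])
outcomes = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# For a well-calibrated model, the actual win rate within a probability
# bucket should be close to the mean predicted probability in that bucket.
mask = (preds >= 0.6) & (preds < 0.8)
print(f"mean predicted:  {preds[mask].mean():.2f}")    # ≈ 0.70
print(f"actual win rate: {outcomes[mask].mean():.2f}")  # 0.70 here
```

A large gap between the two numbers in a bucket is exactly the miscalculated edge described above: the model says 70%, but reality says something else.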

This application uses Platt Scaling, a logistic regression model trained on the raw outputs of the main model, to correct for systematic bias and produce well-calibrated probabilities.
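A minimal sketch of the Platt Scaling step, using scikit-learn's LogisticRegression — the data here is simulated, not the application's actual backtest, and fitting in log-odds space is the conventional choice, not necessarily this app's exact implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate raw (uncalibrated) win probabilities and game outcomes.
# The "true" probability is flatter than the raw output, mimicking a
# model that is systematically overconfident away from 50%.
rng = np.random.default_rng(0)
raw = rng.uniform(0.2, 0.9, size=500)
true_p = 0.5 + 0.6 * (raw - 0.5)
outcomes = rng.binomial(1, true_p)

# Platt Scaling: logistic regression on the raw model output,
# here represented in log-odds space.
log_odds = np.log(raw / (1 - raw)).reshape(-1, 1)
platt = LogisticRegression().fit(log_odds, outcomes)
calibrated = platt.predict_proba(log_odds)[:, 1]
```

The fitted regression learns a slope and intercept that squeeze or stretch the raw probabilities toward their observed frequencies.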

Reliability Diagram (2024–2025 Backtest)

This chart plots the model's predicted probability (x-axis) against the actual win frequency (y-axis). A perfectly calibrated model would follow the dashed diagonal line.

Bin       Predicted Probability   Actual Frequency   n
20-30%    25.7%                   48.0%              25
30-40%    35.4%                   37.7%              61
40-50%    45.4%                   51.6%              126
50-60%    55.3%                   49.1%              159
60-70%    65.1%                   67.5%              123
70-80%    74.7%                   59.7%              62
80-90%    83.9%                   81.8%              11
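The binned statistics behind a reliability diagram like this can be computed in a few lines. The helper below is a generic sketch (not the application's code), run on simulated predictions:

```python
import numpy as np

def reliability_bins(preds, outcomes, edges):
    """Mean predicted probability vs. actual win rate per probability bin."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (preds >= lo) & (preds < hi)
        if mask.sum() == 0:
            continue  # skip empty bins
        rows.append((lo, hi, preds[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows

# Hypothetical backtest data: 600 games with roughly calibrated predictions.
rng = np.random.default_rng(1)
preds = rng.uniform(0.2, 0.9, 600)
outcomes = rng.binomial(1, preds)

edges = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
for lo, hi, p, a, n in reliability_bins(preds, outcomes, edges):
    print(f"{lo:.0%}-{hi:.0%}: predicted {p:.1%}, actual {a:.1%}, n={n}")
```

Note the small sample sizes at the extremes of the real table above (n=25 and n=11): large predicted-vs-actual gaps in those bins are much less statistically meaningful than gaps in the well-populated middle bins.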

Further Research: Isotonic Regression

While Platt Scaling is effective, especially when the calibration curve is sigmoidal, another powerful technique is Isotonic Regression.

Unlike Platt Scaling, which assumes a specific logistic function, Isotonic Regression is a non-parametric method. It finds the best-fitting monotonically non-decreasing function to map raw probabilities to calibrated ones. This allows it to fit more complex calibration curves without being constrained to a sigmoid shape.

As a direction for deeper research, comparing the performance of Platt Scaling against Isotonic Regression on your specific dataset would be a valuable exercise. Scikit-learn provides robust implementations of both, making such a comparison straightforward.
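Such a comparison can be sketched as follows, fitting both calibrators on a training split of simulated, deliberately miscalibrated predictions and scoring them with the Brier score on a held-out split. All data here is hypothetical:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

# Hypothetical raw probabilities and outcomes; in practice these would be
# the power rating model's backtest predictions and game results.
rng = np.random.default_rng(2)
raw = rng.uniform(0.1, 0.9, 2000)
outcomes = rng.binomial(1, 0.5 + 0.7 * (raw - 0.5))  # raw is miscalibrated

train, test = slice(0, 1500), slice(1500, None)

# Platt Scaling: parametric, fits a sigmoid in log-odds space.
log_odds = np.log(raw / (1 - raw)).reshape(-1, 1)
platt = LogisticRegression().fit(log_odds[train], outcomes[train])
platt_p = platt.predict_proba(log_odds[test])[:, 1]

# Isotonic Regression: non-parametric, fits any monotone step function.
iso = IsotonicRegression(out_of_bounds="clip").fit(raw[train], outcomes[train])
iso_p = iso.predict(raw[test])

print("Brier (raw):     ", brier_score_loss(outcomes[test], raw[test]))
print("Brier (Platt):   ", brier_score_loss(outcomes[test], platt_p))
print("Brier (isotonic):", brier_score_loss(outcomes[test], iso_p))
```

A lower Brier score indicates better-calibrated probabilities. One practical caveat: isotonic regression's extra flexibility makes it prone to overfitting on small datasets, so with only a couple of seasons of NFL games, Platt Scaling's sigmoid constraint may be the safer choice.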