Model Calibration Analysis
This page analyzes the Power Rating Model's performance on historical NFL data (2024–2025 seasons) to ensure its predicted probabilities are well-calibrated.
What is Calibration?
A model is well-calibrated if its predicted probabilities match actual outcomes. For example, if we look at all the times the model predicted a 70% chance of winning, the teams in that group should have actually won about 70% of the time.
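As a concrete illustration, this check can be done by binning predictions and comparing each bin's average forecast to its actual win rate. The sketch below uses synthetic `y_prob` / `y_true` arrays as stand-ins for backtest data, not real model output.

```python
import numpy as np

# Hypothetical data: y_prob = predicted win probabilities, y_true = 1 if the team won.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0.05, 0.95, size=2000)
y_true = (rng.uniform(size=2000) < y_prob).astype(int)  # perfectly calibrated toy data

bins = np.linspace(0.0, 1.0, 11)        # ten equal-width probability bins
bin_ids = np.digitize(y_prob, bins) - 1

for b in range(10):
    mask = bin_ids == b
    if mask.any():
        print(f"predicted {bins[b]:.1f}-{bins[b+1]:.1f}: "
              f"mean forecast {y_prob[mask].mean():.3f}, "
              f"actual win rate {y_true[mask].mean():.3f} ({mask.sum()} games)")
```

In a well-calibrated model, the mean forecast and the actual win rate in each bin should be close.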
Many models are good at ranking but produce poorly-calibrated raw probabilities. Betting with uncalibrated probabilities means you are miscalculating your edge: the gap between your model's probability and the probability implied by the betting line.
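To make that concrete, here is a small arithmetic sketch with hypothetical numbers (a -150 line and made-up model probabilities) showing how an overconfident raw probability inflates the apparent edge.

```python
# Illustrative arithmetic only; the odds and probabilities below are hypothetical.
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the sportsbook's implied win probability (ignoring vig)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

book = implied_prob(-150)   # 0.60
raw_model = 0.70            # overconfident raw output
calibrated = 0.62           # after correcting the systematic bias

print(f"apparent edge: {raw_model - book:+.2%}")   # looks like +10%
print(f"actual edge:   {calibrated - book:+.2%}")  # really only +2%
```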
This application uses Platt Scaling, a logistic regression model trained on the raw outputs of the main model, to correct for systematic bias and produce well-calibrated probabilities.
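A minimal sketch of Platt Scaling, assuming the raw model probabilities live in a hypothetical array `raw_prob` with observed outcomes in `won`; the application's actual variable names and training code may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical backtest data: raw_prob = uncalibrated win probabilities, won = 1/0 outcomes.
rng = np.random.default_rng(1)
raw_prob = rng.uniform(0.05, 0.95, size=2000)
won = (rng.uniform(size=2000) < np.clip(raw_prob * 1.1 - 0.05, 0, 1)).astype(int)  # mildly biased toy outcomes

# Platt Scaling: fit a one-feature logistic regression on the raw output
# (the log-odds of the raw probability is a common choice of feature).
log_odds = np.log(raw_prob / (1 - raw_prob)).reshape(-1, 1)
platt = LogisticRegression()
platt.fit(log_odds, won)

calibrated_prob = platt.predict_proba(log_odds)[:, 1]
```

Fitting on log-odds rather than the raw probability keeps the calibrator in the sigmoid family that Platt Scaling assumes.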
Reliability Diagram (2024–2025 Backtest)
This chart plots the model's predicted probability (x-axis) against the actual win frequency (y-axis). A perfectly calibrated model would follow the dashed diagonal line.
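One way to produce such a diagram is scikit-learn's `calibration_curve` helper; the snippet below uses synthetic `won` / `calibrated_prob` arrays as stand-ins for the backtest output.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Hypothetical data standing in for the 2024-2025 backtest.
rng = np.random.default_rng(2)
calibrated_prob = rng.uniform(0.05, 0.95, size=2000)
won = (rng.uniform(size=2000) < calibrated_prob).astype(int)

# Bin predictions and compute the actual win frequency in each bin.
frac_pos, mean_pred = calibration_curve(won, calibrated_prob, n_bins=10)

plt.plot([0, 1], [0, 1], "k--", label="perfect calibration")
plt.plot(mean_pred, frac_pos, "o-", label="model")
plt.xlabel("Predicted win probability")
plt.ylabel("Actual win frequency")
plt.legend()
plt.show()
```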
Further Research: Isotonic Regression
While Platt Scaling is effective, especially when the calibration curve is sigmoidal, another powerful technique is Isotonic Regression.
Unlike Platt Scaling, which assumes a specific logistic function, Isotonic Regression is a non-parametric method. It finds the best-fitting monotonically non-decreasing function to map raw probabilities to calibrated ones. This allows it to fit more complex calibration curves without being constrained to a sigmoid shape.
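A short sketch of how isotonic calibration might look with scikit-learn's `IsotonicRegression`, again on hypothetical `raw_prob` / `won` arrays rather than the application's actual data.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical data: the true win rate sits below the raw forecast (an overconfident model).
rng = np.random.default_rng(3)
raw_prob = rng.uniform(0.05, 0.95, size=2000)
won = (rng.uniform(size=2000) < raw_prob**1.3).astype(int)

# Fit a monotonically non-decreasing mapping from raw to calibrated probability.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw_prob, won)

calibrated_prob = iso.predict(raw_prob)
```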
For PhD-level research, comparing the performance of Platt Scaling against Isotonic Regression on your specific dataset would be a valuable exercise. Scikit-learn provides robust implementations of both, making such a comparison straightforward.
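One possible shape for that comparison, sketched on hypothetical data: fit both calibrators on a training split and score them on held-out games with Brier score and log loss.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the model's backtest output.
rng = np.random.default_rng(4)
raw_prob = rng.uniform(0.05, 0.95, size=4000)
won = (rng.uniform(size=4000) < np.clip(raw_prob * 1.15 - 0.07, 0, 1)).astype(int)

p_train, p_test, y_train, y_test = train_test_split(raw_prob, won, test_size=0.3, random_state=0)

# Platt Scaling: logistic regression on the raw log-odds.
platt = LogisticRegression()
platt.fit(np.log(p_train / (1 - p_train)).reshape(-1, 1), y_train)
platt_pred = platt.predict_proba(np.log(p_test / (1 - p_test)).reshape(-1, 1))[:, 1]

# Isotonic Regression: non-parametric monotone fit.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(p_train, y_train)
iso_pred = iso.predict(p_test)

for name, pred in [("Platt", platt_pred), ("Isotonic", iso_pred)]:
    print(f"{name}: Brier {brier_score_loss(y_test, pred):.4f}, "
          f"log loss {log_loss(y_test, np.clip(pred, 1e-6, 1 - 1e-6)):.4f}")
```

Isotonic Regression typically needs more data than Platt Scaling to avoid overfitting, so the held-out comparison matters more than in-sample fit.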