Model Calibration Analysis

This page analyzes the Power Rating Model's performance on historical NFL data (2024–2025 seasons) to ensure its predicted probabilities are well-calibrated.

What is Calibration?

A model is well-calibrated if its predicted probabilities match actual outcomes. For example, if we look at all the times the model predicted a 70% chance of winning, the teams in that group should have actually won about 70% of the time.

Many models are good at ranking but produce poorly-calibrated raw probabilities. Betting with uncalibrated probabilities means you are miscalculating your edge.
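The idea of checking calibration can be sketched directly: bucket predictions by predicted probability and compare the bucket's mean prediction to its actual win rate. The data below is hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical predicted win probabilities and actual outcomes (1 = win).
preds = np.array([0.72, 0.68, 0.71, 0.69, 0.73, 0.70, 0.67, 0.74, 0.71, 0.70])
outcomes = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# For a well-calibrated model, the actual win rate within a probability
# bucket should be close to the mean predicted probability in that bucket.
mask = (preds >= 0.6) & (preds < 0.8)
print(f"mean predicted:  {preds[mask].mean():.2f}")    # ≈ 0.70
print(f"actual win rate: {outcomes[mask].mean():.2f}")  # 0.70 here
```

A large gap between the two numbers in a bucket is exactly the miscalculated edge described above: the model says 70%, but reality says something else.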

This application uses Platt Scaling, a logistic regression model trained on the raw outputs of the main model, to correct for systematic bias and produce well-calibrated probabilities.
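A minimal sketch of the Platt Scaling step, using scikit-learn's LogisticRegression — the data here is simulated, not the application's actual backtest, and fitting in log-odds space is the conventional choice, not necessarily this app's exact implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate raw (uncalibrated) win probabilities and game outcomes.
# The "true" probability is flatter than the raw output, mimicking a
# model that is systematically overconfident away from 50%.
rng = np.random.default_rng(0)
raw = rng.uniform(0.2, 0.9, size=500)
true_p = 0.5 + 0.6 * (raw - 0.5)
outcomes = rng.binomial(1, true_p)

# Platt Scaling: logistic regression on the raw model output,
# here represented in log-odds space.
log_odds = np.log(raw / (1 - raw)).reshape(-1, 1)
platt = LogisticRegression().fit(log_odds, outcomes)
calibrated = platt.predict_proba(log_odds)[:, 1]
```

The fitted regression learns a slope and intercept that squeeze or stretch the raw probabilities toward their observed frequencies.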

Reliability Diagram (2024–2025 Backtest)

This chart plots the model's predicted probability (x-axis) against the actual win frequency (y-axis). A perfectly calibrated model would follow the dashed diagonal line.

Bin       Predicted Probability   Actual Frequency   n
20-30%    25.7%                   48.0%              25
30-40%    35.4%                   37.7%              61
40-50%    45.4%                   51.6%              126
50-60%    55.3%                   49.1%              159
60-70%    65.1%                   67.5%              123
70-80%    74.7%                   59.7%              62
80-90%    83.9%                   81.8%              11
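The binned statistics behind a reliability diagram like this can be computed in a few lines. The helper below is a generic sketch (not the application's code), run on simulated predictions:

```python
import numpy as np

def reliability_bins(preds, outcomes, edges):
    """Mean predicted probability vs. actual win rate per probability bin."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (preds >= lo) & (preds < hi)
        if mask.sum() == 0:
            continue  # skip empty bins
        rows.append((lo, hi, preds[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows

# Hypothetical backtest data: 600 games with roughly calibrated predictions.
rng = np.random.default_rng(1)
preds = rng.uniform(0.2, 0.9, 600)
outcomes = rng.binomial(1, preds)

edges = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
for lo, hi, p, a, n in reliability_bins(preds, outcomes, edges):
    print(f"{lo:.0%}-{hi:.0%}: predicted {p:.1%}, actual {a:.1%}, n={n}")
```

Note the small sample sizes at the extremes of the real table above (n=25 and n=11): large predicted-vs-actual gaps in those bins are much less statistically meaningful than gaps in the well-populated middle bins.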

Further Research: Isotonic Regression

While Platt Scaling is effective, especially when the calibration curve is sigmoidal, another powerful technique is Isotonic Regression.

Unlike Platt Scaling, which assumes a specific logistic function, Isotonic Regression is a non-parametric method. It finds the best-fitting monotonically non-decreasing function to map raw probabilities to calibrated ones. This allows it to fit more complex calibration curves without being constrained to a sigmoid shape.

As a direction for deeper research, comparing the performance of Platt Scaling against Isotonic Regression on your specific dataset would be a valuable exercise. Scikit-learn provides robust implementations of both, making such a comparison straightforward.
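Such a comparison can be sketched as follows, fitting both calibrators on a training split of simulated, deliberately miscalibrated predictions and scoring them with the Brier score on a held-out split. All data here is hypothetical:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

# Hypothetical raw probabilities and outcomes; in practice these would be
# the power rating model's backtest predictions and game results.
rng = np.random.default_rng(2)
raw = rng.uniform(0.1, 0.9, 2000)
outcomes = rng.binomial(1, 0.5 + 0.7 * (raw - 0.5))  # raw is miscalibrated

train, test = slice(0, 1500), slice(1500, None)

# Platt Scaling: parametric, fits a sigmoid in log-odds space.
log_odds = np.log(raw / (1 - raw)).reshape(-1, 1)
platt = LogisticRegression().fit(log_odds[train], outcomes[train])
platt_p = platt.predict_proba(log_odds[test])[:, 1]

# Isotonic Regression: non-parametric, fits any monotone step function.
iso = IsotonicRegression(out_of_bounds="clip").fit(raw[train], outcomes[train])
iso_p = iso.predict(raw[test])

print("Brier (raw):     ", brier_score_loss(outcomes[test], raw[test]))
print("Brier (Platt):   ", brier_score_loss(outcomes[test], platt_p))
print("Brier (isotonic):", brier_score_loss(outcomes[test], iso_p))
```

A lower Brier score indicates better-calibrated probabilities. One practical caveat: isotonic regression's extra flexibility makes it prone to overfitting on small datasets, so with only a couple of seasons of NFL games, Platt Scaling's sigmoid constraint may be the safer choice.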