Definition
Model calibration measures whether predicted probabilities match observed frequencies. If a well-calibrated model assigns 60 percent probability to many events, those events should occur about 60 percent of the time.
Methodology
- Collect model predictions and final outcomes for the same event definition.
- Group predictions into probability buckets such as 50-55 percent or 60-65 percent.
- Compare the average predicted probability in each bucket to the actual outcome rate (a minimal sketch follows this list).
- Review calibration alongside discrimination metrics, because a calibrated model can still be unhelpful if it cannot separate likely winners from likely losers.
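A minimal bucketing sketch in Python, assuming `preds` and `outcomes` are equal-length NumPy arrays of model probabilities and binary results; the function name and bucket edges are illustrative, not taken from any particular library.

```python
# Minimal calibration-table sketch. Assumes `preds` holds model
# probabilities in [0, 1] and `outcomes` holds 1 for events that
# happened and 0 for events that did not.
import numpy as np

def calibration_table(preds, outcomes, edges=(0.50, 0.55, 0.60, 0.65)):
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (preds >= lo) & (preds < hi)  # predictions in this bucket
        n = int(mask.sum())
        if n == 0:
            continue  # skip empty buckets rather than divide by zero
        rows.append({
            "bucket": f"{lo:.0%}-{hi:.0%}",
            "avg_prediction": float(preds[mask].mean()),    # mean model probability
            "observed_rate": float(outcomes[mask].mean()),  # empirical frequency
            "sample_size": n,
        })
    return rows
```

Each returned row carries the same columns as the example table below, so a bucket is easy to read: the average prediction and the observed rate should sit close together.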
Example Calibration Table
Illustrative reliability buckets for model win probabilities.
| Predicted Bucket | Avg Prediction | Observed Rate | Sample Size |
|---|---|---|---|
| 50-55% | 52.4% | 51.8% | 226 |
| 55-60% | 57.2% | 58.1% | 184 |
| 60-65% | 62.1% | 59.7% | 137 |
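In this illustration the 60-65 percent bucket reads mildly overconfident (62.1 percent predicted against 59.7 percent observed), though with 137 outcomes the gap sits within one standard error of sampling noise (see Caveats).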
Common Uses
- Check whether model probabilities can be trusted as probabilities.
- Identify overconfident or underconfident prediction ranges.
- Improve bet sizing, simulation inputs, and expected value calculations.
Caveats
- Calibration estimates require enough outcomes in each probability bucket; small buckets produce noisy observed rates (the sketch after this list puts a number on this).
- Market regime, rule, or data changes can degrade calibration over time.
- A model can be calibrated overall while miscalibrated for specific teams, markets, or player types.
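A bucket's observed rate is a binomial estimate, so its noise shrinks with sample size. A rough sketch using the 60-65 percent row from the example table above; the normal approximation here is an assumption, not an exact interval.

```python
# Rough binomial standard error for a bucket's observed rate,
# using the 60-65% row from the example table.
import math

observed_rate = 0.597  # observed win rate in the bucket
sample_size = 137      # outcomes in the bucket

se = math.sqrt(observed_rate * (1 - observed_rate) / sample_size)
print(f"standard error: {se:.3f}")  # ~0.042, i.e. about +/- 4 points

# The 62.1% average prediction is within one standard error of the
# 59.7% observed rate, so this bucket alone cannot distinguish mild
# overconfidence from sampling noise.
```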
FAQ
Is calibration the same as accuracy?
No. Accuracy counts correct classifications, while calibration checks whether probabilities match long-run frequencies.
What does overconfidence look like?
Overconfidence occurs when events predicted at a given probability happen less often than predicted, such as 70 percent picks winning 62 percent of the time.
Why does calibration matter for betting?
Expected value calculations depend on probability estimates. Poor calibration can make edges look larger or smaller than they really are.
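As a hedged illustration, reusing the 70-versus-62 percent overconfidence figures from above with an assumed decimal price of 1.50:

```python
# How miscalibration distorts expected value, per unit staked.
# Decimal odds of 1.50 are an assumed example price; the 70%/62%
# figures echo the overconfidence example in this FAQ.
decimal_odds = 1.50

model_prob = 0.70  # what the model believes
true_prob = 0.62   # what actually happens long-run

model_ev = model_prob * decimal_odds - 1  # +0.05: looks like a 5% edge
true_ev = true_prob * decimal_odds - 1    # -0.07: actually a 7% loss

print(f"model EV: {model_ev:+.2f}, true EV: {true_ev:+.2f}")
```

The model sees a 5 percent edge where the true expected value is a 7 percent loss, which is how poor calibration turns a losing bet into an apparent value play.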