Prediction Quality

Model Calibration

Understand model calibration, reliability curves, probability buckets, and why calibrated probabilities matter for betting and fantasy analytics.

Definition

Model calibration measures whether predicted probabilities match observed frequencies. If a model assigns 60 percent probability to many events, those events should occur about 60 percent of the time if the model is well calibrated.
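The definition can be sketched in a few lines: collect all events the model priced near a given probability and compare the average prediction to the observed hit rate. The numbers below are made-up illustrative values, not real model output.

```python
# Minimal calibration check for one probability range: among events the
# model priced near 60 percent, the observed hit rate should be near
# 60 percent as well. Predictions and outcomes here are illustrative.

predictions = [0.61, 0.59, 0.60, 0.62, 0.58, 0.60, 0.61, 0.59, 0.60, 0.60]
outcomes    = [1,    0,    1,    1,    0,    1,    1,    0,    1,    0]

avg_pred = sum(predictions) / len(predictions)   # average stated probability
hit_rate = sum(outcomes) / len(outcomes)         # observed frequency

print(f"avg prediction: {avg_pred:.3f}")
print(f"observed rate:  {hit_rate:.3f}")
```

A large gap between the two numbers in a range with many outcomes is the signature of miscalibration.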

Methodology

  1. Collect model predictions and final outcomes for the same event definition.
  2. Group predictions into probability buckets, such as 50-55 percent or 60-65 percent.
  3. Compare the average predicted probability in each bucket to the actual outcome rate.
  4. Review calibration alongside discrimination metrics such as AUC, because a calibrated model can still be unhelpful if it cannot separate strong and weak outcomes.
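Steps 2 and 3 above can be sketched as a small bucketing routine. The bucket width (5 points) and the (prediction, outcome) pairs are illustrative assumptions, not a prescribed implementation.

```python
from collections import defaultdict

# Group (prediction, outcome) pairs into 5-point probability buckets and
# compare the average prediction to the observed outcome rate per bucket.
records = [
    (0.52, 1), (0.54, 0), (0.51, 1), (0.53, 0),   # near 50-55 percent
    (0.57, 1), (0.58, 1), (0.56, 0),              # near 55-60 percent
    (0.62, 1), (0.61, 0), (0.64, 1),              # near 60-65 percent
]

buckets = defaultdict(list)
for pred, won in records:
    lo = int(round(pred * 100)) // 5 * 5          # e.g. 0.57 -> 55
    buckets[lo].append((pred, won))

for lo in sorted(buckets):
    rows = buckets[lo]
    avg_pred = sum(p for p, _ in rows) / len(rows)
    observed = sum(w for _, w in rows) / len(rows)
    print(f"{lo}-{lo + 5}%: avg pred {avg_pred:.3f}, "
          f"observed {observed:.3f}, n={len(rows)}")
```

In practice the same loop runs over thousands of graded predictions; the per-bucket rows are exactly what a reliability table like the one below summarizes.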

Example Calibration Table

Illustrative reliability buckets for model win probabilities.

Predicted Bucket   Avg Prediction   Observed Rate   Sample
50-55%             52.4%            51.8%           226
55-60%             57.2%            58.1%           184
60-65%             62.1%            59.7%           137
Example data is illustrative and intended to show structure, not current player or team projections.

Common Uses

  • Check whether model probabilities can be trusted as probabilities.
  • Identify overconfident or underconfident prediction ranges.
  • Improve bet sizing, simulation inputs, and expected value calculations.

Caveats

  • Calibration requires enough outcomes in each probability bucket.
  • Market regime, rule, or data changes can degrade calibration over time.
  • A model can be calibrated overall while miscalibrated for specific teams, markets, or player types.
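The first caveat can be quantified. The standard error of an observed rate p over n independent outcomes is roughly sqrt(p * (1 - p) / n), so small buckets leave wide uncertainty bands around the observed rate. The sample sizes below are illustrative.

```python
import math

def rate_std_error(p: float, n: int) -> float:
    """Approximate standard error of an observed frequency p from n outcomes."""
    return math.sqrt(p * (1 - p) / n)

# A bucket with 226 outcomes versus one with only 30:
print(f"n=226: +/- {rate_std_error(0.52, 226):.3f}")
print(f"n=30:  +/- {rate_std_error(0.52, 30):.3f}")
```

With only 30 outcomes, an observed rate several points away from the prediction is well within noise, so no calibration conclusion is justified.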

FAQ

Is calibration the same as accuracy?

No. Accuracy counts correct classifications, while calibration checks whether probabilities match long-run frequencies.
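The distinction is easy to show numerically. In this hedged sketch, two hypothetical models make identical yes/no picks (so identical accuracy), but only one reports probabilities that match the observed frequency.

```python
# Two models, same picks, different calibration. Data is illustrative.
outcomes = [1, 1, 1, 0, 0]                       # 3 of 5 events occurred

calibrated    = [0.60, 0.60, 0.60, 0.60, 0.60]   # claims 60 percent
overconfident = [0.95, 0.95, 0.95, 0.95, 0.95]   # same picks, claims 95 percent

def accuracy(preds):
    # Classify "yes" whenever the stated probability exceeds 50 percent.
    return sum((p > 0.5) == bool(y) for p, y in zip(preds, outcomes)) / len(outcomes)

observed = sum(outcomes) / len(outcomes)
print(accuracy(calibrated), accuracy(overconfident))   # identical accuracy
print(sum(calibrated) / 5, observed)                   # stated 0.60 vs observed 0.60
print(sum(overconfident) / 5, observed)                # stated 0.95 vs observed 0.60
```

Both models are 60 percent accurate, yet the second one's probabilities are badly miscalibrated, which is exactly the gap accuracy cannot see.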

What does overconfidence look like?

Overconfidence occurs when events predicted at a given probability happen less often than predicted, such as 70 percent picks winning 62 percent of the time.

Why does calibration matter for betting?

Expected value calculations depend on probability estimates. Poor calibration can make edges look larger or smaller than they really are.
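A short worked example makes the point. Assume (hypothetically) decimal odds of 1.80, so EV per unit staked is p * (odds - 1) - (1 - p). A model that says 60 percent when the long-run rate is 55 percent turns an apparent edge into a losing bet.

```python
# How a calibration error flips the sign of expected value.
odds = 1.80  # assumed decimal odds; break-even probability is 1 / 1.80 ~ 0.556

def ev_per_unit(p: float) -> float:
    """Expected profit per unit staked at probability p and the odds above."""
    return p * (odds - 1) - (1 - p)

model_p = 0.60   # what an overconfident model states
true_p  = 0.55   # what actually happens long-run

print(f"EV at model probability: {ev_per_unit(model_p):+.3f}")   # looks positive
print(f"EV at true probability:  {ev_per_unit(true_p):+.3f}")    # actually negative
```

The stated edge (+0.08 units per unit staked) exists only on paper; at the true rate the same bet loses money, which is why calibration checks belong upstream of any staking decision.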