Model calibration measures whether a model's probability predictions match real-world outcomes. A perfectly calibrated model's 60% predictions win 60% of the time, 70% predictions win 70%, and so on across all probability buckets.
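Calibration vs. accuracy: A model can achieve high accuracy (correctly picking the winner) while being poorly calibrated (assigning the wrong probabilities). For betting, calibration matters more. Both the comparison against the bookmaker's implied odds and Kelly stake sizing take the predicted probability as a direct input, so a probability that is off by a few points changes which bets get placed and how large they are.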
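A toy illustration of the gap between accuracy and calibration, assuming two hypothetical models that pick the same side in every game. The helper names `accuracy` and `calibration_gap` are illustrative, and the aggregate gap (mean prediction minus observed win rate) is only a crude check; the bucketed comparison in a reliability diagram is the fuller test.

```python
# Hypothetical toy data: four games, three wins (observed win rate 75%).
outcomes = [1, 1, 1, 0]

# Two hypothetical models that favour the same side in every game.
model_a = [0.95, 0.95, 0.95, 0.95]  # overconfident
model_b = [0.75, 0.75, 0.75, 0.75]  # calibrated for this sample

def accuracy(probs, ys, threshold=0.5):
    """Share of games where the side the model favoured actually won."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, ys))
    return hits / len(ys)

def calibration_gap(probs, ys):
    """Mean predicted probability minus observed win rate (0 = calibrated in aggregate)."""
    return sum(probs) / len(probs) - sum(ys) / len(ys)

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(probs, outcomes), round(calibration_gap(probs, outcomes), 3))
```

Both models are 75% accurate, but model A claims 95% confidence for outcomes that only happen 75% of the time; any odds comparison or Kelly stake built on model A would be badly skewed.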
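Brier Score: The primary metric for scoring probability models. It is the mean squared error between the predicted probability and the binary outcome (1 for a win, 0 for a loss): (1/N) Σ (p_i - o_i)². Lower is better. Because it rewards both calibration and the ability to separate winners from losers, a good Brier score alone does not prove a model is well calibrated. A model that always predicts 50% scores 0.25; a skilled model for sports might score around 0.22-0.24, depending on the sport and market.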
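A minimal sketch of the Brier score computation, using made-up hold-out predictions for illustration. The function name `brier_score` is hypothetical; scikit-learn users can get the same number from `sklearn.metrics.brier_score_loss`.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted win probability and the 0/1 outcome."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical hold-out results: predicted home-win probabilities and actual results.
preds = [0.62, 0.55, 0.48, 0.71, 0.40]
results = [1, 0, 0, 1, 1]

print(round(brier_score(preds, results), 4))
# Coin-flip baseline: (0.5 - y)^2 is 0.25 for every game, so the score is always 0.25.
print(brier_score([0.5] * len(results), results))
```

Comparing a model's score against the 0.25 coin-flip baseline on the same games is a quick sanity check; a model that cannot beat it is adding noise.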
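Reliability diagrams: A visual calibration check. Group predictions into probability buckets (0-10%, 10-20%, ...) and plot each bucket's average predicted probability against its actual win rate. A perfectly calibrated model falls on the diagonal. Overconfidence shows up as points below the diagonal in high-probability buckets (and above it in low-probability ones); underconfidence shows the opposite pattern. Buckets with few games are noisy, so check the sample size in each.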
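The numbers behind a reliability diagram can be computed without any plotting library. This is a sketch assuming equal-width buckets; `reliability_table` and the sample data are hypothetical. Plotting `mean_pred` against `win_rate` for each row gives the diagram (scikit-learn's `sklearn.calibration.calibration_curve` computes similar pairs).

```python
from collections import defaultdict

def reliability_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width probability buckets and return
    (bucket_low, bucket_high, count, mean_predicted, actual_win_rate) per non-empty bucket."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # A prediction of exactly 1.0 would land in an extra bucket, so clamp it into the last one.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(y for _, y in pairs) / len(pairs)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(pairs), mean_pred, win_rate))
    return rows

# Hypothetical hold-out predictions and results.
preds = [0.15, 0.22, 0.35, 0.58, 0.61, 0.64, 0.72, 0.78, 0.81, 0.88]
results = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1]

for lo, hi, n, mean_pred, win_rate in reliability_table(preds, results, n_bins=5):
    print(f"{lo:.0%}-{hi:.0%}  n={n}  predicted={mean_pred:.2f}  actual={win_rate:.2f}")
```

Five buckets are used here only because the sample is tiny; with a full season of games, ten buckets as described above is more informative.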
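Recalibration methods: If a model is systematically overconfident (predicts 70% but wins 60%), Platt scaling (fitting a logistic curve to the model's output probabilities) or isotonic regression (fitting a non-decreasing step function) can recalibrate the probabilities after training. Fit the recalibration on data the model was not trained on, or it inherits the same overfitting. The Shark Snip model training UI tracks calibration across seasons.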
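A minimal sketch of Platt-style recalibration applied to an existing model's probabilities: it fits calibrated = sigmoid(a · logit(p) + b) by minimising log loss with Newton's method. This is a simplified variant (Platt's original method works on raw classifier scores and smooths the targets), and the names `fit_platt` and `apply_platt`, the tiny L2 penalty, and the hold-out data are all assumptions for illustration.

```python
import math

def logit(p, eps=1e-6):
    """Log-odds of p, clamped so 0% or 100% predictions stay finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(probs, outcomes, iters=100, l2=1e-6):
    """Fit (a, b) for sigmoid(a * logit(p) + b) by Newton's method on log loss.
    The tiny L2 penalty keeps the 2x2 Newton system solvable (e.g. identical predictions)."""
    xs = [logit(p) for p in probs]
    a, b = 1.0, 0.0  # start from the identity mapping
    for _ in range(iters):
        ga, gb = l2 * a, l2 * b      # gradient
        haa, hab, hbb = l2, 0.0, l2  # Hessian entries
        for x, y in zip(xs, outcomes):
            q = sigmoid(a * x + b)
            r = q - y            # d(log loss)/d(linear term)
            w = q * (1.0 - q)    # d^2(log loss)/d(linear term)^2
            ga += r * x
            gb += r
            haa += w * x * x
            hab += w * x
            hbb += w
        det = haa * hbb - hab * hab
        # Newton step: solve H * delta = gradient for the 2x2 Hessian H.
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a -= da
        b -= db
        if max(abs(da), abs(db)) < 1e-10:
            break
    return a, b

def apply_platt(p, a, b):
    return sigmoid(a * logit(p) + b)

# Hypothetical hold-out set: the model said 70% (or 30%) but the truth was 60% (or 40%).
probs = [0.7] * 10 + [0.3] * 10
outcomes = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6

a, b = fit_platt(probs, outcomes)
print(round(a, 3), round(b, 3), round(apply_platt(0.7, a, b), 3))
```

A slope `a` below 1 shrinks predictions toward 50%, which is the usual fix for overconfidence. For production use, scikit-learn's `CalibratedClassifierCV` (with `method="sigmoid"` or `method="isotonic"`) and `IsotonicRegression` are tested implementations of both methods.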
Practical implication: A model that says 58% when the true probability is 52% will size Kelly bets far too large, or bet where no real edge exists, and that steadily erodes the bankroll. Always test calibration on hold-out data before staking real units.
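The 58% vs. 52% example can be worked through with the standard Kelly formula f* = (b·p - q) / b, where b is the net payout per unit staked and q = 1 - p. The decimal odds of 1.91 (roughly -110 American) are an assumed example price, and `kelly_fraction` / `expected_log_growth` are hypothetical helper names; the sketch uses full Kelly with no fractional scaling.

```python
import math

def kelly_fraction(p, decimal_odds):
    """Full-Kelly stake as a fraction of bankroll; 0 when the bet has no edge."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    b = decimal_odds - 1.0  # net profit per unit staked
    f = (b * p - (1.0 - p)) / b
    return max(f, 0.0)

def expected_log_growth(f, p_true, decimal_odds):
    """Expected change in log bankroll per bet when staking fraction f (< 1)
    and the true win probability is p_true."""
    b = decimal_odds - 1.0
    return p_true * math.log(1 + f * b) + (1 - p_true) * math.log(1 - f)

odds = 1.91                    # assumed price, roughly -110 American
model_p, true_p = 0.58, 0.52   # miscalibrated vs. true probability

stake = kelly_fraction(model_p, odds)    # what the miscalibrated model bets
correct = kelly_fraction(true_p, odds)   # what a calibrated model would bet
print(f"model stake: {stake:.3f}, calibrated stake: {correct:.3f}")
print(f"expected log growth at model stake: {expected_log_growth(stake, true_p, odds):.4f}")
```

At this price the break-even probability is 1 / 1.91, about 52.4%, so a calibrated 52% model places no bet at all, while the miscalibrated model stakes roughly 12% of the bankroll on a bet with negative expected log growth.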
