The phrase "NFL model" gets thrown around loosely. To a tout it means "today's pick." To a system bettor it means "any team off a loss covers." To an analyst it means a numerical projection with a calibrated probability attached, evaluated against closing lines on out-of-sample data, sized with a fractional Kelly stake. This guide walks the analyst path end to end (spread, total, and an NFL player prop model) using only public data and a browser. You can run every step in the Shark Snip Workshop while reading.
What a model actually is (and what it isn't)
A model is a function that maps inputs to a probability or expected value. It is not a pick, a system, or a vibe. The distinction matters because all of these get called a "model" on betting Twitter, and they fail in different ways.
- A pick is a single output without a probability. "Take the Eagles -3" tells you nothing about confidence, sample, or edge. Picks are unfalsifiable in any rigorous sense.
- A system is a hard rule applied to every game that fits the criteria. "Home dogs after a road loss" is a system. Systems are testable but rigid — they ignore context that a continuous model captures naturally.
- A model outputs a number — predicted margin, predicted total, predicted prop, or a probability. It can be evaluated on Brier score, log-loss, and realized return. It updates when new data arrives. It is wrong in measurable ways and improvable through measurable changes.
If you compare your output to the line and decide whether to bet, you have something model-like. If you also compute a probability, evaluate calibration, and size with Kelly, you have a model. That is the bar the rest of this article assumes. For market mechanics that the model has to beat, the spread mechanics primer covers the half-point key numbers, and the sharp vs public guide covers how lines move.
Building a baseline NFL spread model
Spread models predict either game margin or cover probability against the posted line. The cleanest target is margin — home final score minus away final score — because everything downstream (cover probability, expected value, Kelly stake) is a deterministic function of that one number plus a noise estimate.
Step 1: Set up the target and the split
Pull NFL play-by-play from the public nflverse data (2018-2024 gives you about 1,800 regular-season games, enough to train without overfitting). Aggregate to one row per game with home and away final scores, the closing spread, and the closing total. Set margin = home_score - away_score as your target.
Split walk-forward, never randomly. A typical split is 2018-2022 train, 2023 validate, 2024 test. Random splits leak future information into the training set because the league trends each season — pass rates, kicker accuracy, hash-mark rules — and a random shuffle gives the model peeks at outcomes it would not have at bet time.
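The target and the season-based split can be sketched in a few lines of plain Python. This is a minimal illustration, not the nflverse schema: the row keys (`season`, `home_score`, `away_score`) and both function names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Game = Dict[str, float]


def add_margin(games: List[Game]) -> List[Game]:
    """Return copies of the rows with margin = home_score - away_score."""
    return [{**g, "margin": g["home_score"] - g["away_score"]} for g in games]


def split_by_season(
    games: List[Game],
    train_through: int = 2022,
    validate_season: int = 2023,
    test_season: int = 2024,
) -> Tuple[List[Game], List[Game], List[Game]]:
    """Walk-forward split: every training game predates every validation game."""
    train = [g for g in games if g["season"] <= train_through]
    validate = [g for g in games if g["season"] == validate_season]
    test = [g for g in games if g["season"] == test_season]
    return train, validate, test


# Toy rows standing in for the real one-row-per-game table.
toy_games = add_margin([
    {"season": 2021, "home_score": 27, "away_score": 20},
    {"season": 2022, "home_score": 17, "away_score": 24},
    {"season": 2023, "home_score": 31, "away_score": 28},
    {"season": 2024, "home_score": 13, "away_score": 16},
])
train, validate, test = split_by_season(toy_games)
print(len(train), len(validate), len(test))
```

Because the split keys on season rather than on a shuffled index, no row in `train` can come from a later date than any row in `validate` or `test`.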
Step 2: Compute opponent-adjusted Elo and EPA
Three features carry most of the predictive weight beyond the closing line itself:
- Net yards per play differential. Offensive yards per play minus defensive yards per play allowed, last eight games, opponent-adjusted. Yards is noisier than EPA but more transparent and almost as predictive.
- Net EPA per play. Expected Points Added per play on offense minus EPA per play allowed on defense. nflverse publishes EPA pre-computed for free. Use the trailing eight games, opponent-adjusted, with garbage time dropped (keep only plays where win probability is between 5% and 95%).
- Opponent-adjusted Elo. Start each team at 1500, add a home-field bump worth about 1.7 points of spread (roughly 42 Elo at the ratio below), and update after each game with K = 20. Convert the Elo gap to a point-spread expectation with the standard 25-point ratio (a 25 Elo gap implies a 1-point spread).
These three plus the closing spread, starting QB availability, rest differential, and travel distance form a seven-feature baseline that closes most of the gap to a sharp number. The closing line itself is the dominant feature — it aggregates everyone else's models. Your job is to find the small marginal lift that comes from the other six. Open the Shark Snip builder to wire these features into a blueprint visually.
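The Elo feature can be sketched as follows. Two things here are assumptions rather than anything the article specifies: the classic 400-point logistic scale for the win expectation, and converting the 1.7-point home bump into Elo by multiplying by the 25-per-point ratio. Function names are illustrative.

```python
from typing import Tuple

START_ELO = 1500.0
K = 20.0
ELO_PER_POINT = 25.0   # 25 Elo = 1 point of spread (from the text)
HFA_POINTS = 1.7       # home-field bump in spread points (from the text)


def expected_home_win(home_elo: float, away_elo: float) -> float:
    """Logistic Elo expectation; the 400-point scale is an assumption."""
    diff = home_elo + HFA_POINTS * ELO_PER_POINT - away_elo
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_spread(home_elo: float, away_elo: float) -> float:
    """Expected home margin in points implied by the Elo gap plus home field."""
    return (home_elo - away_elo) / ELO_PER_POINT + HFA_POINTS


def update(home_elo: float, away_elo: float, home_result: float) -> Tuple[float, float]:
    """Update both ratings; home_result is 1.0 win, 0.5 tie, 0.0 loss."""
    delta = K * (home_result - expected_home_win(home_elo, away_elo))
    return home_elo + delta, away_elo - delta


home, away = update(START_ELO, START_ELO, 1.0)
print(round(elo_spread(START_ELO, START_ELO), 1))  # 1.7
print(home > START_ELO, away < START_ELO)          # True True
```

The update is zero-sum, so the league average stays at 1500. Opponent adjustment falls out of the expectation term: beating a stronger team moves the rating more than beating a weaker one.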
Step 3: Train and evaluate
Train any of: a ridge regression, a gradient-boosted tree, or a small neural net. For 1,800 games and seven features, ridge wins on simplicity and rarely loses on accuracy versus a properly regularized GBM. Evaluate on RMSE and on against-the-spread cover rate. Healthy numbers for an NFL margin model on a recent test season:
- Validation RMSE: 12.5 to 14 points
- ATS cover rate: 51% to 54% on a 200-game test season
- Mean absolute error vs closing line: under 4 points
If your test cover rate is above 56% on a small sample, suspect data leakage before celebrating. The most common leak is rolling stats that include the current game in the average. Use only the prior eight games, lagged by one. The injury impact study covers another classic leak: feeding the model inactive designations that had not yet been announced at the time the bet would have been placed.
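The lag-by-one rule is easy to get wrong, so here is a minimal sketch of a leak-free rolling average. The function name is hypothetical, and it assumes `values` holds one team's per-game stat in chronological order.

```python
from typing import List, Optional


def lagged_rolling_mean(values: List[float], window: int = 8) -> List[Optional[float]]:
    """For game i, average up to `window` games before i, never game i itself."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        prior = values[max(0, i - window):i]  # slice stops before index i
        out.append(sum(prior) / len(prior) if prior else None)
    return out


# Toy per-game stat with a short window so the lag is easy to see.
print(lagged_rolling_mean([1.0, 2.0, 3.0, 4.0], window=2))
# [None, 1.0, 1.5, 2.5]
```

The first game has no history, so its feature is `None` rather than a value borrowed from the game being predicted.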
Building a totals model
Totals models look easier because both teams contribute, but they are sneakier. Pace, neutral pass rate, and weather drive totals more than raw scoring efficiency, and the closing total absorbs less information from the public than the closing spread does. That makes the totals market both more beatable and more punishing when you are wrong.
Pace and neutral pass rate
Pace is plays per minute when the game is competitive — exclude two-minute drills and garbage time. Neutral pass rate is the share of plays that are passes when win probability sits between 25% and 75%. These two features capture how much volume each offense will generate, separate from how efficient that volume is. The same offense at 70 plays per game produces dramatically more variance in totals than at 60.
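Neutral pass rate can be computed directly from play-level rows. The sketch below assumes each play is a dict with a win-probability field `wp` and a `play_type` of "pass" or "run"; those key names are assumptions for the example, so map them to whatever your play-by-play source actually provides.

```python
from typing import Dict, List, Optional


def neutral_pass_rate(
    plays: List[Dict[str, object]], low: float = 0.25, high: float = 0.75
) -> Optional[float]:
    """Share of pass plays among pass/run plays with win probability in [low, high]."""
    neutral = [
        p for p in plays
        if p["play_type"] in ("pass", "run") and low <= float(p["wp"]) <= high
    ]
    if not neutral:
        return None
    passes = sum(1 for p in neutral if p["play_type"] == "pass")
    return passes / len(neutral)


toy_plays = [
    {"wp": 0.50, "play_type": "pass"},
    {"wp": 0.55, "play_type": "run"},
    {"wp": 0.60, "play_type": "pass"},
    {"wp": 0.40, "play_type": "pass"},
    {"wp": 0.90, "play_type": "pass"},  # outside the neutral band, ignored
    {"wp": 0.50, "play_type": "punt"},  # not a pass or run, ignored
]
print(neutral_pass_rate(toy_plays))  # 0.75
```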
Weather and venue
Weather is binary-ish: dome games are weatherproof, outdoor games face wind first, then rain, then temperature. Wind above 15 mph drops expected total points by roughly 3-5 depending on the matchup. Rain shifts run-pass mix and lowers efficiency. Cold under 25°F has a smaller measurable effect than the popular narrative suggests. Pull current conditions from a free weather API at bet time, not the historical average for the venue.
Divisional dampening
Divisional rematches in November and December tend to score under their projection by 1-3 points. The simple explanation is film: both coordinators have studied each other twice, surprise plays disappear, and execution gets cleaner on defense than on offense. Code this as a flag: is_divisional AND week >= 10, then let the model decide the magnitude. The totals deep dive walks through the historical splits in detail.
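As a sketch, the flag is a one-liner. The division labels and the function name are illustrative; the only inputs taken from the text are the divisional condition and the week threshold of 10.

```python
def late_divisional_flag(
    home_division: str, away_division: str, week: int, min_week: int = 10
) -> int:
    """1 when both teams share a division and the game is in week `min_week` or later."""
    return int(home_division == away_division and week >= min_week)


print(late_divisional_flag("NFC East", "NFC East", 12))   # 1
print(late_divisional_flag("NFC East", "NFC East", 5))    # 0
print(late_divisional_flag("NFC East", "AFC North", 12))  # 0
```

Passing the flag in as a 0/1 feature leaves the size of the adjustment to the fitted coefficient instead of hard-coding a 1-3 point haircut.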
Building a player prop model
Props are where the modeling work pays off most. The book posts hundreds of lines per week, the soft ones get shaded but rarely killed, and a calibrated projection beats a feel pick by a wider margin than on game lines. The trick is building from features upward rather than guessing at outputs.
Features start with usage
Stat projections are usage times efficiency. Both halves matter, but usage is more stable week to week and is what you should anchor your model on. The four core usage features for skill-position props:
- Snap percentage — share of offensive snaps the player is on the field. The ceiling on every counting stat.
- Route share — share of pass plays where the player ran a route. For receivers and tight ends.
- Target share — share of team targets the player drew. Combine with team pass volume for a target projection.
- Carry share — for running backs, share of team rush attempts. Combine with red-zone carry share for TD modeling.
The target share vs air yards study walks through which of these stabilizes fastest after a role change — usage features can shift by 15+ percentage points in the first three games after an injury or trade, so freshness matters.
Layer aDOT and route mix on top
Average Depth of Target (aDOT) and route mix turn target volume into yardage. A 7-target receiver running deep posts (aDOT 18) projects to ~120 yards; a 7-target slot receiver running drags (aDOT 5) projects to ~50. The nflverse weekly receiver charts publish per-game aDOT and route distribution for free.
Build the projection
For receiving yards, the simple model is: projected_yards = team_pass_attempts * target_share * catch_rate * yards_per_reception. Train each multiplier separately on rolling samples — team pass attempts is a function of opponent pace and game script, target share is a player rate, catch rate and yards-per-reception are matchup-adjusted player rates. Then combine and add a residual normal distribution for variance. This separates which part of the model is wrong when the projection misses, instead of leaving you with a single opaque output.
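A minimal sketch of that multiplicative projection plus the normal residual is below. Every input value is made up for illustration, and the residual standard deviation of 30 yards is a placeholder assumption to be replaced with the spread of your own backtest residuals.

```python
from statistics import NormalDist


def project_receiving_yards(
    team_pass_attempts: float,
    target_share: float,
    catch_rate: float,
    yards_per_reception: float,
) -> float:
    """Usage times efficiency, one multiplier per separately trained component."""
    return team_pass_attempts * target_share * catch_rate * yards_per_reception


def over_probability(projection: float, line: float, residual_sd: float) -> float:
    """P(actual > line), treating actual as normal around the projection."""
    return 1.0 - NormalDist(mu=projection, sigma=residual_sd).cdf(line)


# Illustrative inputs only: 36 attempts, 25% target share, 65% catch rate, 12.0 Y/R.
projection = project_receiving_yards(36, 0.25, 0.65, 12.0)
p_over = over_probability(projection, line=64.5, residual_sd=30.0)
print(round(projection, 1))  # 70.2
```

Keeping the four multipliers as separate arguments is what makes a miss diagnosable: you can swap in the realized value of one component at a time and see which one moved the projection.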
From margin to cover probability — the math that matters
A margin prediction is not a bet recommendation until it becomes a probability. The standard move:
- Predict margin as a number, e.g., predicted_margin = +5.2 means home wins by 5.2 on average.
- Compute the residual: edge = predicted_margin - closing_spread. If the closing spread is +3 (home favored by 3) and you predict +5.2, your edge is 2.2 points.
- Treat that edge as the mean of a normal distribution with standard deviation matching the empirical residual RMSE, about 13.5 points for NFL game margins on a well-built model.
- Cover probability = 1 - normalCdf(0, mean=edge, sd=13.5). For an edge of 2.2 points: cover probability is 56.5%.
That feels small. The reason it feels small is that the public sees a 5.2 prediction against a 3 line and assumes a huge edge. The actual cover probability is 56.5%: meaningful, but not enough for a max bet. The standard deviation of NFL game margins is wide because football is high-variance. A projection edge of about two points becomes roughly a 6 percentage point lift over a coin flip, or about 4 points over the 52.38% break-even at -110, not the 10+ that an unrigorous gut would assume.
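The conversion can be checked with Python's standard library. This sketch follows the article's convention that the closing spread is expressed as expected home margin (+3 means home favored by 3); the function name is illustrative.

```python
from statistics import NormalDist


def cover_probability(
    predicted_margin: float, closing_spread: float, sd: float = 13.5
) -> float:
    """P(home covers), with the edge as the mean of a normal with the given sd."""
    edge = predicted_margin - closing_spread
    return 1.0 - NormalDist(mu=edge, sigma=sd).cdf(0.0)


p_cover = cover_probability(5.2, 3.0)
print(round(p_cover, 3))  # 0.565
```

With no edge at all (prediction equal to the line) the function returns 0.5, which is a quick sanity check on the sign convention.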
Calibration: what 60% should mean
Calibration is the test of whether your probabilities mean what they say. A 60% confidence bet should win 60% of the time over a large sample. Three tools:
- Brier score. Mean squared error between predicted probability and actual outcome (1 or 0). Lower is better. A coin flip scores 0.25; anything under 0.245 on NFL spreads is real signal.
- Log-loss. Punishes confident wrong predictions harder than Brier. More sensitive to calibration drift at the extremes.
- Reliability diagram. Bin predictions (e.g., 50-52%, 52-55%, 55-58%, 58-62%, 62%+) and plot the actual cover rate in each bucket. A perfectly calibrated model lies on the diagonal.
The reliability diagram for a healthy spread model shows predicted and actual cover rates within a percentage point at every bucket, with sample sizes that justify the points. When the gap exceeds three percentage points consistently in one direction, the model is biased; when it widens at one bucket only, that feature combination is broken. Either way, fix calibration before adding new features.
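All three calibration tools fit in a short stdlib sketch. The bucket edges mirror the example bins in the list above; the function names are illustrative, and the table assumes you always grade the side your model favors, so probabilities below 50% are not binned.

```python
import math
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-12) -> float:
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def reliability_table(
    probs: Sequence[float],
    outcomes: Sequence[int],
    edges: Sequence[float] = (0.50, 0.52, 0.55, 0.58, 0.62, 1.01),
) -> List[Tuple[float, float, int, float, float]]:
    """Rows of (low, high, count, mean predicted, observed rate) per non-empty bucket."""
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        members = [(p, y) for p, y in zip(probs, outcomes) if low <= p < high]
        if not members:
            continue
        n = len(members)
        mean_pred = sum(p for p, _ in members) / n
        observed = sum(y for _, y in members) / n
        rows.append((low, high, n, mean_pred, observed))
    return rows


print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25
toy_probs = [0.51, 0.53, 0.56, 0.60, 0.65, 0.54]
toy_outcomes = [1, 0, 1, 1, 0, 1]
for row in reliability_table(toy_probs, toy_outcomes):
    print(row)
```

Plotting mean predicted against observed rate for each row gives the reliability diagram; the count column tells you whether a bucket has enough bets for its gap to mean anything.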
From model to bet: vig removal, Kelly, and line shopping
A calibrated probability is still not a bet. Three more steps separate a number from a sized wager.
Vig removal
Books post both sides with built-in margin. A standard NFL spread is -110/-110, which converts to implied probabilities of 52.38% and 52.38%. The sum is 104.76%, and the extra 4.76% is the overround that funds the book's hold. To get the no-vig probability, divide each implied probability by the sum: 52.38 / 104.76 = 50% on each side. The fair line is 50/50. Your edge over the fair line is your model's probability minus the no-vig probability. Subtracting the raw implied probability instead measures your margin over break-even, which is smaller by the vig on that side (2.38 points at -110), and a bet is only positive expected value when that second number is above zero.
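The arithmetic above, as a small sketch using American odds. The function names are illustrative; the proportional normalization is the same divide-by-the-sum method described in the paragraph.

```python
from typing import Tuple


def implied_probability(american_odds: int) -> float:
    """Raw implied probability (break-even win rate) for American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def no_vig(odds_a: int, odds_b: int) -> Tuple[float, float]:
    """Normalize the two raw implied probabilities so they sum to 1."""
    pa, pb = implied_probability(odds_a), implied_probability(odds_b)
    total = pa + pb
    return pa / total, pb / total


raw = implied_probability(-110)
fair_a, fair_b = no_vig(-110, -110)
print(round(raw, 4))   # 0.5238
print(fair_a, fair_b)  # 0.5 0.5
```

For lopsided prices such as -130/+110 the same function returns unequal fair probabilities, which is where doing this by hand tends to go wrong.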
Player props carry more vig — 4-8% on standard yardage and reception props, and over 10% on alternate lines. The sharp vs public guide walks through why prop hold is higher and how it varies by book.
Kelly fraction
The full Kelly formula for a -110 bet with probability p: stake_fraction = (p * 100 - (1 - p) * 110) / 100. For a 54% true probability, full Kelly is about 3.4% of bankroll. Almost no one bets full Kelly: variance is brutal, and any error in the probability estimate compounds. Quarter Kelly (about 0.85% of bankroll for a 54% bet) keeps a bit under half of the long-run growth rate with far less drawdown.
The discipline trap: a model that claims 60% on every bet but is actually 53% calibrated will recommend Kelly stakes that bankrupt the bettor. Calibration matters more than raw cover rate for sizing decisions, as the calibration section above explains.
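A sketch of Kelly sizing in its textbook form, f = (b * p - q) / b, where b is the net payout per unit staked (100/110 at -110) and q = 1 - p. The function name and the quarter-Kelly default are illustrative choices; the floor at zero simply means "no bet" when the edge is negative.

```python
def kelly_fraction(p: float, american_odds: int = -110, fraction: float = 1.0) -> float:
    """Fraction of bankroll to stake; `fraction` scales full Kelly down."""
    if american_odds < 0:
        b = 100 / -american_odds   # net win per unit risked on a favorite price
    else:
        b = american_odds / 100
    full = (b * p - (1.0 - p)) / b
    return max(0.0, full) * fraction


full_kelly = kelly_fraction(0.54)
quarter_kelly = kelly_fraction(0.54, fraction=0.25)
print(round(full_kelly, 4))     # 0.034
print(round(quarter_kelly, 4))  # 0.0085
```

Running the same function with an inflated probability shows the discipline trap directly: at a claimed 60% the full-Kelly stake is 16% of bankroll, far beyond what a true 53% bettor can survive.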
Line shopping
The same NFL spread varies across legal books by a half to a full point. On key numbers (3, 7, 10) a half-point of line value is worth 1.5-2% on win rate. Three or four legal books in your state captures most available shopping value. ESPN's scoreboard shows consensus lines for free; the actual best price requires checking the books themselves at peak market hours (Wednesday afternoon and Saturday morning are typical inflection points).
Backtesting honestly
The most common modeling failure is overfitting to a backtest. Three rules to keep yourself honest:
- Walk-forward only. Train on weeks 1-8, predict weeks 9-16, slide one week, repeat. A random split leaks information.
- Hold out a final test set you never look at until the end. Pick the 2025 season, run it once. If you tune to it, it stops being a test set.
- Charge real juice. -110 standard, -115 to -120 on alt lines, 5-8% on props. A backtest at zero juice is a fantasy.
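To see how much the juice rule matters, this sketch grades the same set of results at -110 and at even money. Flat one-unit stakes and the function names are assumptions for the example; pushes are ignored for simplicity.

```python
from typing import Sequence


def profit_per_unit(american_odds: int) -> float:
    """Net profit on a winning one-unit stake at the given American odds."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def backtest_roi(results: Sequence[bool], american_odds: int = -110) -> float:
    """Return on total amount staked for flat one-unit bets."""
    win_profit = profit_per_unit(american_odds)
    profit = sum(win_profit if won else -1.0 for won in results)
    return profit / len(results)


results = [True] * 53 + [False] * 47   # a 53% cover rate over 100 bets
roi_with_juice = backtest_roi(results, -110)
roi_no_juice = backtest_roi(results, 100)
print(round(roi_with_juice, 4))  # 0.0118
print(round(roi_no_juice, 4))    # 0.06
```

The same 53% record is worth about 1.2% per bet at standard juice and 6% at zero juice, which is the gap a fantasy backtest hides.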
The sharp vs public piece is worth re-reading after your first backtest. Sharp money tends to enter the market mid-week, so a model that beats the opening line can have little edge left by Sunday's close, and its real margin is lower than the raw cover rate suggests.
Building all of this in your browser
Every step above runs in the Shark Snip Workshop. The blueprint editor lets you drag the seven baseline spread features into a model, train with TensorFlow.js (no server, no GPU bill), backtest with walk-forward splits, view calibration with a reliability diagram, and publish to the live picks pages. Specific entry points:
- Open the Workshop for the guided builder with topic presets for NFL spread, NFL total, and NFL player props.
- Open /build for the lower-level brick editor where you can add custom features.
- Compare your output to other published models on the leaderboards — same training data, same evaluation harness, transparent live cover rates.
- Once you trust your model, list it on the marketplace so other users can subscribe to its picks.
- For the live front-end of NFL picks generated by published models, check /gridiron for the current week's slate with model edges shown.
What good looks like after one season
A first NFL spread model trained on the seven baseline features should land near these numbers after a full season of live betting:
- Live ATS cover rate: 51-53% over 200+ bets
- Brier score under 0.245 on the same sample
- Calibration drift under three percentage points at every reliability bucket
- ROI per bet between -1% and +3% at standard juice
Those numbers feel modest because they are. Sustained 53% ATS at -110 is roughly a 1.2% edge per bet, which compounds to meaningful returns over thousands of bets but never produces the 60%+ headline results that tout services advertise. The honest number is the durable one. If your live cover rate diverges from the backtest by three or more percentage points after 100 bets, retrain on the most recent season — the market or the league has shifted under you.
Where to go next
Once a baseline NFL spread model is live, the most valuable extensions are:
- A totals model with the pace, neutral pass rate, weather, and divisional features described above. Cross-check it against your spread model — if both like the home team to score 30, your projection is consistent.
- A player prop model anchored on snap, route, target, and carry share. Player props are higher variance per bet but lower correlation across bets, so they diversify a portfolio that already has spread and total exposure.
- Fractional Kelly sizing applied to all three model outputs, with a portfolio-level cap so no single game exceeds a chosen percentage of bankroll.
The target share study is the right next read for prop modeling depth. The injury impact study is essential for handling late-week status changes that move markets and your model. The spread mechanics primer and the totals deep dive are companion pieces in this NFL markets cluster.
A note on responsibility
Modeling does not eliminate variance. A 53% true win rate can produce a 100-bet drawdown of 15+ units. Bankroll sizing matters more than feature engineering. Set a hard maximum stake, never chase, and treat any month where you exceed your limits as a loss regardless of P&L. The model is a tool; the bettor is the risk manager. Bet only what you can afford to lose, and use legal regulated books in your state.
Props and DFS example board
For props, DFS, and PrizePicks-style decisions, the names should reveal the input. Jokic assists, Shai points, Wembanyama blocks, Josh Allen rushing, Ja'Marr Chase receptions, and Christian McCaffrey touchdown equity all require different checks. Treat each player as a role-and-price puzzle rather than a logo on a pick card.
- Fixed-line check: compare the app line to sportsbook consensus before calling it an edge.
- Correlation check: do not pair legs that require opposite game scripts.
- DFS check: salary, ownership, and late-swap flexibility can matter as much as median projection.
- Tracking check: grade closing value and result separately so a lucky hit does not hide a bad line.
Props workflow links
Use PrizePicks basics, NFL player props, and correlation math as the internal loop from projection to price to risk control.
Prop, DFS, and contest examples
Use names as evidence, not decoration. Players like Josh Allen, Ja'Marr Chase, Bijan Robinson, and Puka Nacua, and teams like the Eagles, Chiefs, Bills, and Lions, belong inside decisions, thresholds, and internal links rather than in a keyword list.
- Prop EV example: if Amon-Ra St. Brown receptions are 6.5 at -120, a model median of 7.1 with a 56% over probability creates a fair threshold near -127; pass if the market jumps to 7.5 without a projection change.
- DFS value example: projection divided by salary times 1,000 keeps the slate honest. A 20.4-point projection at $7,200 is 2.83x median value; tournaments need ceiling, leverage, and correlation on top of that.
- Stack example: Patrick Mahomes with Travis Kelce and Xavier Worthy needs a bring-back plan from the opponent; Josh Allen with Keon Coleman and Dalton Kincaid needs rushing-TD cannibalization in the script notes.
- PrizePicks example: Nikola Jokic rebounds, Devin Booker points, and Stephen Curry threes should not be treated as one generic “More” card; legs need hit rate, payout, and correlation checks.
The next step should be a tool, not another opinion: compare the line on NFL player props, pressure-test salary in DFS tools, and log the close with bet tracking.
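The prop EV and DFS value examples above reduce to two small formulas, sketched here with illustrative function names. Nothing beyond the numbers quoted in the examples is assumed.

```python
def fair_american_odds(p: float) -> float:
    """American odds at which a bet with win probability p breaks even."""
    if p >= 0.5:
        return -100 * p / (1 - p)
    return 100 * (1 - p) / p


def break_even_probability(american_odds: float) -> float:
    """Win rate needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def dfs_value(projection: float, salary: float) -> float:
    """Projected points per $1,000 of salary."""
    return projection / salary * 1000


print(round(fair_american_odds(0.56)))           # -127
print(round(break_even_probability(-120), 4))    # 0.5455
print(round(dfs_value(20.4, 7200), 2))           # 2.83
```

A 56% over probability clears the 54.55% break-even at -120 by about a point and a half, which is why the example treats any price worse than roughly -127 as a pass.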
Research note board
Use this board before clicking a prop, DFS build, or same-game entry. The table is intentionally about thresholds, not fake certainty.
| Step | Input | Example application | Cancel rule |
|---|---|---|---|
| Project the role | Snaps, routes, targets, carries, minutes, or usage | Josh Allen volume against the posted line | The player loses the role that created the projection |
| Price the market | Break-even odds, line shopping, hold, payout structure | Vig compared with sportsbook consensus | Juice or line movement removes the edge |
| Check correlation | Game script, teammate overlap, ownership, late news | Ja'Marr Chase paired with Eagles script notes | The legs need different games to happen |
Model calibration: predicted vs observed
Predicted win probability bucket vs the empirical win rate inside that bucket on the test set. Points on the y=x reference line are perfectly calibrated; points below mean the model is overconfident in that bucket.
Prop OVER hit rate vs line distance from median
Empirical hit rate of OVER bets as the prop line moves away from the player projection median, measured in standard deviations. A line set 1sd below the median hits ~84% of the time — but books price the juice to match.



