MLB Platoon-Split Prop Modeling 2026: When the Lefty-Righty

The lefty-righty matchup is the oldest edge in baseball betting and the most over-trusted. Every casual MLB bettor knows that a lefty hitter "should" do better against a righty pitcher, and that a starter with a 200-point reverse split is a "bad matchup" for the wrong-handed lineup. What the casual bettor usually misses is that starters averaged only about 5.2 innings per start in 2024. That means the back third of the game is decided by a bullpen with its own platoon profile, and the starter's split often does not survive to the end of the box score. The hitter prop market prices the starter's split but underprices the bullpen's response. That gap is where the edge lives.

This post documents the platoon-split-v2 brick we shipped to the workshop on May 2, 2026. Data comes from Baseball Savant's pitch-by-pitch repository (2018-2024 seasons), FanGraphs' bullpen splits dashboard, and the Retrosheet lineup database. The brick is fit on 412 high-leverage prop opportunities from the 2023 and 2024 regular seasons; the calibration figures and worked example below are pulled from the production model card.

Platoon-split prop framework: starter pitch-type decomposition, bullpen platoon coverage, and lineup-context overlay.

The two questions every platoon-based prop bet has to answer

Before you ever look at a line, the framework reduces to two questions. First, how meaningful is the starter's split against the specific lineup composition (not against league-average opposite-handed hitters)? Second, how much of that edge survives the bullpen? Most public analysis stops at the first question. The brick exists to answer the second.

Question one: starter split, contextualized

A starter with a published OPS split of .850 vs lefties and .720 vs righties is "vulnerable to lefty bats" in the abstract, but the relevant number is not the published split — it is the projected outcome against the lineup he is actually facing. Three factors shift the starter's effective split:

Pitch repertoire. Sliders and changeups are the most platoon-vulnerable pitches in baseball. A starter who throws 30%+ sliders has a wider true split than his OPS line suggests. A starter who throws sinkers and cutters has a narrower true split.
Sample size of the published split. A rookie starter with a 200-point split over 80 batters faced is mostly noise. The brick applies a Bayesian shrinkage estimator using a position-and-handedness-specific prior — see our Brier explainer for the underlying calibration math.
Park context. Splits compiled in Coors Field (offensive boost), Petco Park (offensive suppression), and the pitcher-friendly Oakland Coliseum are not directly comparable. Baseball Savant's park factors handle this; the brick applies them automatically.
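The shrinkage step can be sketched as a credibility-weighted average of the observed split and a prior. This is a minimal sketch, not the brick's estimator: the prior split (40 OPS points) and the stabilization constant (1,000 batters faced, the sample size at which data and prior get equal weight) are hypothetical placeholders, since the post does not publish its fitted prior. A park-factor adjustment is left out.

```python
def shrink_split(observed_split: float, batters_faced: int,
                 prior_split: float, stabilization_bf: float) -> float:
    """Shrink an observed platoon split toward a prior.

    The observed split gets weight n / (n + k) and the prior gets k / (n + k),
    where n is batters faced and k is the stabilization constant. This is the
    posterior mean of a normal-normal model in which the prior is worth k
    observations.
    """
    w = batters_faced / (batters_faced + stabilization_bf)
    return w * observed_split + (1 - w) * prior_split


# Rookie from the text: 200-point OPS split over 80 batters faced.
# The prior (40 points) and k (1,000 BF) are hypothetical placeholders.
print(round(shrink_split(200, 80, prior_split=40, stabilization_bf=1000), 1))
```

Under these placeholder values the 80-batter sample carries only about 7% of the weight, so the 200-point split shrinks to roughly 52 points. That is what "mostly noise" means in practice.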

Question two: bullpen coverage

This is where the brick adds the most value. The Bullpen Platoon Coverage (BPC) metric measures the percentage of available relievers who would hold the platoon advantage over the opposing lineup if deployed, meaning relievers who throw with the same hand as most of its hitters. A BPC of 70% means the bullpen is well positioned to hold the platoon advantage once the starter exits; a BPC of 30% means the bullpen will be exposed.

The actual calculation considers reliever availability (who pitched yesterday, who is on a workload limit), expected matchup leverage (closers and setup men deployed in high-leverage spots), and the manager's recent usage pattern. A team like the 2024 Astros under Joe Espada showed a clear pattern of matching reliever handedness aggressively; a team like the 2024 Rockies showed effectively no matchup management. The brick encodes manager tendency as a categorical input.
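One way to turn the BPC definition into code is sketched below. It is a hypothetical construction, not the brick's formula. Each reliever is weighted by availability times expected leverage and credited with the share of the lineup he would face with the platoon advantage (same hand as the batter). The result is then classified with the 65% and 40% thresholds quoted in this post's FAQ. Manager tendency is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Reliever:
    throws: str           # "L" or "R"
    availability: float   # hypothetical scale: 0.0 unavailable (e.g. pitched yesterday) to 1.0 fully rested
    leverage: float       # hypothetical relative weight for expected high-leverage usage


def bullpen_platoon_coverage(relievers: list, lineup_bats: list) -> float:
    """Availability- and leverage-weighted share of lineup matchups in which a
    reliever would hold the platoon advantage (same hand as the batter).
    Switch hitters ("S") never count as covered."""
    n = len(lineup_bats)
    same_hand_share = {h: sum(b == h for b in lineup_bats) / n for h in ("L", "R")}
    weights = [r.availability * r.leverage for r in relievers]
    total = sum(weights)
    if total == 0:
        return 0.0
    covered = sum(w * same_hand_share[r.throws] for w, r in zip(weights, relievers))
    return covered / total


def classify_bpc(bpc: float) -> str:
    """Thresholds quoted in this post's FAQ: above 65% and below 40%."""
    if bpc > 0.65:
        return "edge washes out"
    if bpc < 0.40:
        return "edge persists"
    return "partial"


# Hypothetical example: 7 RHB and 2 LHB; one righty reliever is unavailable.
lineup = ["R"] * 7 + ["L"] * 2
pen = [Reliever("L", 1.0, 1.0)] * 2 + [Reliever("R", 1.0, 1.0)] * 4 + [Reliever("R", 0.0, 1.0)]
bpc = bullpen_platoon_coverage(pen, lineup)
print(f"BPC {bpc:.0%}: {classify_bpc(bpc)}")
```

A hypothetical extension for manager tendency would scale the leverage weights up for managers who match handedness aggressively and flatten them for managers who do not.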

Decomposing the starter's edge with pitch-type data

Baseball Savant's pitch tagging is the foundational dataset. Every pitch from 2015 onward has a tagged pitch type, velocity, spin rate, and location. The brick pulls a starter's last 12 starts of pitch-by-pitch data and computes pitch-type-specific wOBA-against splits by handedness. Example output for a hypothetical lefty starter:

Four-seam fastball (38% usage): wOBA .310 vs RHB, .280 vs LHB (modest split)
Slider (28% usage): wOBA .250 vs LHB, .390 vs RHB (huge platoon split: the slider plays against same-handed lefty hitters but gets crushed by righties)
Changeup (20% usage): wOBA .240 vs RHB, .320 vs LHB (reverse-split pitch that offsets his platoon disadvantage: devastating to opposite-handed bats)
Curveball (14% usage): wOBA .270 vs both (no meaningful split)

The composite split derived from pitch type is more accurate than the headline OPS number because it weights by usage frequency. The brick projects each batter's expected wOBA based on which pitches he is likely to see. A right-handed power hitter facing this starter is likely to see more changeups than sliders, which makes the projected wOBA against him meaningfully different from the published OPS-vs-righties line.
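The usage weighting can be sketched with the hypothetical lefty starter above. This is an illustration under stated assumptions. It treats wOBA against as linear in pitch usage and ignores count and sequencing effects. The right-handed pitch mix at the end (more changeups, fewer sliders) is a hypothetical example of the shift the text describes, not brick output.

```python
from typing import Optional

# pitch: (usage, wOBA vs RHB, wOBA vs LHB) for the hypothetical lefty starter above
PITCHES = {
    "four_seam": (0.38, 0.310, 0.280),
    "slider": (0.28, 0.390, 0.250),
    "changeup": (0.20, 0.240, 0.320),
    "curveball": (0.14, 0.270, 0.270),
}


def composite_woba(pitches: dict, batter_hand: str, mix: Optional[dict] = None) -> float:
    """Usage-weighted wOBA against for a batter of the given hand ("R" or "L").

    If mix is given, it replaces the starter's overall usage (for example, a
    projected pitch mix against one specific batter). The weights are
    renormalized so they sum to 1.
    """
    col = 1 if batter_hand == "R" else 2
    usage = mix if mix is not None else {p: v[0] for p, v in pitches.items()}
    total = sum(usage.values())
    return sum(usage[p] / total * pitches[p][col] for p in pitches)


print(f"vs RHB, overall mix:        {composite_woba(PITCHES, 'R'):.3f}")
print(f"vs LHB, overall mix:        {composite_woba(PITCHES, 'L'):.3f}")

# Hypothetical mix against a right-handed power hitter: 8 points of usage
# moved from sliders to changeups.
RHB_MIX = {"four_seam": 0.38, "slider": 0.20, "changeup": 0.28, "curveball": 0.14}
print(f"vs RHB, batter-specific mix: {composite_woba(PITCHES, 'R', RHB_MIX):.3f}")
```

With these numbers the overall-mix projection against righties is about .313. Shifting the mix toward changeups pulls it down to about .301, which is why a batter-specific mix matters.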

The lineup-context overlay

Once we have the starter's pitch-type-adjusted projection and the bullpen's coverage profile, we overlay lineup context. Three factors:

Stacking effect

A lefty starter facing a lineup with five consecutive lefty hitters in the middle of the order will throw more changeups and sliders than the same starter facing a balanced lineup. The brick adjusts the projected pitch mix accordingly. A hitter who is the third lefty in a stack gets a slightly different pitch projection than a hitter who is the only lefty in the lineup.

Order position

Leadoff and 2-hole hitters get more PAs in a game on average. Cleanup and 5-hole hitters get more high-leverage PAs. The brick weights PA-volume props (hits, total bases) by projected PAs, which is itself a function of game pace and order position.
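Projected PAs by order position follow from how many batters a team sends to the plate. Each slot gets the base number of turns, and the top of the order picks up the remainder. This is a minimal sketch that assumes a hypothetical distribution of team PAs per game. In practice that distribution would come from game pace, run environment, and whether the home team bats in the ninth.

```python
def slot_pa(slot: int, team_pa: int) -> int:
    """PAs for lineup slot 1-9 when the team sends team_pa batters to the plate."""
    base, extra = divmod(team_pa, 9)
    return base + (1 if slot <= extra else 0)


def expected_slot_pa(slot: int, team_pa_dist: dict) -> float:
    """Expected PAs for a slot, given a {team_pa: probability} distribution."""
    return sum(prob * slot_pa(slot, total) for total, prob in team_pa_dist.items())


# Hypothetical team-PA distribution for one game (probabilities sum to 1).
TEAM_PA_DIST = {37: 0.3, 38: 0.4, 39: 0.3}

for slot in range(1, 10):
    print(slot, round(expected_slot_pa(slot, TEAM_PA_DIST), 2))
```

A hits or total-bases projection can then be scaled by the slot's expected PAs rather than by a flat per-game figure.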

Recent form

A hitter on a 10-game tear has a different projection than the same hitter coming off an 0-for-22 slump. The brick applies a rolling-15-game adjustment, again with Bayesian shrinkage so a small sample does not dominate the projection. FanGraphs' rolling-window dashboards are the source for this signal.

A walked-through example from the 2024 sample

Concrete example. Game played in August 2024. Starter: a lefty with a published 95-point OPS platoon split (better vs lefties than righties). Opposing lineup: heavily right-handed, with 6 of 9 starters batting right. Bullpen behind the starter: 4 lefty relievers and 3 righty relievers, for a BPC of 62% against this lineup. BPC weights relievers by availability, expected leverage, and manager usage, so it is not a raw head count of handedness.

Surface-level analysis: the starter's split favors the righty-heavy lineup. The brick's view is more nuanced. The starter's slider usage is 28%, and his pitch-type-decomposed wOBA against righties is actually .345, not the .390 his raw sample suggests, because a third of the slider damage in that sample came from a single 7-run blowup in May. After Bayesian shrinkage and pitch-type weighting, the starter's effective wOBA against this lineup is .325, only modestly elevated.

More importantly: the bullpen's BPC of 62% means the platoon edge that does exist gets significantly compressed in innings 6-9. The brick projected the lineup's average hitter to finish with a .315 wOBA over the full game — barely above their season baseline of .312. The market priced hits-over props for the top three righty bats roughly 0.5 hits higher than the brick's projection. The brick recommended fading two of those overs at -110 and avoiding the third. Outcome: both fades cashed, the brick logged +2.0 units expected, +1.8 units realized.

The point is not that this single game proves the framework. The point is that the surface "lefty pitcher platoon split" narrative was misleading once you decomposed by pitch type and overlaid the bullpen's coverage. That decomposition is where the brick's edge lives.
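The compression in innings 6-9 can be illustrated with an innings-weighted blend of the starter's projection and the bullpen's. This is a minimal sketch that assumes plate appearances split roughly in proportion to innings. The .325 starter figure comes from the example. The bullpen's .300 wOBA and the range of starter workloads are hypothetical inputs, not numbers the post reports.

```python
def full_game_woba(starter_woba: float, bullpen_woba: float,
                   starter_innings: float, game_innings: float = 9.0) -> float:
    """Blend starter and bullpen wOBA projections by the share of the game each
    is expected to pitch (innings used as a proxy for plate appearances)."""
    share = min(starter_innings, game_innings) / game_innings
    return share * starter_woba + (1 - share) * bullpen_woba


# .325 is the starter figure from the example; the .300 bullpen wOBA and the
# starter workloads below are hypothetical.
for innings in (4.0, 5.5, 7.0):
    print(innings, round(full_game_woba(0.325, 0.300, innings), 3))
```

The shorter the starter's outing, the more the bullpen's number dominates the full-game projection. This is the workload effect the FAQ describes.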

Performance over the 2023-2024 sample

Across 412 high-leverage flagged opportunities in 2023-2024 regular season:

Hits props: 218 flagged. Hit rate 56.4% against an implied 52.4% from -110 lines. Closing-line value +2.8%.
Total bases props: 124 flagged. Hit rate 55.6% against an implied 52.4%. CLV +3.1%.
HR props: 47 flagged. Hit rate 28% against an implied 24%. CLV +4.2%: the smallest sample of the hitter legs but the highest CLV, due to wider mispricing on lower-probability events.
Strikeout props (priced on the starter, so not bullpen-affected): 23 flagged. Hit rate 60.8% against an implied 52.4%. Smallest leg but highest hit rate.
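To gauge how far each leg's hit rate sits from break-even relative to its sample size, a one-sample z-score against the break-even rate is a quick check. This is a standard normal approximation to the binomial, not something the brick is described as doing. The inputs are the hit rates and counts listed above. The HR leg's break-even is taken from its stated 24% implied rate.

```python
import math

BREAKEVEN_MINUS_110 = 110 / 210  # implied probability of a -110 price, about 52.38%


def z_vs_breakeven(hit_rate: float, n: int, p0: float) -> float:
    """One-sample z-score of an observed hit rate against a break-even rate p0,
    using the standard error implied by p0 (normal approximation to the binomial)."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (hit_rate - p0) / se


legs = {
    "hits": (0.564, 218, BREAKEVEN_MINUS_110),
    "total_bases": (0.556, 124, BREAKEVEN_MINUS_110),
    "hr": (0.28, 47, 0.24),
    "strikeouts": (0.608, 23, BREAKEVEN_MINUS_110),
}

for name, (rate, n, p0) in legs.items():
    print(f"{name:12s} edge {rate - p0:+.1%}  z = {z_vs_breakeven(rate, n, p0):.2f}")
```

Legs with small counts, such as the 23 strikeout props, carry wide standard errors. This is the same small-sample caution the post applies to pitcher splits.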

The strikeout props are the cleanest signal because the bullpen washout problem does not apply — strikeout props are typically priced on the starter only, and the platoon-decomposed strikeout projection is the most accurate output the brick produces. They are also the lowest-volume leg because qualifying matchups (starter with high projected strikeout rate vs platoon-vulnerable lineup) are rarer than the hits-prop opportunity set.

How to run this in Tinker yourself

The brick is portable. Open the builder, search for platoon-split-v2, and wire it to the Baseball Savant pitch-type pack and the FanGraphs bullpen splits dataset (both ship in the standard /workshop subscription). The output is a sortable table of every flagged opportunity on tomorrow's slate, with edge magnitude, recommended Kelly stake, and links to the underlying pitch-type decomposition for each starter.
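The "recommended Kelly stake" column presumably follows the standard Kelly criterion, f* = (bp - q) / b, where b is the net payout per unit staked, p is the win probability, and q = 1 - p. The post does not say whether the brick uses full or fractional Kelly, so the multiplier below is a hypothetical knob. The 56.4% win probability is borrowed from the hits leg purely for illustration.

```python
def kelly_fraction(p: float, american: int, multiplier: float = 1.0) -> float:
    """Kelly stake as a fraction of bankroll for a bet that wins with probability p
    at the given American odds. multiplier < 1 gives fractional Kelly; returns 0.0
    when the bet has no positive expected value."""
    # b = net profit per 1 unit staked: -110 -> 0.909..., +150 -> 1.5
    b = 100 / -american if american < 0 else american / 100
    f = (b * p - (1 - p)) / b
    return max(0.0, f) * multiplier


print(f"full Kelly:    {kelly_fraction(0.564, -110):.2%} of bankroll")
print(f"quarter Kelly: {kelly_fraction(0.564, -110, 0.25):.2%} of bankroll")
```

Fractional Kelly is a common choice when the win probability is itself an estimate with a credible interval, because full Kelly overbets whenever p is overstated.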

For deeper customization — say, weighting recent form more heavily, or adjusting the BPC threshold — fork the brick on the marketplace and retrain on your own splits. The training pipeline runs entirely in the browser via TensorFlow.js; a fresh fit on 2024 regular-season data takes about 90 seconds on a 2020 MacBook Air. Save your customized version back to /workshop or sell access to it on the marketplace if it outperforms the public version on the rolling leaderboard.

Common modeling mistakes to avoid

Trusting small-sample splits. A starter with a 250-point split over 60 PA is mostly noise. Always apply Bayesian shrinkage before betting on a split.
Ignoring the bullpen. The single biggest source of platoon-bet variance is the bullpen, and the public market underweights it consistently. Always check BPC before placing.
Betting platoon edges in low-total games. When the implied total is below 7.5 runs, the hitter prop variance is too wide for platoon-based edges to consistently overcome. The brick filters these out by default.
Forgetting the umpire. A wide-strike-zone umpire compresses every offensive prop's outcome distribution. The brick reduces edge magnitude by 15-20% for known wide-zone umpires; see our umpire trends piece for the underlying data.
Betting platoon edges in Coors Field. Coors specifically distorts pitch-type effects in ways the brick's standard park adjustments do not fully capture. The brick fires fewer alerts on Coors games and recommends smaller stakes on the ones it does fire.
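The default filters above can be expressed as a small gating function. This is a hypothetical sketch. The 7.5-run floor and the 15-20% umpire haircut come from the list, with 17.5% used only as the midpoint. The postseason skip comes from the limitations section. The Coors stake multiplier of 0.5 is a placeholder, because the post says only "smaller stakes."

```python
from typing import Optional, Tuple


def gate_platoon_edge(edge: float, implied_total: float, *,
                      wide_zone_ump: bool = False, coors: bool = False,
                      postseason: bool = False, ump_haircut: float = 0.175,
                      coors_stake_mult: float = 0.5) -> Optional[Tuple[float, float]]:
    """Return (adjusted_edge, stake_multiplier), or None if the opportunity is filtered out."""
    if postseason:
        return None  # the brick declines postseason games
    if implied_total < 7.5:
        return None  # low-total games are filtered by default
    if wide_zone_ump:
        edge *= 1 - ump_haircut  # post cites a 15-20% reduction; 17.5% is a placeholder midpoint
    stake_mult = coors_stake_mult if coors else 1.0  # "smaller stakes" at Coors; 0.5 is hypothetical
    return edge, stake_mult


print(gate_platoon_edge(0.04, 7.0))
print(gate_platoon_edge(0.04, 8.5, wide_zone_ump=True))
print(gate_platoon_edge(0.04, 11.5, coors=True))
```

The post also says the brick fires fewer alerts on Coors games. That would need a separate, higher edge threshold, which this sketch does not model.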

Where the framework fails

Two honest limitations. First, the brick is fit on regular-season data. Postseason platoon dynamics differ — managers manage to win the next game, not the next series, and matchup leverage decisions shift. The brick declines to fire on postseason games until we have enough data to refit. Second, the BPC calculation assumes typical bullpen usage patterns. A team that has used its top three relievers in three consecutive games will have a meaningfully reduced effective BPC the next day; the brick adjusts for this but the adjustment is itself an estimate with its own credible interval. Use the brick's confidence flag to gate borderline opportunities.

Platoon splits will continue to be the most visible matchup signal in baseball betting because they are the easiest narrative to tell. The brick's job is to move past the narrative and into the actual decomposition — pitch type, bullpen coverage, lineup context — that determines whether the edge survives or evaporates. That decomposition is the difference between betting platoons profitably and donating to the book on every "obvious" matchup.

MLB example board

A baseball betting read needs names because starter, lineup, park, and umpire inputs can move the number before the public sees the reason. Shohei Ohtani, Aaron Judge, and Juan Soto are clean examples for lineup gravity because one premium bat can alter run expectancy, opposing bullpen choices, and same-game prop pricing. Tarik Skubal and Spencer Strider are starter examples where strikeout ceiling, pitch count, and opponent handedness can matter more than the season-long team record.

First five innings: isolate the starter matchup before bullpen quality muddies the handicap.
Starter scratch: separate true downgrade from book cleanup after the market overreacts.
Park factor: Coors Field, Camden Yards, and Petco Park should not be treated like the same run environment.
Lineup news: Ohtani, Judge, or Soto availability can move both full-game totals and hitter props.

MLB update rules

The article should be updated when a confirmed lineup, starter change, roof status, umpire assignment, or weather shift changes the edge. For related workflows, use MLB first-five betting and closing-line value to decide whether the move created value or simply erased it.

Sport-specific model signals


Prop EV example: Luka Doncic points or PRA at 32.5 should be checked against projected minutes, usage without key teammates, pace, spread, and back-to-back fatigue before you look at the price.
MLB: a Dodgers at Rockies first-five total of 5.5 should account for starter xFIP, K-BB%, handedness, Coors Field run environment, wind, bullpen rest, and umpire zone.
NHL: a Maple Leafs puck-line price at +160 needs confirmed goalie, 5v5 expected-goal share, special-teams edge, and empty-net probability before the margin bet makes sense.
UFC: an Islam Makhachev-style grappling favorite needs takedown entries, control time, get-up rate, and submission exposure; an Alex Pereira-style striker needs knockdown equity and round-by-round cardio risk.
DFS value example: NBA showdown builds need projected minutes, usage, salary, ownership, and late-swap flexibility before a star salary is worth paying.
Stack example: an NBA same-game entry with Doncic points, teammate assists, and opponent threes needs one coherent pace script instead of three unrelated legs.

The goal is not to mention every star. It is to show how the model changes when the example changes from Doncic to Shohei Ohtani, Igor Shesterkin, Connor McDavid, or Tom Aspinall. Revisit and update the board when lineups, minutes, starters, goalie confirmations, weigh-ins, or market prices change.

Research note board

Use this board before clicking a prop, DFS build, or same-game entry. The table is intentionally about thresholds, not fake certainty.

Step	Input	Example application	Cancel rule
Project the role	Snaps, routes, targets, carries, minutes, or usage	Josh Allen volume against the posted line	The player loses the role that created the projection
Price the market	Break-even odds, line shopping, hold, payout structure	PPR compared with sportsbook consensus	Juice or line movement removes the edge
Check correlation	Game script, teammate overlap, ownership, late news	Ja'Marr Chase paired with Chiefs script notes	The legs need different game scripts to hit

Bet responsibly — set limits, never chase losses.

Prop OVER hit rate vs line distance from median

Empirical hit rate of OVER bets as the prop line moves away from the player projection median, measured in standard deviations. A line set 1sd below the median hits ~84% of the time — but books price the juice to match.
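The ~84% figure for a line one standard deviation below the median is what a normal distribution gives. The sketch below shows that approximation, assuming outcomes are roughly normal around the projection. Real prop outcomes are discrete counts, which is why the chart is empirical and this is only a reference curve.

```python
import math


def over_hit_rate(line: float, median: float, sd: float) -> float:
    """P(outcome > line) under a normal approximation centered on the projection."""
    z = (line - median) / sd
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))


for d in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"line at median {d:+.1f} sd: over hits {over_hit_rate(d, 0.0, 1.0):.1%}")
```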

Breakeven win % at common American odds

The win rate you need to break even at each price. Pick odds shorter than -150 and you must win >60% just to stay flat — a hurdle most casual handicappers never sustain.
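The breakeven curve is just the implied probability of the price. Below is a minimal sketch that reproduces the caption's -150 threshold. These are vig-inclusive single-side figures, not no-vig fair probabilities.

```python
def breakeven_win_pct(american: int) -> float:
    """Win rate needed to break even at the given American odds."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)


for odds in (-200, -150, -110, 100, 150, 200):
    print(f"{odds:+d}: {breakeven_win_pct(odds):.1%}")
```

At -150 the break-even rate comes out at exactly 60%, and at -110 it is about 52.4%.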

Frequently asked questions

How much does a starter's lefty-righty split actually matter for a hitter prop?

It depends entirely on how deep the starter pitches. In our 2023-2024 sample of 412 hitter prop bets where the starter had a meaningful L/R split (an OPS gap of at least 80 points between same- and opposite-handed batters), the starter's split was the dominant signal when the starter pitched 5+ innings: hitter outcomes deviated from baseline by an average of 1.4 standard deviations toward the split direction. But when the starter was pulled before completing 5 innings, the split signal collapsed and the bullpen mix became dominant. The implication: you cannot blindly bet platoon edges; you have to weight them by projected starter workload.

Which bullpens wash out platoon edges the fastest?

Bullpens stocked with relievers who share the lineup's dominant handedness, and who therefore hold the platoon advantage once the starter exits. Take a righty-heavy lineup facing a lefty starter: if the bullpen behind that starter has six available righty relievers and only two lefties, the lineup's platoon edge against the starter evaporates the moment the bullpen door opens. We track this with a metric called Bullpen Platoon Coverage (BPC): the percentage of available relievers who match up favorably against the opposing lineup. BPC above 65% predicts that the platoon edge will not survive past the starter's exit. BPC below 40% predicts the platoon edge will persist into the late innings.

Why use pitch-type data instead of just OPS splits?

Because OPS splits hide the mechanism. A starter with heavy slider usage produces a different platoon outcome than a starter with the same overall L/R OPS split but a sinker-dominant repertoire. Sliders are a true platoon-vulnerable pitch (a righty starter's slider is far more effective against righty hitters than against lefties); sinkers are not. Baseball Savant's pitch-type tagging lets us decompose the starter's repertoire and project how much of the OPS split will survive against this specific lineup. Two starters with identical 90-point OPS splits can produce wildly different prop outcomes depending on their repertoire, and the market often does not differentiate.

Does this work for HR props specifically, or just hits/total bases?

HR props are the highest-variance leg of the framework. Platoon splits absolutely matter for HR probability (a lefty hitter with a 200-point OPS edge vs righties hits HRs at roughly 1.7x his vs-lefty rate in the 2023-2024 sample), but the per-PA probability is small enough that variance dominates over any single bet. The brick handles this by recommending HR props only when the projected per-PA HR probability deviates from the implied line by 30% or more — a tight filter that fires roughly once per slate. Hits and total bases props fire more frequently because the higher base rate gives the model more signal to work with. The full feature workflow is portable into /tinker for backtesting.
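The 30% filter needs the projection and the market on the same scale. This is a hypothetical sketch. It converts the market's per-game implied probability of at least one HR into a per-PA rate by assuming independent plate appearances, then checks the relative deviation in either direction. The post does not specify how the brick makes this comparison. The example inputs (an 8.5% per-PA model rate, a 24% implied line, 4.3 expected PAs) are illustrative only.

```python
def per_pa_from_game(p_game: float, expected_pa: float) -> float:
    """Invert P(at least one HR) = 1 - (1 - p)^n, assuming independent PAs."""
    return 1 - (1 - p_game) ** (1 / expected_pa)


def hr_prop_fires(model_per_pa: float, implied_game: float, expected_pa: float,
                  threshold: float = 0.30) -> bool:
    """True when the model's per-PA HR rate differs from the market-implied
    per-PA rate by at least `threshold` in relative terms (either direction)."""
    implied_per_pa = per_pa_from_game(implied_game, expected_pa)
    return abs(model_per_pa - implied_per_pa) / implied_per_pa >= threshold


# Hypothetical inputs: 24% implied line, 4.3 expected PAs.
print(round(per_pa_from_game(0.24, 4.3), 4))
print(hr_prop_fires(0.085, 0.24, 4.3), hr_prop_fires(0.070, 0.24, 4.3))
```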