Sharksnip Consensus — Methodology

Complete documentation of how Sharksnip Consensus is computed, weighted, and verified across hundreds of opted-in models.

This document is public methodology by design. We believe a Consensus product is only credible if its methodology is fully open, replicable, and stress-tested by skeptical users.

TL;DR for the degens

  • Consensus = Brier-weighted vote of every opted-in model. Calibrated models get more pull, bad ones get zeroed out.
  • Verified Sharp (top 10% calibration, 500+ picks, 90+ days) gets the heaviest weight (5×). New / Pack Breaker submissions enter at 0.25×.
  • No single creator > 5% of total weight. Ever. Forks that vote together get deduped (correlation > 0.95 = one combined vote).
  • Brier ≥ 0.50 = excluded. Worse than a coin flip and you're out of Consensus, period.
  • Free sees the top pick. Grinder sees full feed + 30-day track record. God Mode sees market divergence + custom slicing + API.
  • We publish replication code and never sell Consensus to sportsbooks.

[Chart: backtest error, projection vs actual, mean absolute error by market]

Every prediction logged. Every miss logged. The track record is the methodology.


What Consensus is

Sharksnip Consensus is a weighted aggregate of predictions made by every opted-in marketplace model on the platform AND every opted-in Pack Breaker tournament submission. For each upcoming game in supported sports, we compute:

  • Consensus probability — what the weighted aggregate of models thinks
  • Consensus pick — the implied recommendation (cover spread, take the over, etc.)
  • Dispersion — how much the constituent models disagree
  • Market divergence — how much the Consensus diverges from the Vegas line

The product is updated as new model predictions arrive (typically multiple updates per day in-season).
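To make those outputs concrete, here is a minimal sketch of what a single Consensus record could look like. The field names are illustrative only, not the actual schema or API shape.

```python
# Hypothetical shape of one Consensus output; field names are illustrative, not the real schema.
from dataclasses import dataclass

@dataclass
class ConsensusOutput:
    game_id: str                   # internal game identifier (made up here)
    market: str                    # "spread", "total", or "moneyline"
    consensus_probability: float   # weighted aggregate of opted-in model probabilities
    consensus_pick: str            # implied recommendation, e.g. "home -4.5" or "over"
    dispersion: float              # weighted std. dev. of the constituent probabilities
    market_divergence: float       # consensus_probability minus the Vegas-implied probability
    contributing_sources: int      # number of models / submissions after dedup and caps
```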


Why weighted, not averaged

A naive average treats a 95%-accurate model the same as a 50%-accurate model. The result is noise-dominated. We use Bayesian Brier-score weighting so that:

  • Better-calibrated models contribute more
  • New / unproven models contribute less
  • Bad models are excluded entirely

This is structurally similar to how poll aggregators such as FiveThirtyEight weight pollsters: more weight to those with strong track records.
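A toy example with made-up numbers: if a Verified Sharp model (5.0×) puts a side at 70% and a Probationary model (0.25×) puts it at 45%, a naive average lands at (0.70 + 0.45) / 2 = 0.575, while the weighted aggregate is (5.0 × 0.70 + 0.25 × 0.45) / 5.25 ≈ 0.69, dominated by the model with the proven track record.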


Eligibility — what contributes

Two source types contribute to Consensus:

Source type 1: Marketplace models

A marketplace model contributes if all of the following are true:

  1. The model is published on the Sharksnip Marketplace
  2. The creator has opted in to Consensus inclusion (required for marketplace listing)
  3. The model is currently active (not unpublished or archived)
  4. The model has minimum sample size for its contribution tier (see Tier Weighting below)
  5. The model has a Brier score below 0.50 (models at or above 0.50 are excluded — see Excluded tier)

Source type 2: Pack Breaker tournament submissions

A tournament submission contributes if:

  1. The user opted in to Consensus contribution at submission time (default ON)
  2. The tournament has resolved (predictions are scored against actual outcomes)
  3. The submission's user has not been excluded for fraud / cheating

Pack Breaker submissions enter at Probationary weight (0.25×) by default unless the submitter has a Verified Sharp or Established marketplace model in the same sport, in which case the submission inherits that creator's contribution tier (capped at the per-creator 5% weight limit).

Private models — models trained but not published, with no Pack Breaker submission — do not contribute. This is by design: creators choose between earning from a public model or keeping a private edge.
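A minimal sketch of the marketplace-model eligibility check, assuming hypothetical field names on a model record (the production filter is not published here):

```python
# Illustrative eligibility filter for marketplace models; field names are assumptions.
def is_eligible_marketplace_model(model) -> bool:
    return (
        model.is_published                                    # listed on the Marketplace
        and model.consensus_opt_in                            # required for marketplace listing
        and model.is_active                                   # not unpublished or archived
        and model.prediction_count >= model.tier_min_samples  # minimum sample for its tier
        and model.brier_score < 0.50                          # 0.50 or worse is Excluded
    )
```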


Tier weighting

Each contributing source is assigned a tier weight based on its track record. These are Consensus contribution tiers — performance grades, not subscription tiers. Don't confuse them with Free / Slate Pass / Grinder / God Mode.

  • Verified Sharp (5.0×): 90+ days live, top 10% calibration in its sport, 500+ predictions
  • Established (2.0×): 30+ days live, top 30% calibration in its sport, 100+ predictions
  • Standard (1.0×): 14+ days live, 50+ predictions, Brier below 0.50
  • Probationary (0.25×): new models still building a track record, and most Pack Breaker submissions
  • Excluded (0.0×): Brier at or above 0.50 (worse than a coin flip on calibrated bets)

Tier assignments update daily based on rolling 90-day performance windows. A model that drops below criteria for its current tier is downgraded after a 14-day grace period (preventing single-week noise from triggering rapid tier changes).
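As a sketch of how the tier ladder and grace period fit together, the logic below maps a rolling 90-day track record to a contribution tier. Thresholds come from the table above; function and field names are hypothetical.

```python
# Sketch: map a rolling 90-day track record to a Consensus contribution tier.
TIER_WEIGHTS = {"verified_sharp": 5.0, "established": 2.0,
                "standard": 1.0, "probationary": 0.25, "excluded": 0.0}

def assign_tier(days_live, calibration_percentile, n_predictions, brier):
    if brier >= 0.50:                       # worse than a coin flip: out entirely
        return "excluded"
    if days_live >= 90 and calibration_percentile <= 10 and n_predictions >= 500:
        return "verified_sharp"
    if days_live >= 30 and calibration_percentile <= 30 and n_predictions >= 100:
        return "established"
    if days_live >= 14 and n_predictions >= 50:
        return "standard"
    return "probationary"

def effective_tier(current_tier, candidate_tier, days_below_criteria):
    # Upgrades apply immediately; downgrades wait out the 14-day grace period.
    if TIER_WEIGHTS[candidate_tier] >= TIER_WEIGHTS[current_tier]:
        return candidate_tier
    return current_tier if days_below_criteria < 14 else candidate_tier
```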

A user with Featured Creator status (editorial recognition awarded by the Sharksnip team) does NOT automatically get higher Consensus weighting — Featured Creator is about marketplace discovery + editorial visibility, not Consensus mechanics. The Consensus tier ladder is purely performance-driven and computed from track record.


Single-creator weight cap

To prevent a single popular creator from dominating Consensus, no individual creator's models can collectively contribute more than 5% of total Consensus weight for any given prediction.

If a creator has multiple models that would collectively exceed 5%:

  • Their models are weighted normally up to 5% combined
  • Beyond 5%, additional models are zeroed out for that prediction
  • The creator chooses which models retain weight via their dashboard

This prevents the "Sharksnip Consensus is just TopCreator's view" failure mode.
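A sketch of how the cap could be applied, assuming each model carries a pre-cap weight and a creator ID; the priority ordering stands in for the dashboard choice of which models retain weight. Illustrative only, and computed against the pre-cap total for simplicity.

```python
# Sketch: cap any single creator's combined weight at 5% of total Consensus weight.
def apply_creator_cap(weights, creator_of, cap_fraction=0.05, priority=None):
    """weights: model_id -> weight; creator_of: model_id -> creator_id;
    priority: creator_id -> ordered list of model_ids that keep weight first."""
    total = sum(weights.values())
    capped = dict(weights)
    by_creator = {}
    for model_id, creator in creator_of.items():
        by_creator.setdefault(creator, []).append(model_id)
    for creator, models in by_creator.items():
        order = (priority or {}).get(creator, models)
        budget = cap_fraction * total           # 5% of total weight per creator
        for model_id in order:
            keep = min(capped[model_id], budget)
            budget -= keep
            capped[model_id] = keep             # beyond the cap, weight is zeroed out
    return capped
```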


Forked-model deduplication

Forked models can correlate heavily (similar architecture + features → similar predictions). If 30 forks of a top model all vote the same way, that's not 30 independent votes — it's effectively one.

We detect correlation via:

  • Architecture similarity hash — exact match → high prior on correlation
  • Feature pipeline overlap — Jaccard similarity on declared features
  • Prediction correlation — Pearson correlation > 0.95 on rolling 100 predictions

Models with detected correlation > 0.95 are clustered. The cluster receives a single combined weight, capped at the 5%-per-creator limit. Within the cluster, weight is allocated proportionally to each fork's individual track record.

This is the same statistical principle that poll aggregators apply to handle correlated polling errors.
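A rough sketch of the prediction-correlation leg of this, assuming recent prediction vectors per model: forks whose predictions correlate above 0.95 are merged into one cluster, which then gets a single combined weight split by individual track record. The production pipeline also uses architecture hashes and feature overlap, and the exact combined-weight rule isn't specified here; this sketch uses the strongest member's weight.

```python
# Sketch: merge forks whose recent predictions correlate > 0.95, then share one combined weight.
import numpy as np

def cluster_correlated(preds, threshold=0.95):
    """preds: model_id -> np.array of that model's last ~100 predicted probabilities."""
    ids = list(preds)
    parent = {m: m for m in ids}
    def find(m):                                   # union-find with path halving
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if np.corrcoef(preds[a], preds[b])[0, 1] > threshold:
                parent[find(a)] = find(b)          # same cluster
    clusters = {}
    for m in ids:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

def dedupe_weights(weights, clusters, track_record_score):
    """Give each cluster one combined weight (strongest member's, in this sketch),
    split across members in proportion to an individual track-record score (e.g. 1 - Brier)."""
    out = dict(weights)
    for members in clusters:
        if len(members) < 2:
            continue
        combined = max(weights[m] for m in members)
        scores = np.array([track_record_score[m] for m in members], dtype=float)
        for m, share in zip(members, scores / scores.sum()):
            out[m] = combined * share
    return out
```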


Brier score (the calibration metric)

We use Brier score as our primary calibration metric. For probabilistic predictions:

Brier = (1/N) × Σ (p_i - o_i)²

Where:

  • p_i is the model's predicted probability for outcome i
  • o_i is the actual outcome (1 if it happened, 0 if not)
  • N is the number of predictions

Lower is better. A perfect model scores 0.0; always answering 50% on 50/50 events scores 0.25; confidently guessing sides at random averages 0.50.
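A direct implementation of the formula, with those two reference points as a sanity check:

```python
# Brier score: mean squared error between predicted probabilities and 0/1 outcomes.
def brier_score(probs, outcomes):
    """probs: predicted probabilities in [0, 1]; outcomes: 1 if the event happened, else 0."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

print(brier_score([0.5, 0.5], [1, 0]))   # 0.25 -> always answering 50% on 50/50 events
print(brier_score([1.0, 0.0], [1, 0]))   # 0.0  -> a perfect model
```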

For binary sports outcomes (cover/no cover, over/under, win/loss), Brier translates to:

  • 0.10-0.15 = elite calibration
  • 0.20-0.22 = good
  • 0.24-0.25 = baseline (Vegas-equivalent)
  • 0.27+ = poor (and 0.50+ is excluded from Consensus entirely)

ROI tracking (the profitability metric)

Brier measures calibration; ROI measures whether the model would actually make money at typical sportsbook prices.

ROI = (Σ winnings - Σ stakes) / Σ stakes

We compute ROI assuming:

  • Flat unit stake on every recommended bet
  • Best-available price across major sportsbooks at time of prediction (not closing)
  • -110 standard juice for spread/total bets
  • Posted moneyline odds for ML bets
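A minimal sketch of flat-unit ROI under those assumptions, with a helper converting American odds into profit on a winning bet (function names are made up for illustration):

```python
# Sketch: flat-unit ROI at American odds, one unit staked per recommended bet.
def win_profit(american_odds, stake=1.0):
    """Profit on a winning bet: -110 returns ~0.909 units, +150 returns 1.5 units."""
    if american_odds < 0:
        return stake * 100 / abs(american_odds)
    return stake * american_odds / 100

def flat_roi(bets):
    """bets: list of (american_odds, won) pairs."""
    staked = float(len(bets))                                   # one unit per bet
    profit = sum(win_profit(odds) if won else -1.0 for odds, won in bets)
    return profit / staked

print(flat_roi([(-110, True)] * 11 + [(-110, False)] * 9))      # 0.05: 11-9 at -110 is +5% ROI
```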

A model can have great Brier but poor ROI (well-calibrated but rarely takes the +EV side) or great ROI but poor Brier (got lucky in a small sample). We track both.

For Consensus weighting purposes, Brier dominates because a well-calibrated model is genuinely informative regardless of whether it picks bettable lines. ROI is informational but not weight-determining.


Aggregation formula

For each prediction, the Consensus probability is:

p_consensus = Σ (w_i × p_i) / Σ w_i

Where:

  • w_i is the tier weight of model i after correlation adjustment and creator cap
  • p_i is model i's predicted probability

Dispersion is the weighted standard deviation:

dispersion = sqrt(Σ w_i × (p_i - p_consensus)² / Σ w_i)

Market divergence is:

divergence = p_consensus - p_market

Where p_market is the Vegas-implied probability from the closing line (or current line for pre-game updates).
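Putting the three formulas together, a minimal sketch with made-up inputs; weights are assumed to already be tier-, correlation-, and cap-adjusted, and p_market comes from the sportsbook line:

```python
# Sketch: weighted consensus probability, dispersion, and market divergence.
import math

def consensus_outputs(weights, probs, p_market):
    """weights/probs: parallel lists for contributing sources, weights fully adjusted."""
    w_sum = sum(weights)
    p_consensus = sum(w * p for w, p in zip(weights, probs)) / w_sum
    dispersion = math.sqrt(sum(w * (p - p_consensus) ** 2
                               for w, p in zip(weights, probs)) / w_sum)
    return p_consensus, dispersion, p_consensus - p_market

# Made-up inputs: three sources at 5.0x / 1.0x / 0.25x vs. the raw implied probability of a -110 line.
print(consensus_outputs([5.0, 1.0, 0.25], [0.58, 0.52, 0.45], p_market=0.524))
# ~ (0.565, 0.032, 0.041)
```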


Tier views (what's visible at each subscription level)

Consensus surfaces the following outputs; what is visible scales with subscription level (Free / Slate Pass → Grinder → God Mode):

  • Top consensus pick per slate (all levels, including Free / Slate Pass)
  • Full consensus, all games (Grinder and above)
  • Per-game probability
  • Number of contributing sources
  • Recent track record, last 30 days (Grinder and above)
  • Full historical track record
  • Dispersion
  • Market divergence (God Mode)
  • Custom slicing: filter by creator, weight tier, sport, market type, etc. (God Mode)
  • Per-creator contribution breakdown
  • API access, rate-limited (God Mode)

What we DO NOT publish

  • Identity of contributing creators (unless they've opted in to attribution)
  • Individual model probabilities (the inputs to the weighted aggregate)
  • Proprietary feature engineering (creators retain that)

Verification

The Consensus track record is publicly visible at sharksnip.com/consensus/track-record:

  • Daily and weekly Brier score
  • Daily and weekly ROI
  • Hit rate by sport, by market, by confidence level
  • Comparison to baseline (Vegas closing line)
  • Comparison to top individual models in the marketplace

We do not curate or cherry-pick the displayed track record. Bad weeks show up. The methodology is open, the inputs are open, and the outputs are public. The only way to fake a Consensus track record would be to manipulate the underlying model contributions, which is detectable via the public per-creator stats.


Independent verification

We support and encourage independent academic and journalistic verification:

  • Researchers can apply for read-only API access (no rate limit, full historical) at research@sharksnip.com
  • Anonymized contributing-model data is available upon request for academic research
  • We've published replication code and weighting algorithms at github.com/sharksnip/consensus-methodology (placeholder — actual repo TBD)

What can break Consensus

We list these openly because we want users to know the limits:

  1. Sample size during ramp-up. If only 50 models are opted in, the 5%-per-creator cap is essentially meaningless and idiosyncratic creators have outsized influence. Mitigation: Probationary tier weighting + minimum-sample requirements + Pack Breaker tournament submissions broaden the contributor pool.

  2. Coordinated forking attacks. A bad actor could publish many forks of a model, hoping to inflate its weight. Mitigation: forked-model correlation detection + creator weight cap + prediction correlation analysis.

  3. Adversarial models. A creator could publish a model designed to look good in calibration but encode systematic bias. Mitigation: live track record + audit trail + Brier vs ROI cross-check.

  4. Market arbitrage. If Consensus becomes widely followed, it influences the markets it's predicting. Alpha decays. Mitigation: partial at best (this affects all betting analytics); timed publishing windows and God Mode exclusivity on divergence alerts.

  5. Sport-specific edge cases. Some sports have lower sample sizes (e.g., golf has only four majors per year). Tier criteria are sport-specific to handle this.

We update this list as failure modes emerge.


Methodology version history

  • v1.0 (May 6, 2026): Initial public methodology. Bayesian Brier weighting, 5%-creator cap, forked-model deduplication.
  • v1.1 (May 7, 2026): Added Pack Breaker tournament submissions as a Consensus source type. Default Probationary weight; inherits creator's marketplace contribution tier if submitter has a Verified Sharp / Established marketplace model in the same sport.
  • v1.2 (planned, Q3 2026): Add per-sport sample-size adjustments. Add "Confidence Score" exposure.
  • v2.0 (planned, 2027): Real-time weighting updates (currently daily). Per-market specialization weighting.

Questions, suggestions, criticism

We welcome substantive feedback on the methodology. If you find a flaw, we'd rather know now than discover it through bad outcomes later.