
Self-calibrating commodity event contracts: WTI front-month settle

Across 43 Kalshi WTI ladder events, the implied median tracked the next-day NYMEX settle with a mean absolute error of $2.88 and a correlation of 0.85. Both the forecast and the ground truth come from a single venue's tables — no external commodity feed required.

Blog · May 21, 2026 · 4 min read
Filippo Armani, Data Content Creator at Dune

How a Kalshi Strike Ladder Forecasts the WTI Front-Month Close

What we did

We tested how well Kalshi's WTI strike-ladder market forecasts the actual NYMEX front-month settle on the next trading day. Same underlying, two readings of the same dataset, no third-party reference price:

  • KXWTI strike ladder (Kalshi). Each trading day Kalshi lists 15 to 35 "Will WTI front-month settle above $X on [day]?" markets. The strike grid is roughly $1 wide around the prevailing spot, widening at the tails. Per-event mean: 20.3 strikes.
  • Realized close (also Kalshi). When each event resolves, the highest "Yes"-settled strike and the lowest "No"-settled strike bracket the actual NYMEX close. The bracket midpoint is the realized print, recoverable from kalshi.market_details.result alone.
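The bracket-midpoint rule can be sketched in a few lines of Python. This is a minimal illustration, not the query used for the post: the function name `realized_close` is hypothetical, and it assumes each strike's result has already been read out as the string "yes" or "no" (the actual encoding of kalshi.market_details.result may differ).

```python
from typing import Optional


def realized_close(results: dict[float, str]) -> Optional[float]:
    """Recover the realized print from settled "above $X" markets.

    `results` maps strike -> "yes" or "no" (assumed encoding).
    Returns the midpoint of the highest Yes-settled strike and the
    lowest No-settled strike, or None when the close fell outside
    the ladder and no bracket exists.
    """
    yes_strikes = [k for k, r in results.items() if r == "yes"]
    no_strikes = [k for k, r in results.items() if r == "no"]
    if not yes_strikes or not no_strikes:
        return None  # open-ended tail: needs a separate convention
    lower = max(yes_strikes)  # close settled above this strike
    upper = min(no_strikes)   # close did not settle above this strike
    if lower >= upper:
        raise ValueError("inconsistent ladder: a Yes strike sits at or above a No strike")
    return (lower + upper) / 2


ladder = {95.0: "yes", 96.0: "yes", 97.0: "no", 98.0: "no"}
print(realized_close(ladder))  # 96.5
```

With $1 spacing the returned value is never more than $0.50 from the true close, provided the close lands inside the ladder.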

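We took the trades in the window from 24 hours to 1 hour before settlement, computed the volume-weighted average (VWAP) yes-price at each strike, and read those VWAPs as a survival curve, P(close > strike). Inverting that curve gives the implied median: the price at which P(close > strike), linearly interpolated between adjacent strikes, crosses 0.5.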
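As a rough sketch of that inversion, assuming trades have already been restricted to the pre-settle window and yes-prices are expressed as probabilities between 0 and 1: the names `vwap_by_strike` and `implied_median` and the `(strike, price, contracts)` tuple layout are illustrative, not the schema of kalshi.market_trades.

```python
from collections import defaultdict
from typing import Iterable, Optional


def vwap_by_strike(trades: Iterable[tuple[float, float, float]]) -> dict[float, float]:
    """Volume-weighted average yes-price per strike.

    Each trade is (strike, yes_price, contracts); this layout is an assumption.
    """
    notional: dict[float, float] = defaultdict(float)
    volume: dict[float, float] = defaultdict(float)
    for strike, price, contracts in trades:
        notional[strike] += price * contracts
        volume[strike] += contracts
    return {k: notional[k] / volume[k] for k in volume if volume[k] > 0}


def implied_median(survival: dict[float, float]) -> Optional[float]:
    """Price at which P(close > strike) interpolates linearly through 0.5.

    Returns the first downward crossing, or None if the curve never
    crosses 0.5 inside the ladder.
    """
    points = sorted(survival.items())
    for (k0, p0), (k1, p1) in zip(points, points[1:]):
        if p0 >= 0.5 >= p1 and p0 != p1:
            # Linear interpolation between the two bracketing strikes.
            return k0 + (p0 - 0.5) * (k1 - k0) / (p0 - p1)
    return None


trades = [(96.0, 0.55, 10), (96.0, 0.65, 10), (97.0, 0.40, 25)]
curve = vwap_by_strike(trades)   # 96 -> 0.60, 97 -> 0.40 (up to float rounding)
print(implied_median(curve))     # close to 96.5
```

Thinly traded strikes can make the VWAP curve non-monotone; this sketch simply returns the first crossing and leaves any smoothing as a separate choice.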

Coverage of the analyzed window: 43 events over the last 60 trading days (Mar 9 to May 11, 2026) and 870 individual strike markets, a mean of 20.3 strikes per event.


What we found

1. The implied median tracks the realized close. Across 43 events the correlation is 0.85 and the mean absolute error is $2.88 (median $2.32, p90 $6.88). 12 of 43 events (28%) landed within $1 of realized; 7 (16%) within 50 cents.

2. There is a small upward bias. Implied averages $98.85 vs realized $97.33, so the strike VWAP tends to price the close about $1.50 high. The bias is consistent with the geometry of the ladder: open-ended top buckets carry probability mass without a natural midpoint, which pulls the implied median up when the upper tail is fat.

3. Errors concentrate around volatile sessions. The maximum absolute error in the window was $11.16, on a session where WTI moved more between the T-1 VWAP window and the next-day settle than the strike grid could anticipate. P90 of $6.88 is about seven strike-widths at the roughly $1 central spacing, and fewer where the grid widens toward the tails.

4. The dataset is self-contained. Both the forecast and the ground truth come from kalshi.market_details plus kalshi.market_trades. No external commodity price feed is required to score the forecast.
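The summary statistics quoted in these findings can be reproduced from paired implied and realized values with a small helper. The sketch below uses only the Python standard library; the function name `score_forecasts`, the inclusive "within $1" tolerance, and the toy inputs are assumptions for illustration, not the post's data.

```python
import math
import statistics


def score_forecasts(implied: list[float], realized: list[float],
                    tolerance: float = 1.0) -> dict[str, float]:
    """Bias, error, hit-rate, and correlation summary for paired forecasts."""
    errors = [f - r for f, r in zip(implied, realized)]
    abs_errors = [abs(e) for e in errors]

    # Pearson correlation, written out so it runs on older Python versions.
    mean_f = statistics.fmean(implied)
    mean_r = statistics.fmean(realized)
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(implied, realized))
    var_f = sum((f - mean_f) ** 2 for f in implied)
    var_r = sum((r - mean_r) ** 2 for r in realized)

    return {
        "bias": statistics.fmean(errors),  # positive = forecast too high
        "mae": statistics.fmean(abs_errors),
        "median_abs_error": statistics.median(abs_errors),
        # Quantile conventions differ between tools; this is the stdlib default.
        "p90_abs_error": statistics.quantiles(abs_errors, n=10)[-1],
        "hit_rate": sum(e <= tolerance for e in abs_errors) / len(abs_errors),
        "correlation": cov / math.sqrt(var_f * var_r),
    }


# Toy inputs, not the values behind the post.
summary = score_forecasts([100.0, 98.0, 95.0, 101.0], [99.0, 98.5, 97.0, 100.0])
print(summary["bias"], summary["mae"], summary["hit_rate"])  # -0.125 1.125 0.75
```

Because both input series come from the same two Kalshi tables, a helper like this can be rerun as each new event resolves.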

Important caveats on what this measures

The two series are not perfectly comparable, and a few methodological choices are worth surfacing:

  • Strike grid granularity. The implied median is interpolated between adjacent strikes and the realized print is a bracket midpoint, so neither is finer than the grid. With ~$1 strike spacing the precision floor is ~$0.50. Errors below that are sub-resolution.
  • T-1 pre-settle window. We VWAP trades from 24 hours to 1 hour before resolution. Trades inside the last hour are excluded so the implied median is a forecast, not a fit. Late-session moves between the VWAP window and the actual close show up as error.
  • Bracket midpoint approximation. The realized close is the midpoint of the highest Yes-settled strike and the lowest No-settled strike. When the close lands exactly on a strike boundary, the midpoint is ~$0.50 from the print.
  • Open-ended tail buckets. Top and bottom buckets have no natural midpoint. Assigning the strike + $1 (top) or strike - $1 (bottom) is a choice that shifts the realized print when the close falls into the tail.
  • Event-driven sessions. Sessions with OPEC headlines, inventory prints, or geopolitical news produce large pre-VWAP-to-close drifts. Those sit in the right tail of the error distribution.
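The T-1 window rule from the list above is easy to get subtly wrong at the edges, so here is one way it could be written down. The function name `in_forecast_window`, the inclusive boundaries, and the example timestamps are assumptions made for this sketch.

```python
from datetime import datetime, timedelta


def in_forecast_window(trade_time: datetime,
                       resolution_time: datetime,
                       opens_before: timedelta = timedelta(hours=24),
                       closes_before: timedelta = timedelta(hours=1)) -> bool:
    """True if a trade falls between 24 hours and 1 hour before resolution.

    Both boundaries are treated as inclusive (an assumption). Trades in the
    final hour are rejected so the implied median stays a forecast.
    """
    window_open = resolution_time - opens_before
    window_close = resolution_time - closes_before
    return window_open <= trade_time <= window_close


# Hypothetical resolution timestamp, for illustration only.
resolution = datetime(2026, 5, 11, 18, 30)
print(in_forecast_window(resolution - timedelta(hours=3), resolution))     # True
print(in_forecast_window(resolution - timedelta(minutes=30), resolution))  # False
print(in_forecast_window(resolution - timedelta(hours=25), resolution))    # False
```

Passing different `opens_before` and `closes_before` values is enough to compare alternative windows, such as a T-2 variant.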

None of these break the directional signal. They define the regime in which the comparison is most informative: typical sessions with mid-grid closes, sub-$5 absolute error.

What this enables

The analysis required only two Kalshi tables on the unified schema (market_details for the strike grid and resolution, market_trades for the pre-settle VWAP). The new dataset makes this a single query; without it, the same analysis requires pulling per-event ladder definitions from Kalshi's API, reconstructing VWAP from raw fills, and joining to a separate commodity-futures feed for ground truth.

The same methodology extends to:

  • Other commodity series Kalshi lists as continuous strike ladders (Brent, Henry Hub gas, gasoline, copper). Any series with daily ladder events around a tradeable settle becomes self-calibrating.
  • Continuous forecast quality monitoring. Mean error, hit rates, and bias drift can be reported automatically as each new event resolves, with no external feed dependency.
  • Cross-venue comparison. Pairing Kalshi's strike-ladder implied median against a Polymarket equivalent (where one exists) or against NYMEX futures-implied probabilities surfaces basis at the daily level.
  • Methodology research. Window choice (T-1 vs T-2), strike-tail assignment, and bias correction all have measurable effects on the calibration. Researchers can sweep these without leaving the Kalshi tables.

