Fierdimo/polymarket-analysis

Is There an Exploitable Edge in Polymarket 5-Minute Bitcoin Markets? A Rigorous Negative Result

A reproducible research study on short-horizon predictability, market efficiency, and transaction-cost structure in on-chain prediction markets. The headline finding is negative, and the methodology used to establish it honestly is the contribution.

Español: README.es.md


Abstract

We investigate whether a profitable trading edge exists in Polymarket's 5-minute "Bitcoin Up or Down" binary markets using only publicly available data. We build a 24/7 data-collection infrastructure (exchange trade flow, the market order book, and the Chainlink BTC/USD data stream that resolves the markets) and evaluate a sequence of hypotheses under a strict out-of-sample, cost-aware, anti-overfitting protocol. We show: (i) public price/technical features carry no directional signal at the 5-minute horizon; (ii) recorded trade-flow features yield only a weak, regime-dependent edge after a non-obvious data-quality correction; (iii) the market's true transaction cost (a 0.07·p·(1−p) taker fee plus a dynamic spread) imposes an irreducible edge barrier of ≈1.75 probability points at p=0.5, which the available signal does not clear; and (iv) out-of-sample, the order book itself is a better predictor of the outcome (AUC 0.837) than any predictor we construct from the resolution oracle or from a measured 2-second exchange→oracle lead (AUC 0.812). We conclude the market is efficient and not exploitable by a public-data taker, and we precisely characterise the only structurally favoured seat (market making). The value of this repository is the evaluation discipline: it documents how a tempting hypothesis was killed with evidence, including the detection and correction of three measurement errors that each would have produced a false conclusion.

[Figure 1 image: order-book efficiency and out-of-sample AUC comparison]

Figure 1. Left: the Polymarket order-book mid is a calibrated estimate of the realised outcome probability (points track y = x), i.e. the market is efficient. Right: out-of-sample, the book (AUC 0.837) outpredicts every public-data predictor we built (0.812); a measured 2-second exchange→oracle lead adds nothing.


1. Introduction

Problem. Polymarket lists rolling binary markets "Will BTC be ≥ its price 5 minutes ago?". A market resolves Up iff price(end) ≥ price(start). The research question: with public data only, is there a strategy with positive expected value net of real transaction costs?

Objective. Estimate P(BTC(t+5m) ≥ BTC(t)) accurately enough to trade profitably, or determine rigorously that no such public-data edge exists.

Contribution. This is a negative result, reported honestly. The deliverable is not a profitable system (none was found and none is claimed); it is a reproducible, cost-aware, out-of-sample evaluation framework and the documented reasoning (including three self-caught measurement errors) that turns "it looked like it worked" into "it does not, and here is why."


2. Background: Market Mechanics and the Structural Wall

Understanding why the result is negative requires understanding the instrument. This section is prerequisite to the methodology.

2.1 The 5-minute block lifecycle

  • A new market opens every 5 minutes, 24/7 (288 windows/day per asset), with a deterministic slug {asset}-updown-5m-{unix} where unix is the 300-second-aligned window start.
  • At open the strike is fixed = the Chainlink BTC/USD value at that instant. It does not move for the rest of the block.
  • For the 5 minutes the CLOB order book is live: Up and Down shares trade in [$0, $1], tick $0.01, minimum order $5, observed liquidity $6k–$17k. A position can be exited before close by selling at the live price. You are not forced to hold to resolution.
  • At close, the Chainlink value at the end is compared to the strike: if end ≥ strike, Up shares pay $1 and Down shares pay $0; otherwise Down shares pay $1 and Up shares pay $0 (ties resolve Up, matching the target P(BTC(t+5m) ≥ BTC(t))).
  • The payoff is binary and determined only by the two oracle readings (start, end). For a hold-to-resolution position the intermediate price path is irrelevant: being "ahead" with one minute left means nothing if the last minute reverses it.
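
The lifecycle rules above can be sketched in a few lines. This is a minimal illustration, not the repository's finder or resolver: the function names are hypothetical, and only the slug pattern, the 300-second alignment, and the tie-resolves-Up rule come from the text.

```python
WINDOW_SECONDS = 300  # one block = 5 minutes


def window_start(unix_ts: int) -> int:
    """Floor a Unix timestamp to the 300-second-aligned window start."""
    return unix_ts - (unix_ts % WINDOW_SECONDS)


def market_slug(asset: str, unix_ts: int) -> str:
    """Build {asset}-updown-5m-{unix} for the window containing unix_ts."""
    return f"{asset}-updown-5m-{window_start(unix_ts)}"


def resolves_up(strike: float, end_price: float) -> bool:
    """A market resolves Up iff end >= strike, so a tie resolves Up."""
    return end_price >= strike


def payoff_per_share(side: str, strike: float, end_price: float) -> float:
    """Binary payoff: $1 for the winning side, $0 for the losing side."""
    up = resolves_up(strike, end_price)
    if side == "up":
        return 1.0 if up else 0.0
    if side == "down":
        return 0.0 if up else 1.0
    raise ValueError("side must be 'up' or 'down'")


if __name__ == "__main__":
    print(market_slug("btc", 1700000123))
    print(payoff_per_share("up", 65000.0, 65000.0))
```

Because the slug depends only on the clock, discovery needs no search query, and the payoff function makes explicit that only the two oracle readings matter.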

2.2 Intra-block price dynamics

  • Near open the outcome is maximally uncertain: the mid is ≈ 0.50.
  • As time elapses and BTC moves relative to the (fixed) strike, the mid tracks the conditional probability P(Up | current gap, time remaining); as the time remaining → 0 it converges toward 0 or 1, effectively a step function.
  • Empirically the mid is well calibrated to the realised P(Up) across the full range (0.30→0.263, 0.50→0.492, 0.70→0.752, 0.95→0.975), and mid | gap>0 ≈ P(Up | gap>0) at every time bucket: the price already embeds the observable oracle gap.
  • The spread is dynamic: ≈1¢ when balanced / pre-open, widening to ≈5¢ mid-window, and up to ≈19¢ on BTC at informative moments, where market makers widen precisely when the outcome becomes contested, protecting against informed flow. Markets are tradeable ≈92% of the time.
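
A calibration check of the kind quoted above (mid 0.30 → realised 0.263, and so on) can be sketched as follows. The equal-width binning and the function name are assumptions made for illustration; the repository's analysis script may bin differently.

```python
from collections import defaultdict


def calibration_table(mids, outcomes, n_bins=10):
    """Compare the mean book mid in each bin with the realised frequency of Up.

    mids     : book mid prices in [0, 1]
    outcomes : 1 if the window resolved Up, else 0
    Returns rows of (bin_index, mean_mid, realised_up_frequency, count).
    """
    cells = defaultdict(lambda: [0.0, 0, 0])  # [sum of mids, Up count, total count]
    for mid, up in zip(mids, outcomes):
        b = min(int(mid * n_bins), n_bins - 1)  # a mid of exactly 1.0 goes in the top bin
        cell = cells[b]
        cell[0] += mid
        cell[1] += int(up)
        cell[2] += 1
    return [(b, s / n, k / n, n) for b, (s, k, n) in sorted(cells.items())]


if __name__ == "__main__":
    demo_mids = [0.05, 0.05, 0.95, 0.95]
    demo_outcomes = [0, 0, 1, 1]
    for row in calibration_table(demo_mids, demo_outcomes):
        print(row)
```

A calibrated book produces rows whose second and third entries are close in every bin, which is the "points track y = x" pattern described for Figure 1.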

2.3 Why the wall is structural, for the bot and the common user

  • The live price is a fast, efficient estimate of the resolution probability. Out-of-sample it is a better predictor (AUC 0.837) than any public-data predictor we built (0.812), including one exploiting a measured 2-second exchange→oracle lead (§5).
  • The market is negative-sum for takers: a taker-only fee 0.07·p·(1−p) (an irreducible ≈1.75 pp floor at p=0.5) plus the spread. The average taker must lose; winning takers are paid by losing takers, not by "the house".
  • "Enter late when it is nearly decided" fails: at ≈95% decided the book already prices Up at ≈$0.95 → pay $0.95 to win $1 at 95% ≈ break-even before cost, negative after. The number of participants is irrelevant. Efficiency is a property of the price; more participants make it more efficient, not less.
  • "Buy at 0.50, sell the peak" fails: it is the same prediction problem (the price moves with BTC faster than one predicts), it doubles the cost (two fee-bearing legs, with the fee maximised at p=0.5), and it adds a second unsolved prediction (the exit timing).
  • The 5-minute horizon is a pure speed/microstructure game, leaving no room for the research/judgment edge that is the only way a non-automated human wins in prediction markets generally. The common discretionary user is therefore strictly worse positioned than the automated approach (slower, unable to monitor 288 windows/day, paying the same fee) and is precisely the flow professional makers profit from. Short-run "wins" are variance, not skill; the long-run taker expectation is negative. These are the dynamics of a sharp bookmaker plus vig, not of a beatable coin flip.
  • The only structurally favoured seat is the maker (fee-exempt, spread-capturing), i.e. professional market making under adverse-selection risk, requiring capital and infrastructure, not prediction, and not accessible to the common user.

In one line: the 5-minute block is an efficient, speed-dominated, negative-sum-for-takers binary game whose live price already contains the public information and whose fee is engineered so takers subsidise makers. The wall is not a tuning problem; it is the market's design.
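
The arithmetic behind the "enter late" and "buy at 0.50, sell the peak" arguments can be made concrete. The sketch below assumes a perfectly calibrated price (true probability equal to the price paid) and ignores the spread, so it is optimistic for the taker; the function names are illustrative only.

```python
FEE_RATE = 0.07  # taker fee coefficient from the fee formula 0.07 * p * (1 - p)


def fee_per_share(price: float) -> float:
    """Taker fee for one share bought or sold at `price`."""
    return FEE_RATE * price * (1.0 - price)


def hold_ev(price: float, true_prob: float) -> float:
    """Expected value per share of buying at `price` and holding to resolution."""
    return true_prob * 1.0 - price - fee_per_share(price)


def round_trip_fee(entry: float, exit_price: float) -> float:
    """Total fee per share for a taker entry followed by a taker exit."""
    return fee_per_share(entry) + fee_per_share(exit_price)


if __name__ == "__main__":
    # Late entry at a calibrated 0.95: zero edge before cost, negative after the fee.
    print(hold_ev(0.95, 0.95))
    # Entry at a calibrated 0.50: the fee is at its maximum.
    print(hold_ev(0.50, 0.50))
    # Entering at 0.50 and exiting at 0.60 pays the fee twice.
    print(round_trip_fee(0.50, 0.60))
```

With a calibrated price the expected value before cost is zero at every price level, so the fee alone makes every hold-to-resolution taker trade negative in expectation; adding the half-spread only makes it worse.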

2.4 Theoretical framing

  • EMH (Fama, 1970): a competitive price already incorporates public information; residual predictability for a public-data participant ≈ 0, exactly what §5 confirms empirically.
  • Microstructure (Easley, López de Prado & O'Hara, VPIN, 2012; Harris, Trading and Exchanges, 2003): order-flow imbalance, aggressor pressure and large-trade ("whale") activity are the candidate leading signals when lagging price features fail (tested in §5.2).
  • Binary payoff: PnL is not proportional to the underlying's return; mishandling this silently invalidates a backtest (§6, Error 2).

3. Data

| Stream | Source | Cadence | Volume collected |
|---|---|---|---|
| Spot/perp trades | Bybit WebSocket (BTC/ETH/SOL perp) | tick | ~11 days, ~8M trades/symbol |
| Resolution oracle | Chainlink BTC/USD via Polymarket public WS | 1 Hz (~1.3 s delivery latency) | 410 windows (~34 h) |
| Order book | Polymarket CLOB (btc-updown-5m-*) | ~1 s poll | time-aligned with oracle |
| Cost/liquidity probe | Polymarket Gamma + CLOB | 40 s | 8,915 samples (tradeable 91.8%) |

Collection runs as resilient launchd daemons (auto-restart, network-state aware, periodic flush). Market discovery is deterministic from the 5-minute-aligned slug; the repository's original keyword-based finder was found defective (§6, Error 2).


4. Methodology

Evaluation protocol (applied to every hypothesis):

  1. Out-of-sample: temporal train/test split; nuisance parameters estimated on train only, evaluated on held-out test.
  2. Cost-aware: PnL computed with the correct binary payoff and the real fee, taker filled at the recorded ask (simulation/cost_model.py).
  3. Anti-overfitting: predictors are pre-specified and simple; results are reported per time-bucket (no "best bucket" cherry-picking); regime dependence is reported, not hidden.
  4. Robust core metric: out-of-sample log-loss / AUC vs. outcome, an execution-assumption-free measure. If a PnL number contradicts the robust metric, the robust metric is trusted and the PnL is treated as an artifact.
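
The split and the two robust metrics from the protocol above can be written without any dependencies. This is a sketch under stated assumptions: the function names are hypothetical, the 50/50 split fraction is a default chosen for illustration, and the pairwise AUC is quadratic in the number of windows, which is acceptable for a few hundred windows but not for large samples.

```python
import math


def temporal_split(rows, train_frac=0.5):
    """Split time-ordered rows without shuffling; the later part is the test set."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def log_loss(probs, outcomes, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def auc(scores, outcomes):
    """Probability that a random Up window outscores a random Down window.

    Ties count one half. Requires at least one Up and one Down outcome.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # A constant 0.5 forecast scores ln 2 on log-loss and 0.5 on AUC.
    print(log_loss([0.5] * 4, [1, 0, 1, 0]), math.log(2))
    print(auc([0.5] * 4, [1, 0, 1, 0]))
```

The constant-0.5 forecast gives the random baseline against which a model's log-loss is judged: ln 2 ≈ 0.693.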

Transaction-cost model (decoded from primary sources, Polymarket fee documentation): crypto markets charge a taker-only fee, fee = C · 0.07 · p · (1−p), where C is the number of shares and p the price; makers pay zero. The formula was verified against the published peak of $1.75 per 100 shares at p=0.5. The implied break-even edge over the mid is ≈ spread/2 + 0.07·a·(1−a), where a is the price at which the taker fills (the ask), i.e. an irreducible ≈1.75 probability-point floor at p≈0.5, before spread.
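
The fee formula and the break-even expression translate directly into code. This is a minimal sketch, not the contents of simulation/cost_model.py; the function names are hypothetical and the fill price is assumed to be the ask.

```python
FEE_RATE = 0.07  # coefficient in fee = C * 0.07 * p * (1 - p)


def taker_fee(shares: float, price: float) -> float:
    """Taker fee in dollars for `shares` shares traded at `price`; makers pay zero."""
    return shares * FEE_RATE * price * (1.0 - price)


def break_even_edge(ask: float, spread: float) -> float:
    """Edge over the mid, in probability units, needed to break even when
    buying at the ask and holding to resolution: half-spread plus per-share fee."""
    return spread / 2.0 + FEE_RATE * ask * (1.0 - ask)


if __name__ == "__main__":
    # Published peak: $1.75 per 100 shares at p = 0.5.
    print(taker_fee(100, 0.5))
    # Balanced book with a 1-cent spread, filled at an ask of 0.505.
    print(break_even_edge(ask=0.505, spread=0.01))
    # Mid-window book with a 5-cent spread, filled at an ask of 0.525.
    print(break_even_edge(ask=0.525, spread=0.05))
```

The fee term is symmetric around 0.5 and vanishes at 0 and 1, so the cost barrier is highest exactly where the outcome is most uncertain.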


5. Experiments and Results

5.1 Public price/technical features (control)

LGBM / TCN / latent-SDE on OHLCV + technical indicators + funding. Result: validation BCE ≈ 0.71 at k=0, worse than the random baseline (0.693). No directional signal. Architecture changes do not help.

5.2 Recorded trade-flow features

Aggressor imbalance, buy pressure, large-trade ratio (whale proxy), VWAP deviation, 5-minute bins. Result, raw: "no signal." After correcting a data-quality defect (§6, Error 1, partial bins at recorder-gap edges): Δ AUC = +0.024 (t ≈ 3.0, 15/25 paired wins), AUC ≈ 0.534. A real but weak edge; regime-dependent (one CV fold negative); BCE only marginally below random. First positive result of the project, and far too weak to clear §4's cost floor.
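
For concreteness, one way to compute the four flow features over a single 5-minute bin is sketched below. The exact definitions used in build_trade_features.py are not stated in this README, so every formula here is an assumption based on common usage: imbalance as signed volume over total volume, buy pressure as the buy share of volume, large-trade ratio as the volume share of trades at or above a size threshold, and VWAP deviation as the last price relative to the bin VWAP.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    price: float
    size: float
    is_buy: bool  # True when the aggressor (taker) was the buyer


def flow_features(trades, large_size):
    """Flow features for one bin; returns None for an empty bin.

    `large_size` is a hypothetical threshold separating large ("whale") trades.
    """
    total = sum(t.size for t in trades)
    if total == 0:
        return None
    buy = sum(t.size for t in trades if t.is_buy)
    large = sum(t.size for t in trades if t.size >= large_size)
    vwap = sum(t.price * t.size for t in trades) / total
    return {
        "aggressor_imbalance": (2.0 * buy - total) / total,  # (buy - sell) / total
        "buy_pressure": buy / total,
        "large_trade_ratio": large / total,
        "vwap_deviation": (trades[-1].price - vwap) / vwap,
    }


if __name__ == "__main__":
    demo = [Trade(100.0, 1.0, True), Trade(101.0, 3.0, True), Trade(102.0, 1.0, False)]
    print(flow_features(demo, large_size=3.0))
```

Features of this kind are ratios, so a bin that covers only part of its interval still returns a plausible-looking number, which is why the partial-bin defect described in §6 was not visible as missing data.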

5.3 Resolution-oracle observation

The markets resolve on the Chainlink BTC/USD data stream, not on Bybit (§6, Error 3). The oracle is observable in real time (public WS, 1 Hz). With 407 recorded windows:

  • Order-book calibration: book mid ≈ realised P(Up) across the entire range; the book is highly efficient.
  • Naive oracle-vs-book taker: −0.19 PnL per $1 over 77,656 trades, negative in every time bucket.
  • Lead–lag: Bybit leads the Chainlink tick by ≈2 s (corr ret_Bybit(t) vs ret_Chainlink(t+2s) = +0.71; contemporaneous +0.04), a clean structural lead.
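
A lead of this kind is typically measured by correlating one return series against a shifted copy of the other. The sketch below uses NumPy and assumes both series are already resampled to the same fixed grid (for example 1 Hz, so a lag of 2 samples is 2 seconds); the function names are illustrative and this is not the code in analyze_leadlag.py.

```python
import numpy as np


def lagged_corr(leader, follower, lag):
    """Pearson correlation between leader[t] and follower[t + lag], lag >= 0 samples."""
    leader = np.asarray(leader, dtype=float)
    follower = np.asarray(follower, dtype=float)
    if lag > 0:
        leader, follower = leader[:-lag], follower[lag:]
    return float(np.corrcoef(leader, follower)[0, 1])


def best_lag(leader, follower, max_lag):
    """Return the lag with the highest correlation and the full lag-to-correlation map."""
    corrs = {k: lagged_corr(leader, follower, k) for k in range(max_lag + 1)}
    return max(corrs, key=corrs.get), corrs


if __name__ == "__main__":
    # Synthetic check: the follower is the leader delayed by 2 samples plus noise.
    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    y = np.roll(x, 2) + 0.1 * rng.normal(size=2000)
    lag, corrs = best_lag(x, y, max_lag=5)
    print(lag, corrs[lag], corrs[0])
```

A high correlation at a positive lag together with a near-zero contemporaneous correlation is the signature reported above (+0.71 at 2 s against +0.04 at 0 s).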

5.4 The decisive money-test (out-of-sample)

Predictors compared on a held-out second half (205 windows): A = oracle gap only; B = oracle gap corrected with the real-time Bybit lead (basis handled by differencing, not levels).

| Predictor | OOS log-loss | OOS AUC |
|---|---|---|
| Order-book mid | 0.491 | 0.837 |
| A: Chainlink only | 0.610 | 0.812 |
| B: Bybit-corrected | 0.610 | 0.812 |

The book predicts the outcome better than we do, and the 2-second lead adds zero incremental predictive power (B ≈ A to four decimals). A small positive taker PnL appeared in the simulation but contradicted the robust metric and was internally inconsistent (negative in the tightest buckets; absurd "maker" figures), so it was classified as an execution-model artifact and not reported as a result (§4, protocol item 4).
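
To make the two predictors concrete, here is one way a gap-based predictor could map the oracle gap to a probability. The driftless Gaussian model, the volatility parameter, and the additive form of the lead correction are all assumptions for illustration; the README states only that predictor A uses the oracle gap, that B corrects it with the Bybit lead by differencing, and that a residual-volatility scaling is used (§8).

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_up(gap: float, secs_left: float, sigma: float) -> float:
    """P(end >= strike) given gap = oracle - strike, assuming driftless Gaussian
    price changes with standard deviation sigma * sqrt(seconds)."""
    if secs_left <= 0:
        return 1.0 if gap >= 0 else 0.0  # ties resolve Up
    return norm_cdf(gap / (sigma * math.sqrt(secs_left)))


def lead_corrected_gap(gap: float, exchange_now: float, exchange_lag_ago: float) -> float:
    """Predictor-B-style correction (assumed form): add the exchange's price change
    over the lead interval. Differencing removes the exchange/oracle basis."""
    return gap + (exchange_now - exchange_lag_ago)


if __name__ == "__main__":
    print(p_up(gap=0.0, secs_left=300, sigma=5.0))
    print(p_up(gap=50.0, secs_left=10, sigma=5.0))
    corrected = lead_corrected_gap(3.0, exchange_now=100010.0, exchange_lag_ago=100006.0)
    print(p_up(corrected, secs_left=10, sigma=5.0))
```

Under this model the probability collapses toward 0 or 1 as the time remaining shrinks, matching the step-function behaviour of the mid described in §2.2.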


6. Self-caught measurement errors (the methodological core)

Each of these, uncorrected, would have produced a false conclusion. Finding and reporting them is the substantive contribution.

  1. Dirty-data masking. Recorder network gaps left partial 5-minute bins whose flow features were computed over a fraction of the interval. They were not NaN, so they silently entered the model and reversed the trade-flow verdict (from "no signal" to "weak signal" once cleaned at source). Lesson: clean at construction, not downstream.
  2. Wrong instrument / broken plumbing. The simulation PnL modelled a linear return, not the binary payoff; the order-book discovery used a keyword search that returned unrelated markets. Both invalidate naive backtests until fixed.
  3. Wrong resolution source. All early signal work used Bybit as the label; the markets actually resolve on the Chainlink BTC/USD stream. The label was wrong until corrected (raised by domain questioning, then verified against the market's authoritative resolution rule).
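
The lesson of Error 1, cleaning at construction, can be sketched as a coverage check applied when each bin is built. The gap threshold, the minimum coverage, and the function names below are hypothetical choices for illustration, not the values used in build_trade_features.py.

```python
import math

BIN_SECONDS = 300  # 5-minute bins


def bin_coverage(trade_times, bin_start, max_gap=10.0):
    """Fraction of the bin not lost to silent stretches longer than max_gap seconds.

    The bin edges are treated as observations, so a recorder outage at either
    end of the bin counts as missing time.
    """
    bin_end = bin_start + BIN_SECONDS
    inside = sorted(t for t in trade_times if bin_start <= t < bin_end)
    points = [bin_start] + inside + [bin_end]
    missing = sum(b - a for a, b in zip(points, points[1:]) if b - a > max_gap)
    return 1.0 - missing / BIN_SECONDS


def clean_feature(value, coverage, min_coverage=0.95):
    """Return NaN for features computed on an insufficiently covered bin."""
    return value if coverage >= min_coverage else math.nan


if __name__ == "__main__":
    full = [float(s) for s in range(300)]     # one trade per second
    partial = [float(s) for s in range(150)]  # recorder dropped out halfway
    print(bin_coverage(full, 0.0), bin_coverage(partial, 0.0))
    print(clean_feature(0.3, bin_coverage(partial, 0.0)))
```

Marking the bin as NaN at construction forces every downstream consumer to handle the gap explicitly instead of silently training on a feature computed over a fraction of the interval.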

7. Discussion

There are two independent walls (mechanically argued in §2, empirically confirmed here):

  • Predictability wall. Short-horizon BTC direction is not predictable from public data better than the market itself. Out-of-sample the order book (AUC 0.837) dominates every predictor we built (AUC 0.812), including one exploiting a measured exchange→oracle latency lead.
  • Cost/structure wall. The taker-only 0.07·p·(1−p) fee plus a dynamic 1–19¢ spread create a break-even barrier (~2–4 pp) far above the available signal. The only structurally favoured seat is the maker (fee-exempt, spread-capturing), i.e. professional market making with adverse-selection risk, a different problem from prediction, not claimed solved here.

The market is efficient because faster, better-informed participants are already extracting and thereby eliminating the edge. Efficiency is the evidence that the winning seat exists and is occupied.


8. Limitations

  • Resolution-oracle dataset is short (~34 h / 407 windows); trade-flow ~11 d.
  • Single venue; down-token book approximated as 1 − up.
  • The maker seat is not rigorously tested (only a deliberately optimistic, artifact-level approximation, explicitly discarded).
  • Residual-volatility scaling for the conditional probability is an empirical approximation; the robust AUC/log-loss core does not depend on it.
  • No claim of profitability is made; this is a negative result by design.

9. Conclusion

With public data, as a taker, the Polymarket 5-minute BTC market is not exploitable: the order book is more informed than our best predictor and the fee/spread structure exceeds the residual edge. The result holds for the automated approach and, a fortiori, for the common discretionary user, who is strictly worse positioned (§2.3). The investigation is reported as a negative result with full reasoning. The transferable output is the evaluation framework (out-of-sample, cost-aware, anti-overfitting, and honest about self-inflicted measurement error) together with a reusable real-time market/oracle data infrastructure.


10. Reproducibility

data/                 collectors + raw/aggregated parquet (trades, resolution, history)
features/             technical.py, fractal.py  (feature construction)
simulation/           cost_model.py  (correct binary payoff + real fee + breakeven)
polymarket/           client/finder/executor/resolver  (venue interface)
scripts/
  record_trades.py            Bybit trade-flow recorder (daemon)
  record_resolution.py        Chainlink oracle + Polymarket book recorder (daemon)
  polymarket_cost_probe.py    spread/executability probe (daemon)
  build_trade_features.py     raw trades -> cleaned 5m features
  test_trade_signal.py        trade-flow signal test (OOS, multi-seed)
  analyze_resolution.py       oracle-vs-book efficiency / edge analysis
  analyze_leadlag.py          Bybit->Chainlink lead-lag
  analyze_money_test.py       decisive OOS money-test (predictor A vs B vs book)

Run order: ingest.py -> build_trade_features.py -> test_trade_signal.py; data daemons feed analyze_*.py. Self-check the cost model with python -m simulation.cost_model (verifies the $1.75/100-share fee peak).


References

  1. E. F. Fama. Efficient Capital Markets: A Review of Theory and Empirical Work. Journal of Finance, 1970.
  2. D. Easley, M. López de Prado, M. O'Hara. Flow Toxicity and Liquidity in a High-Frequency World (VPIN). Review of Financial Studies, 2012.
  3. L. Harris. Trading and Exchanges: Market Microstructure for Practitioners. Oxford University Press, 2003.
  4. Polymarket. Trading Fees. https://docs.polymarket.com/trading/fees
  5. Chainlink. Data Streams. https://docs.chain.link/data-streams

Author's note: this study deliberately reports a negative result. In quantitative research, a rigorously established and honestly documented "this does not work, and here is precisely why" is a stronger signal of methodology than an unverified claim that something does.
