Statistical foundations · reference

METHODOLOGY

A concise reference for the math behind every indicator on hypo.markets — what each number means, how it's computed, and the canonical papers behind the estimators. Companion to the live per-asset dashboards and the interactive simulator.

Kelly criterion

f★ = p − (1 − p) / b

The fraction of bankroll that maximises expected log-growth on a binary bet at net odds b (profit per unit staked, i.e. decimal odds minus 1) with win probability p. Full Kelly extracts the most growth but with brutal variance; most pros run ¼ or ½ Kelly to survive model error. Negative f★ ⇒ the edge is on the other side and you should not take the position.

For a continuous-return asset (perp), the analogue is f★ = μ / σ² (Merton continuous-time). hypo.markets shows both the empirical argmax of g(f) and the parametric μ/σ² as side-by-side markers on the perp Kelly card.

Worked example
Given: market price p = 0.40, model probability q = 0.55 (q plays the role of the win probability in the formula above), so net odds b = (1−p)/p = 1.5
Step: f★ = q − (1−q)/b = 0.55 − 0.45/1.5 = 0.55 − 0.30
Result: f★ = 0.250 ⇒ deploy 25% of bankroll at full Kelly · ¼-Kelly = 6.25%

Ref: J. Kelly Jr., 1956 · Bell System Technical Journal
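The worked example can be reproduced with a minimal Python sketch. The function names and the choice to floor a negative f★ at zero are illustrative assumptions, not the site's implementation.

```python
def kelly_fraction(q: float, price: float) -> float:
    """Full-Kelly fraction for buying YES at `price` when the model says `q`."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must lie strictly between 0 and 1")
    b = (1.0 - price) / price      # net odds: profit per unit staked
    return q - (1.0 - q) / b


def sized_fraction(q: float, price: float, multiplier: float = 0.25) -> float:
    """Fractional Kelly; a negative f* is floored at zero (assumption: no position)."""
    return max(0.0, multiplier * kelly_fraction(q, price))


full = kelly_fraction(0.55, 0.40)
quarter = sized_fraction(0.55, 0.40, multiplier=0.25)
print(f"full Kelly = {full:.3f}, quarter Kelly = {quarter:.4f}")
```

With a model probability below the price (say q = 0.30 at a 0.40 market), `sized_fraction` returns 0.0, matching the rule that a negative f★ means no position.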

▶ See this live in the simulator: Strong edge · 40¢ market vs 55¢ model · ½-Kelly

Brier score + Murphy decomposition

BRIER = REL − RES + UNC

The Brier score is the mean squared error of probabilistic forecasts: BS = mean((fᵢ − oᵢ)²). Lower is better; zero is perfect; the climatology baseline is ō·(1 − ō).

Murphy's decomposition splits BRIER into three meaningful parts: REL (reliability — how miscalibrated the model is; lower is better), RES (resolution — how decisively the model separates outcomes; higher is better), and UNC (uncertainty — the irreducible base-rate term).

The reliability diagram bins forecasts and plots the observed frequency against the stated probability. Points on the diagonal indicate perfect calibration. The sharpness histogram below the diagram shows how often each forecast level is used.

Worked example
Given: 200 forecasts in two equal bins: 100 at 0.7 with 65 wins (ō = 0.65) and 100 at 0.3 with 35 wins (ō = 0.35); base rate ō_all = 0.50
Step: REL = ½·(0.7 − 0.65)² + ½·(0.3 − 0.35)² = 0.0025 · RES = ½·(0.65 − 0.50)² + ½·(0.35 − 0.50)² = 0.0225 · UNC = 0.50·0.50 = 0.25
Result: BRIER = REL − RES + UNC = 0.0025 − 0.0225 + 0.25 = 0.230 · BSS = 1 − 0.230/0.25 = 8.0%

Ref: A. H. Murphy, 1973 · J. Applied Meteorology
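A minimal sketch of the decomposition, assuming forecasts are binned by their exact stated value (in which case BRIER = REL − RES + UNC holds exactly). The two-level ledger below is hypothetical data chosen so that the base rate is 0.50; each bin is weighted by its share of all forecasts.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (brier, rel, res, unc), binning by exact forecast value."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    # Each bin is weighted by its share of all forecasts (len(obs) / n).
    rel = sum(len(obs) * (f - sum(obs) / len(obs)) ** 2 for f, obs in bins.items()) / n
    res = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in bins.values()) / n
    unc = base_rate * (1.0 - base_rate)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return brier, rel, res, unc


# Hypothetical ledger: 100 forecasts at 0.7 (65 wins) and 100 at 0.3 (35 wins).
forecasts = [0.7] * 100 + [0.3] * 100
outcomes = [1] * 65 + [0] * 35 + [1] * 35 + [0] * 65
brier, rel, res, unc = murphy_decomposition(forecasts, outcomes)
bss = 1.0 - brier / unc            # skill relative to the climatology baseline
print(f"BRIER={brier:.4f} REL={rel:.4f} RES={res:.4f} UNC={unc:.4f} BSS={bss:.1%}")
```

With coarser bins (for example deciles) the identity holds only approximately, because forecasts inside a bin no longer share one value.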

▶ See this live in the simulator: Synthetic forecast ledger viewer (reseed to redraw)

ROC curve + AUC

AUC = ∫₀¹ TPR d(FPR)

ROC plots true positive rate against false positive rate as the decision threshold sweeps from 1 to 0. AUC is the area underneath: 0.5 = coin-flip, 1.0 = perfect ranker. It measures ranking power — independent of calibration. A model with AUC = 0.9 but bad calibration can be rescued by Platt scaling; a model with AUC = 0.55 cannot.
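The ranking interpretation can be sketched without tracing the curve: AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative (the Mann-Whitney form). The scores and labels below are made-up illustration data, and the O(n²) pairwise loop is chosen for clarity, not speed.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
raw = auc(scores, labels)
squashed = auc([s ** 3 for s in scores], labels)   # monotone distortion of the scores
print(raw, squashed)
```

Cubing the scores wrecks their calibration but leaves the ordering, and therefore the AUC, unchanged. That is the sense in which AUC is independent of calibration.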

▶ See this live in the simulator: ROC + reliability + Murphy bars on a worked example

Bayesian Beta posterior

θ ~ Beta(α, β), κ = α + β

Conjugate prior for a Bernoulli likelihood. We model your stated probability as the mean of a Beta posterior with concentration κ = q(1−q)/σ² − 1 driven by your stated 1σ uncertainty.

The 95% credible interval is the highest-density region containing 95% of the posterior mass — computed via cumulative trapezoidal integration over a 1000-point grid. If the market price falls outside this band, the disagreement is statistically meaningful.

Worked example
Given: model q = 0.60 with σ = 0.04
Step: κ = q(1−q)/σ² − 1 = 0.60·0.40 / 0.0016 − 1 = 150 − 1 = 149 · α = qκ = 89.4 · β = (1−q)κ = 59.6
Result: Beta(89.4, 59.6) · 95% CI ≈ [0.519, 0.677]. Market price 0.42 falls outside ⇒ meaningful disagreement.
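A rough sketch of the moment matching and the grid integration, using only the standard library. One labelled simplification: it returns the equal-tailed interval rather than the highest-density region described above; for a nearly symmetric posterior like this one the two are close. Function names are illustrative.

```python
import math


def beta_params(q, sigma):
    """Moment-match Beta(alpha, beta) to mean q and standard deviation sigma."""
    kappa = q * (1.0 - q) / sigma ** 2 - 1.0
    if kappa <= 0.0:
        raise ValueError("sigma is too wide for a Beta with this mean")
    return q * kappa, (1.0 - q) * kappa


def credible_interval(a, b, mass=0.95, grid=1000):
    """Equal-tailed interval from a trapezoidal CDF; assumes a > 1 and b > 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    xs = [i / grid for i in range(grid + 1)]
    pdf = [0.0] * (grid + 1)                      # density vanishes at 0 and 1
    for i in range(1, grid):
        x = xs[i]
        pdf[i] = math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))
    cdf = [0.0]
    for i in range(1, grid + 1):                  # cumulative trapezoid rule
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) / grid)
    total = cdf[-1]
    tail = (1.0 - mass) / 2.0
    lo = next(x for x, c in zip(xs, cdf) if c / total >= tail)
    hi = next(x for x, c in zip(xs, cdf) if c / total >= 1.0 - tail)
    return lo, hi


alpha, beta = beta_params(0.60, 0.04)
lo, hi = credible_interval(alpha, beta)
market_price = 0.42
print(f"Beta({alpha:.1f}, {beta:.1f}), 95% interval about [{lo:.3f}, {hi:.3f}]")
print("meaningful disagreement:", not lo <= market_price <= hi)
```

The interval endpoints snap to the 0.001 grid, so they can differ from the quoted values in the third decimal.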

▶ See this live in the simulator: Tight posterior (σ=0.04) where market price falls outside the 95% CI

Binary entropy + KL divergence

H(p) = −p·log₂ p − (1−p)·log₂(1−p)

Shannon's binary entropy peaks at 1 bit when p = 0.5 (maximal doubt) and falls to 0 at the boundaries (certainty).

KL divergence D_KL(q ‖ p) = q·ln(q/p) + (1−q)·ln((1−q)/(1−p)) measures the information your belief q adds beyond the market price p. Ignoring fees, it equals the expected log-growth per bet of a full-Kelly bettor whose belief q is correct, which makes it the theoretical upper bound on exploitable edge. Zero KL ⇒ you know nothing the crowd doesn't. The branch decomposition splits KL into the YES contribution and the NO contribution.

Worked example
Given: model q = 0.70, market p = 0.50
Step: D_KL = 0.7·ln(0.7/0.5) + 0.3·ln(0.3/0.5) = 0.7·0.3365 + 0.3·(−0.5108)
Result: D_KL = 0.2355 − 0.1532 = 0.0823 nat (0.119 bit): meaningful divergence ⇒ signal, not noise
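The entropy formula and the branch decomposition can be checked with a few lines of Python. This is a sketch with illustrative function names; it assumes 0 < p < 1 for the market price.

```python
import math


def binary_entropy_bits(p):
    """Shannon entropy of a Bernoulli(p) outcome, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def kl_branches(q, p):
    """(YES, NO) contributions to D_KL(q || p), in nats; needs 0 < p < 1."""
    yes = q * math.log(q / p) if q > 0.0 else 0.0
    no = (1.0 - q) * math.log((1.0 - q) / (1.0 - p)) if q < 1.0 else 0.0
    return yes, no


yes, no = kl_branches(0.70, 0.50)
kl_nats = yes + no
kl_bits = kl_nats / math.log(2.0)
print(f"YES = {yes:.4f}, NO = {no:.4f}, D_KL = {kl_nats:.4f} nat = {kl_bits:.4f} bit")
```

Against a 50¢ market the divergence in bits equals 1 − H(q), which gives a quick cross-check of both functions.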

▶ See this live in the simulator: Belief 70¢ vs market 50¢ · substantial KL (signal)

GARCH(1,1) conditional volatility

σ²ₜ = ω + α·r²ₜ₋₁ + β·σ²ₜ₋₁

Volatility-clustering model. The conditional variance at time t is a constant ω plus a weighted sum of yesterday's squared shock and yesterday's variance. Persistence α + β near 1 ⇒ shocks decay slowly; calm and turbulent regimes cluster. With α + β < 1 the variance reverts to the unconditional level ω / (1 − α − β). Position sizing must track the conditional σ, not the unconditional average.

Ref: T. Bollerslev, 1986 · J. Econometrics
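The recursion itself is short. The sketch below filters a return series through fixed parameters; ω, α and β here are hypothetical values picked for illustration, not fitted estimates, and starting the recursion at the unconditional variance is one common convention among several.

```python
def garch_variance_path(returns, omega, alpha, beta):
    """Conditional variances for each bar plus a one-step-ahead forecast."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite unconditional variance")
    var = omega / (1.0 - alpha - beta)    # start at the unconditional variance
    path = []
    for r in returns:
        path.append(var)                  # variance that applied to this bar
        var = omega + alpha * r * r + beta * var
    path.append(var)                      # forecast for the next, unseen bar
    return path


# Hypothetical parameters: persistence alpha + beta = 0.95.
returns = [0.0, 0.0, 0.03, 0.0, 0.0]
path = garch_variance_path(returns, omega=1e-6, alpha=0.10, beta=0.85)
for t, v in enumerate(path):
    print(t, f"sigma = {v ** 0.5:.4%}")
```

The single 3% shock at bar 2 lifts the variance at bar 3, and with persistence 0.95 it is still well above the unconditional level two bars later.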

ADF + KPSS unit-root tests

The Augmented Dickey-Fuller (ADF) test has H₀: the series has a unit root (non-stationary, behaves like a random walk). Reject at the 5% level if the t-statistic is below ≈ −2.86 (the asymptotic critical value for the constant, no-trend specification).

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test has the opposite null: H₀: the series is level-stationary. Reject at 5% if the statistic exceeds 0.463.

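The cleanest verdict comes from running both: rejecting one and failing to reject the other gives an unambiguous answer (ADF rejects, KPSS does not ⇒ stationary; KPSS rejects, ADF does not ⇒ unit root). Rejecting both ⇒ the series sits in a grey zone; rejecting neither means the data are too uninformative to decide.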
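The combined reading can be written as a small decision function. This sketch takes the two test statistics as inputs (they could come from any implementation, for example the `adfuller` and `kpss` functions in statsmodels) and applies the 5% critical values quoted above; the verdict labels are illustrative.

```python
ADF_CRIT_5PCT = -2.86     # reject the unit-root null below this value
KPSS_CRIT_5PCT = 0.463    # reject the level-stationarity null above this value


def stationarity_verdict(adf_stat, kpss_stat):
    """Combine the two tests into one of four verdicts."""
    adf_rejects = adf_stat < ADF_CRIT_5PCT      # evidence against a unit root
    kpss_rejects = kpss_stat > KPSS_CRIT_5PCT   # evidence against stationarity
    if adf_rejects and not kpss_rejects:
        return "stationary"
    if kpss_rejects and not adf_rejects:
        return "unit root"
    if adf_rejects and kpss_rejects:
        return "grey zone (both reject)"
    return "inconclusive (neither rejects)"


print(stationarity_verdict(-4.10, 0.12))
print(stationarity_verdict(-1.20, 0.91))
```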

Variance ratio test

VR(q) = Var(r_q) / (q · Var(r₁)), where r_q is the q-period return

Under a random walk, VR(q) ≈ 1 for all q. VR > 1 means positive serial correlation (trending — long horizons are riskier than the IID assumption suggests); VR < 1 means mean reversion (long horizons are tamer). We use Lo-MacKinlay's asymptotic z-statistic for the significance test.

Ref: A. Lo + A. MacKinlay, 1988 · Review of Financial Studies
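A bare-bones sketch of the ratio using overlapping q-period sums. It omits the small-sample bias corrections and the z-statistic of the Lo-MacKinlay test, so it illustrates the point estimate only; the two input series are synthetic.

```python
def variance_ratio(returns, q):
    """VR(q) from overlapping q-period sums, without bias correction."""
    n = len(returns)
    mu = sum(returns) / n
    var_1 = sum((r - mu) ** 2 for r in returns) / n
    sums = [sum(returns[i:i + q]) for i in range(n - q + 1)]
    var_q = sum((s - q * mu) ** 2 for s in sums) / len(sums)
    return var_q / (q * var_1)


# Synthetic series: strict alternation (mean-reverting) vs runs of four (trending).
alternating = [0.01, -0.01] * 50
runs = ([0.01] * 4 + [-0.01] * 4) * 12
vr_alternating = variance_ratio(alternating, 2)
vr_runs = variance_ratio(runs, 2)
print(f"alternating VR(2) = {vr_alternating:.3f}, runs VR(2) = {vr_runs:.3f}")
```

For q = 2 the ratio is approximately 1 plus the lag-1 autocorrelation, which is why the alternating series collapses toward 0 and the series with runs lands near 1.5.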

Hurst exponent (R/S analysis)

log(R/S) = H · log(n) + c

Rescaled-range analysis estimates long-memory: H > 0.5 ⇒ persistent / trending, H < 0.5 ⇒ anti-persistent / mean-reverting, H ≈ 0.5 ⇒ memoryless random walk. Estimated via OLS on the log-log plot of R/S vs window size n.
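A compact sketch of the estimator under simple assumptions: non-overlapping windows, the mean R/S per window size, and a plain OLS slope with no small-sample correction. The window sizes and the seeded Gaussian test series are arbitrary illustration choices.

```python
import math
import random


def rescaled_range(chunk):
    """Range of cumulative mean-adjusted sums divided by the standard deviation."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in chunk:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return (hi - lo) / std if std > 0.0 else None


def hurst_rs(series, window_sizes):
    """OLS slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    for n in window_sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        values = [v for v in map(rescaled_range, chunks) if v is not None]
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
h = hurst_rs(noise, [16, 32, 64, 128, 256, 512])
print(f"H estimate for white noise: {h:.2f}")
```

On short windows the uncorrected R/S statistic tends to read somewhat above 0.5 even for memoryless data, so estimates slightly above one half should not be over-interpreted.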

Engle-Granger cointegration

Two non-stationary series can still have a stationary linear combination — they are cointegrated. Step 1: OLS regression y = α + β·x + ε. Step 2: ADF on the residuals. Reject H₀ (no cointegration) at 5% if the residual ADF statistic is below ≈ −3.34.

Useful for pairs-trading: when the spread (the regression residual ε) drifts away from its mean, you can short the rich leg and buy the cheap one, expecting reversion.

Ref: R. Engle + C. Granger, 1987 · Econometrica
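The two steps can be sketched in pure Python on synthetic data. One labelled simplification: step 2 uses a plain Dickey-Fuller regression with no lagged differences instead of the augmented version, which is adequate here only because the simulated residual is white noise. The series, seed and coefficients are hypothetical.

```python
import math
import random


def ols(x, y):
    """Intercept and slope of y = alpha + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    return my - beta * mx, beta


def df_tstat(e):
    """t-statistic of rho in  delta_e[t] = rho * e[t-1] + u[t]  (no lags, no constant)."""
    lagged = e[:-1]
    delta = [cur - prev for prev, cur in zip(e[:-1], e[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, delta)) / sxx
    resid = [d - rho * l for l, d in zip(lagged, delta)]
    s2 = sum(u * u for u in resid) / (len(delta) - 1)
    return rho / math.sqrt(s2 / sxx)


# Synthetic pair: x is a random walk, y tracks it with stationary noise.
rng = random.Random(42)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
y = [2.0 + 0.5 * v + rng.gauss(0.0, 0.5) for v in x]

alpha_hat, beta_hat = ols(x, y)                      # step 1
spread = [c - alpha_hat - beta_hat * a for a, c in zip(x, y)]
stat = df_tstat(spread)                              # step 2
cointegrated = stat < -3.34                          # 5% critical value from the text
print(f"beta = {beta_hat:.3f}, residual DF stat = {stat:.1f}, cointegrated = {cointegrated}")
```

The `spread` series is also the quantity a pairs trade would monitor for excursions away from zero.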

VaR + Expected Shortfall (CVaR)

VaR_α is the α-quantile of the loss distribution: on the worst (1−α) share of bars (5% for α = 0.95) you lose at least VaR_α. CVaR_α (also called Expected Shortfall, ES) is the average loss conditional on being in that tail. CVaR is always ≥ VaR and is the number that actually ruins accounts. Under normality, ES₉₅ / VaR₉₅ ≈ 1.25; ratios > 1.5 signal fat tails.

Worked example
Given: per-bet returns r ~ N(0, σ²) with σ = 0.05, tail taken at the 5th percentile (theoretical normal values; a 100-bar sample will scatter around them)
Step: VaR₉₅ = −Q(r, 0.05) = −(−0.0823) = 0.0823 (positive convention) · ES₉₅ = −E[r | r ≤ −0.0823]
Result: VaR₉₅ ≈ 8.23%, ES₉₅ ≈ 10.3% · ES/VaR ≈ 1.25 ⇒ consistent with normality (no fat tail)
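Both the closed-form normal values and an empirical estimate can be sketched with the standard library. The historical estimator below uses a simple "worst k bars" rule, which is one convention among several, and the evenly spaced return series is placeholder data.

```python
from statistics import NormalDist


def normal_var_es(sigma, mu=0.0, level=0.95):
    """Closed-form VaR and ES for normal returns, reported as positive losses."""
    tail = 1.0 - level
    z = NormalDist().inv_cdf(tail)                  # negative quantile, about -1.645
    var = -(mu + sigma * z)
    es = -mu + sigma * NormalDist().pdf(z) / tail
    return var, es


def historical_var_es(returns, level=0.95):
    """Empirical VaR and ES from the worst (1 - level) share of bars."""
    ordered = sorted(returns)
    k = max(1, round(len(ordered) * (1.0 - level)))  # number of bars in the tail
    tail = ordered[:k]
    return -tail[-1], -sum(tail) / k


var95, es95 = normal_var_es(0.05)
ratio = es95 / var95
print(f"VaR95 = {var95:.4f}, ES95 = {es95:.4f}, ES/VaR = {ratio:.3f}")

# Placeholder data: 100 evenly spaced returns from -5.0% to +4.9%.
h_var, h_es = historical_var_es([i / 1000 for i in range(-50, 50)])
print(f"historical VaR95 = {h_var:.3f}, ES95 = {h_es:.3f}")
```

Comparing the empirical ES/VaR ratio with the normal benchmark of roughly 1.25 is the fat-tail check described above.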

▶ See this live in the simulator: Wide model σ + half-Kelly · VaR/CVaR widen in MC

Monte-Carlo equity simulation

For binary positions we sample N parallel careers of K bets each, drawing the true probability from N(q, σ²) each bet, then resolving against the market price at deployed-Kelly leverage. For continuous returns (perps) we bootstrap with replacement from the observed return distribution — this preserves the fat tails the parametric Gaussian misses.

Outputs include the percentile fan (5/25/50/75/95), VaR95, CVaR95, maximum drawdown of the median path, terminal-wealth distribution, and the ruin rate (fraction of paths that ever crossed the 50% bankroll floor).
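A stripped-down sketch of the binary branch, assuming a fixed stake fraction per bet, a per-bet probability clipped to [0, 1], and only two outputs (the terminal-wealth percentiles and the ruin rate). Path counts, the seed and the function name are illustrative and not the simulator's actual settings.

```python
import random


def simulate_binary(q, sigma, price, kelly_mult,
                    n_paths=1000, n_bets=200, floor=0.5, seed=0):
    """Return (terminal-wealth percentiles, ruin rate) for repeated YES bets."""
    rng = random.Random(seed)
    b = (1.0 - price) / price                       # net odds at the market price
    f = max(0.0, kelly_mult * (q - (1.0 - q) / b))  # deployed Kelly fraction
    terminal, ruined = [], 0
    for _ in range(n_paths):
        wealth, hit_floor = 1.0, False
        for _ in range(n_bets):
            # True probability redrawn each bet; clipping to [0, 1] is an assumption.
            p_true = min(1.0, max(0.0, rng.gauss(q, sigma)))
            wealth *= (1.0 + f * b) if rng.random() < p_true else (1.0 - f)
            hit_floor = hit_floor or wealth < floor
        terminal.append(wealth)
        ruined += hit_floor
    terminal.sort()
    fan = {k: terminal[int(k / 100 * (n_paths - 1))] for k in (5, 25, 50, 75, 95)}
    return fan, ruined / n_paths


full_fan, full_ruin = simulate_binary(0.55, 0.05, 0.40, kelly_mult=1.0)
quarter_fan, quarter_ruin = simulate_binary(0.55, 0.05, 0.40, kelly_mult=0.25)
print(f"full Kelly:    median = {full_fan[50]:.2f}, ruin rate = {full_ruin:.1%}")
print(f"quarter Kelly: median = {quarter_fan[50]:.2f}, ruin rate = {quarter_ruin:.1%}")
```

Running both multipliers with the same seed exposes each sizing rule to identical win and loss sequences, so the gap in ruin rates reflects leverage alone.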

▶ See this live in the simulator: Full Kelly with realistic σ · observe ruin frequency