Validraft

Methodology

Every control we run, in the open

Validraft maps to an institutional validation framework of 159 controls spanning the full research lifecycle — from economic rationale to governance. On every engagement we run the subset that fits your hypothesis, with no black-box score. This is what “institutional rigor” actually means, control by control.

Not a checklist. A scoped validation profile.

A single un-optimized rule and a parameter-swept factor model need different scrutiny. So each control is required, conditional, diagnostic, or not applicable depending on the strategy family — multiple-testing corrections matter for a search, not for one fixed rule.

We pick the validation profile that matches your idea, then the report states exactly which controls ran, which passed, and which were not applicable — and why. Every result is descriptive, never prescriptive: research and simulation only, not investment advice.

Phase 0

Economic rationale & pre-registration

An edge needs a reason before it needs a backtest.

5 controls

  • Formalized economic hypothesis

    The idea is written down as a testable hypothesis — what the edge is, why it should exist, and what would falsify it — before any compute runs.

  • Edge classification

    The proposed edge is tagged by type (risk premium, behavioral, structural, information) so the validation profile matches the claim.

  • Ex-ante expected value

    A first-principles estimate of the edge net of expected costs, so a result can be compared against what was predicted upfront.

  • Publication bias & factor decay review

    For published or well-known anomalies, we assess decay since publication and crowding before trusting historical strength.

  • Causal robustness checklist

    A structured five-question check that the relationship is plausibly causal, not a coincidence mined from the data.
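To make the Phase 0 controls concrete, here is a minimal Python sketch of a pre-registered hypothesis record with its ex-ante expected value. The `Hypothesis` class, its field names, and the linear cost model (gross edge per trade minus round-trip cost, times trades per year) are illustrative assumptions, not Validraft's internal schema. The example hypothesis and its numbers are invented inputs.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    # The four edge families named in the edge-classification control
    RISK_PREMIUM = "risk premium"
    BEHAVIORAL = "behavioral"
    STRUCTURAL = "structural"
    INFORMATION = "information"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after the record is written
class Hypothesis:
    statement: str
    edge_type: EdgeType
    falsified_if: str
    gross_edge_bps_per_trade: float
    round_trip_cost_bps: float
    trades_per_year: int

    def ex_ante_net_edge_bps(self) -> float:
        """Expected annual edge after costs, in basis points, stated before any backtest."""
        return (self.gross_edge_bps_per_trade - self.round_trip_cost_bps) * self.trades_per_year


# Illustrative inputs only
h = Hypothesis(
    statement="Small caps drift in the direction of an earnings surprise for 20 trading days",
    edge_type=EdgeType.BEHAVIORAL,
    falsified_if="Net out-of-sample drift is zero or negative",
    gross_edge_bps_per_trade=12.0,
    round_trip_cost_bps=5.0,
    trades_per_year=50,
)
print(h.ex_ante_net_edge_bps())  # 350.0
```

Freezing the record means the stated edge cannot be quietly revised once results come in; the realized net edge is then compared against the figure written down upfront.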

Phase 1

Data integrity & point-in-time

Most fake edges are really data bugs.

7 controls

  • Survivorship-bias correction

    Equity universes are reconstructed point-in-time, including delisted names, so the test is not run only on today's survivors.

  • Point-in-time validation

    Every input carries an as-of timestamp; data is joined as it was known at the time, with an explicit event-leakage gate.

  • Corporate-actions adjustment

    Splits, dividends, and IPOs are materialized and applied so prices and returns are continuous and correct across events.

  • Data-quality audit

    Inputs are screened for NaNs, duplicates, gaps, and outliers before anything runs — a hard gate, not a best-effort check.

  • Timezone & holiday alignment

    Per-market calendars align trading sessions and cross-asset series, so signals never read across mismatched clocks.

  • Pipeline isolation

    Ingestion, backtest, and validation are separate stages with a snapshot manifest — no validation step can leak back into the data.

  • Investable-universe construction

    The tradable universe is built point-in-time from a universe master and symbol resolver, not hand-picked after the fact.
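The point-in-time and event-leakage controls come down to an as-of join plus a hard check. A minimal sketch, assuming each input arrives as `(as_of_date, value)` pairs and that a value is usable on its own as-of date (a stricter pipeline might lag it by one bar). The names `as_of_join` and `leakage_gate` are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def as_of_join(decision_dates, releases):
    """For each decision date, return the latest (as_of_date, value) known on or before it.

    `releases` must be sorted by as_of_date. Dates before the first release
    get None rather than a back-filled value.
    """
    as_of_dates = [d for d, _ in releases]
    if as_of_dates != sorted(as_of_dates):
        raise ValueError("releases must be sorted by as-of date")
    joined = []
    for t in decision_dates:
        i = bisect_right(as_of_dates, t)  # number of releases known by t
        joined.append(releases[i - 1] if i > 0 else None)
    return joined


def leakage_gate(decision_dates, joined):
    """Hard gate: fail if any joined record was published after its decision date."""
    for t, rec in zip(decision_dates, joined):
        if rec is not None and rec[0] > t:
            raise ValueError(f"event leakage: value as of {rec[0]} used on {t}")


releases = [(date(2020, 1, 10), 1.0), (date(2020, 2, 10), 2.0)]
decisions = [date(2020, 1, 5), date(2020, 1, 31), date(2020, 2, 10), date(2020, 3, 1)]
joined = as_of_join(decisions, releases)
leakage_gate(decisions, joined)  # passes: nothing from the future was used
print([rec[1] if rec else None for rec in joined])  # [None, 1.0, 2.0, 2.0]
```

Returning the full record rather than just the value keeps the as-of date attached, which is what lets the gate fail hard instead of trusting the join.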

Phase 2 & 3

Signal & feature integrity

Where look-ahead and overfitting hide.

7 controls

  • Look-ahead bias detection

    Features are audited for any use of information not available at decision time, including macro/treasury series joined on their as-of date.

  • Feature leakage detection

    Inputs that smell like the answer (forward, target, label, outcome, pnl) are flagged before they can inflate a result.

  • Overfitting diagnosis

    Walk-forward, Deflated Sharpe, PBO, haircut Sharpe, and parameter sensitivity combine to expose curve-fitting rather than edge.

  • Stationarity tests (ADF / KPSS)

    Signal features are tested for stationarity, with cointegration checks (Engle–Granger + half-life) where a spread thesis applies.

  • Information Coefficient (IC / ICIR)

    For continuous signals, predictive power is measured with Pearson/Spearman IC, rolling ICIR, and decay — robust p-values included.

  • Feature stability & redundancy

    Rolling IC stability plus opt-in PCA diagnostics (explained variance, condition number, loading stability) catch unstable or redundant predictors.

  • Parameter sensitivity analysis

    Parameters are perturbed one at a time and re-tested, so a result that only works on one knife-edge setting is caught.

Phase 3.5

Execution & cost realism

A backtest with no costs is a fantasy.

6 controls

  • Bid-ask spread simulation

    A configurable spread cost is subtracted from PnL; tick-level engines use declared slippage so fills are not assumed at mid.

  • Commission & financing costs

    Per-share, per-contract, and basis-point fees are wired in; carry/financing is applied where overnight holding, leverage, or short exposure requires it.

  • Market-impact modeling

    A square-root / coefficient impact proxy is applied where relevant, with an execution-realism gate that flags when a proxy isn't strong enough for the claim.

  • Turnover analysis

    Annualized turnover is reported so cost sensitivity and capacity can be judged against the strategy's trading intensity.

  • Capacity analysis

    A static capacity gate estimates how much size the strategy could absorb given volume and turnover before impact erodes the edge.

  • Short-borrow cost

    Borrow cost is charged on overnight/multi-day shorts; locate and hard-to-borrow constraints are surfaced where they affect tradability.
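The cost controls above can be combined into a per-order cost estimate. A minimal sketch, assuming a one-way cost of half the quoted spread plus commission plus square-root impact (`impact_coef * daily_vol * sqrt(order / ADV)`), an actual/360 borrow convention, and turnover expressed as the fraction of the book traded per day. All names and defaults are hypothetical.

```python
import math


def trade_cost_bps(order_shares, adv_shares, daily_vol_bps,
                   half_spread_bps, commission_bps, impact_coef=1.0):
    """One-way cost of a single order in basis points of traded notional."""
    participation = order_shares / adv_shares
    # Square-root impact proxy: coef * daily volatility * sqrt(order / ADV)
    impact_bps = impact_coef * daily_vol_bps * math.sqrt(participation)
    return half_spread_bps + commission_bps + impact_bps


def capacity_shares(edge_bps, adv_shares, daily_vol_bps,
                    half_spread_bps, commission_bps, impact_coef=1.0):
    """Static capacity: the order size at which total cost consumes the whole edge."""
    room_bps = edge_bps - half_spread_bps - commission_bps
    if room_bps <= 0:
        return 0.0  # fixed costs alone already exceed the edge
    return adv_shares * (room_bps / (impact_coef * daily_vol_bps)) ** 2


def borrow_cost(short_notional, annual_borrow_rate, days_held, day_count=360):
    """Borrow fee on a short held for `days_held` calendar days."""
    return short_notional * annual_borrow_rate * days_held / day_count


def annualized_turnover(daily_turnover, periods_per_year=252):
    """Mean daily fraction of the book traded, scaled to a year."""
    return sum(daily_turnover) / len(daily_turnover) * periods_per_year


cost = trade_cost_bps(10_000, 1_000_000, daily_vol_bps=200, half_spread_bps=2, commission_bps=1)
print(round(cost, 6))  # 23.0
print(round(capacity_shares(23, 1_000_000, 200, 2, 1)))  # 10000
```

Because impact grows with the square root of participation, the capacity estimate simply inverts the same formula: doubling the edge left after fixed costs quadruples the order size the strategy can absorb.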

Phase 4

Statistical validation

The moat: is the edge real, or luck?

12 controls

  • Out-of-sample testing

    A frozen candidate is tested on data never used in selection, with explicit train/test windows.

  • Walk-forward analysis

    Rolling re-optimization and evaluation over time, so performance is judged out-of-sample at every step, not on one lucky split.

  • Combinatorial Purged CV (CPCV)

    Combinatorial folds with purging and embargo give a distribution of out-of-sample outcomes instead of a single path.

  • Deflated Sharpe Ratio (DSR)

    Sharpe is corrected for skew, kurtosis, and the number of trials tested — the headline number you can actually trust.

  • Probability of Backtest Overfitting (PBO)

    CSCV (combinatorially symmetric cross-validation) estimates the probability that the configuration selected as best in-sample ranks below the median out-of-sample.

  • Haircut Sharpe (Harvey–Liu)

    Sharpe is discounted for multiple testing across everything that was tried, so a single survivor isn't mistaken for skill.

  • Permutation / null-edge tests

    Returns are permuted to break the signal–outcome link; a real edge must beat the block-bootstrap and signal-target null distributions.

  • Walk-forward permutation test

    The walk-forward Sharpe is compared against a block-permuted null distribution, with the full null saved as an artifact.

  • Multiple-testing correction

    Bonferroni / Holm / Benjamini–Hochberg, plus Hansen SPA and Romano–Wolf step-down for post-selection inference on families of candidates.

  • Autocorrelation (Ljung–Box)

    Return autocorrelation is tested so a Sharpe inflated by serial correlation is identified and interpreted correctly.

  • Purging & embargo

    Overlapping labels and look-ahead around events are removed with purge and embargo windows, enforced down to the day slice.

  • Minimum backtest length (MinBTL)

    The observed window is checked against the minimum length needed to support the Sharpe claim, with a warning when it falls short.
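The Deflated Sharpe Ratio is the Probabilistic Sharpe Ratio evaluated against the Sharpe you would expect from the best of N unskilled trials, rather than against zero. A sketch of the Bailey and López de Prado formulas, assuming per-period (non-annualized) Sharpe ratios, raw kurtosis (3 for a normal distribution), and that `sr_variance` is the variance of Sharpe ratios across the trials actually run. Function names and inputs are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def probabilistic_sharpe(sr, sr_benchmark, n_obs, skew, kurtosis):
    """P(true Sharpe > sr_benchmark) given a per-period Sharpe estimated on n_obs returns."""
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    z = (sr - sr_benchmark) * math.sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)


def expected_max_sharpe(n_trials, sr_variance):
    """Expected best Sharpe among n_trials (>= 2) unskilled strategies."""
    nd = NormalDist()
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * nd.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * nd.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_obs, skew, kurtosis, n_trials, sr_variance):
    """PSR measured against the best-of-N null instead of against zero."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    return probabilistic_sharpe(sr, sr0, n_obs, skew, kurtosis)


# Illustrative inputs: daily Sharpe 0.1 over 1,000 days, selected from 50 trials
psr_vs_zero = probabilistic_sharpe(0.1, 0.0, 1000, skew=-0.5, kurtosis=5.0)
dsr = deflated_sharpe(0.1, 1000, skew=-0.5, kurtosis=5.0, n_trials=50, sr_variance=0.0025)
print(f"PSR vs 0: {psr_vs_zero:.3f}, DSR: {dsr:.3f}")
```

With these inputs the strategy looks near-certain against zero but its DSR falls below 0.5, because the best of 50 unskilled trials would be expected to show a daily Sharpe slightly above 0.1.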

Phase 4.5

Stress testing & regime analysis

How it behaves when markets break.

7 controls

  • Historical crisis scenarios

    Survival is measured through built-in crisis windows (2008, 2020, 2022, and more), with worst-case drawdown and return per scenario.

  • Regime-conditional analysis

    Performance is split across volatility, trend, liquidity, and extreme-event regimes, with an empirical transition matrix.

  • Regime contract (ex-ante vs ex-post)

    If a regime drives a filter, sizing, or selection, it must be defined ahead of time — circular, in-sample regime definitions are rejected.

  • Hidden Markov & change-point

    Opt-in HMM regime calibration and BOCPD change-point detection surface structural breaks in behavior.

  • Monte Carlo path risk

    Trade-sequence permutation, stationary bootstrap, and regime-switching resampling produce drawdown, terminal-return, and ruin distributions.

  • Perturbation & synthetic scenarios

    Realized returns are perturbed and stressed against synthetic macro scenarios to probe fragility beyond the realized path.

  • Liquidity stress

    A capacity/cost replay under widened spreads and volume caps tests whether the edge survives a thinner market.
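Trade-sequence permutation is the simplest of the Monte Carlo path-risk methods: reshuffle the realized trade returns many times and record the drawdown of each reordered path. A minimal sketch, assuming compounded per-trade returns and treating a 50% drawdown as ruin. The threshold, path count, seed, and sample trades are illustrative.

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def permutation_drawdowns(trade_returns, n_paths=1000, seed=0):
    """Max drawdown of each of n_paths random reorderings of the realized trades."""
    rng = random.Random(seed)  # seeded so the distribution is reproducible
    path = list(trade_returns)
    drawdowns = []
    for _ in range(n_paths):
        rng.shuffle(path)
        drawdowns.append(max_drawdown(path))
    return drawdowns


def ruin_probability(drawdowns, ruin_level=0.5):
    """Share of simulated paths whose drawdown reaches the ruin threshold."""
    return sum(dd >= ruin_level for dd in drawdowns) / len(drawdowns)


trades = [0.04, -0.03, 0.02, -0.05, 0.06, -0.02, 0.03, -0.04]  # invented sample
dds = sorted(permutation_drawdowns(trades, n_paths=2000, seed=7))
print(f"median max DD {dds[len(dds) // 2]:.3f}, 95th pct {dds[int(0.95 * len(dds))]:.3f}")
print(f"P(ruin at 50% DD) = {ruin_probability(dds):.3f}")
```

Reordering compounded returns changes the path, and therefore the drawdown, but not the terminal return. That is why the bootstrap and regime-switching variants are needed for terminal-return and ruin distributions that go beyond the realized trades.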

Phase 5

Performance metrics

Beyond a single headline Sharpe.

5 controls

  • Risk-adjusted returns

    Sharpe, Sortino, and Calmar on net returns — not gross, not pre-cost.

  • Drawdown family

    Max drawdown with duration and time-to-recovery, plus Ulcer, Martin/MAR, Burke, and Pain index for the full pain profile.

  • Tail risk (CVaR / CDaR)

    Conditional value-at-risk and conditional drawdown-at-risk quantify the tail, not just the average.

  • Distribution shape

    Omega, gain-to-pain, profit factor, hit rate, and bias-corrected skew/kurtosis describe the actual return distribution.

  • Equity-curve stability

    Equity-curve R² measures how steadily the curve compounds versus how much it relies on a few outlier periods.
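Several of the Phase 5 metrics reduce to a few lines each. A stdlib-only sketch, assuming daily net returns, 252 periods per year, a zero target for Sortino, a simple historical CVaR estimator (estimators differ in how they round the tail count), and equity-curve R² computed on log equity against time. The sample returns are invented.

```python
import itertools
import math
from statistics import mean, stdev


def sharpe(returns, periods_per_year=252):
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)


def sortino(returns, periods_per_year=252):
    # Downside deviation below a zero target, averaged over all periods
    downside = math.sqrt(mean([min(r, 0.0) ** 2 for r in returns]))
    return mean(returns) / downside * math.sqrt(periods_per_year)


def historical_cvar(returns, alpha=0.05):
    """Mean loss across the worst `alpha` share of periods, as a positive number."""
    k = max(1, round(len(returns) * alpha))
    return -mean(sorted(returns)[:k])


def equity_curve_r2(returns):
    """R^2 of log equity regressed on time: 1.0 means perfectly steady compounding."""
    y = list(itertools.accumulate(math.log1p(r) for r in returns))
    x = list(range(len(y)))
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


returns = [-0.05, -0.03, 0.01, 0.02, 0.0, 0.01, -0.01, 0.02, 0.03, 0.01]  # invented sample
print(f"Sharpe {sharpe(returns):.2f}, Sortino {sortino(returns):.2f}, "
      f"CVaR(20%) {historical_cvar(returns, alpha=0.2):.3f}")
```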

Phase 5.5

Benchmarking & attribution

Is it alpha, or just beta you could buy cheaply?

6 controls

  • Buy-and-hold benchmark

    Results are compared to a relevant buy-and-hold benchmark on total return — required, not optional, so excess return is explicit.

  • Alpha / beta decomposition

    OLS alpha, beta, R², and overlap window separate genuine excess return from market exposure.

  • Multi-factor attribution (FF5 + UMD)

    Returns are regressed on Fama–French five factors plus momentum, against a versioned factor snapshot, to see what really drives them.

  • Information ratio

    Active return over tracking error quantifies consistency of outperformance versus the benchmark.

  • Tail-risk correlation

    Correlation to the benchmark during its worst periods is checked, so a strategy isn't quietly long the same crash.

  • Market-neutral residual test

    For market-neutral and stat-arb claims, residual beta and R² are bounded to confirm the neutrality the strategy advertises.
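The alpha/beta decomposition and information ratio can be sketched with ordinary least squares on aligned return series. This assumes both series are already restricted to their overlap window and share the same dates; alpha is per period, not annualized. The sample data is constructed so the answer is known in advance.

```python
import math
from statistics import mean, stdev


def alpha_beta_r2(strategy, benchmark):
    """Single-factor OLS: strategy = alpha + beta * benchmark + noise."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must be aligned on the same dates (overlap window)")
    ms, mb = mean(strategy), mean(benchmark)
    cov = sum((s - ms) * (b - mb) for s, b in zip(strategy, benchmark))
    var_b = sum((b - mb) ** 2 for b in benchmark)
    var_s = sum((s - ms) ** 2 for s in strategy)
    beta = cov / var_b
    alpha = ms - beta * mb  # per-period, not annualized
    r2 = cov * cov / (var_b * var_s)
    return alpha, beta, r2


def information_ratio(strategy, benchmark, periods_per_year=252):
    """Annualized mean active return over tracking error."""
    active = [s - b for s, b in zip(strategy, benchmark)]
    return mean(active) / stdev(active) * math.sqrt(periods_per_year)


# Constructed so the answer is known: beta 2, alpha 10 bps per period
bench = [0.01, -0.02, 0.015, 0.0, -0.005]
strat = [2 * b + 0.001 for b in bench]
alpha, beta, r2 = alpha_beta_r2(strat, bench)
print(f"alpha {alpha:.4f}, beta {beta:.2f}, R2 {r2:.2f}")  # alpha 0.0010, beta 2.00, R2 1.00
```

A high R² with a beta near the benchmark's is the signature of exposure you could buy cheaply; the market-neutral residual test bounds exactly these two numbers.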

Phase 6

Risk, sizing & portfolio

Surviving long enough for the edge to pay.

5 controls

  • Ruin & drawdown analysis

    Monte Carlo ruin probabilities at each leverage level, plus expected and percentile future maximum drawdown, frame the real risk of capital impairment.

  • Position sizing (Kelly / vol target)

    Fractional-Kelly budgeting and feature-based volatility targeting size positions with explicit leverage caps.

  • Risk-parity & constrained allocation

    Inverse-volatility / risk-parity weighting with min/max-weight constraints and a bounded simplex projection for multi-leg books.

  • Correlation & crowding (internal)

    Candidate correlation and marginal-Sharpe contribution are checked against the existing book to avoid redundant exposure.

  • Drawdown limits & kill-switch

    Max-drawdown and daily-loss limits are modeled so risk controls are part of the test, not an afterthought.
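A sketch of the sizing and kill-switch controls, assuming the continuous-return Kelly approximation f* = mu / sigma^2, a symmetric leverage cap, and a kill-switch that stops trading for the rest of the test once peak-to-trough drawdown crosses the limit. Function names and defaults are hypothetical.

```python
def fractional_kelly(mu, sigma, fraction=0.5, max_leverage=2.0):
    """Leverage from the continuous Kelly approximation mu / sigma**2, scaled and capped."""
    full_kelly = mu / sigma ** 2
    return max(-max_leverage, min(fraction * full_kelly, max_leverage))


def vol_target_leverage(target_vol, forecast_vol, max_leverage=2.0):
    """Scale exposure so forecast volatility matches the target, within the cap."""
    return min(target_vol / forecast_vol, max_leverage)


def apply_kill_switch(returns, max_drawdown=0.20):
    """Return the traded stream: flat (0.0) after drawdown first breaches the limit."""
    equity = peak = 1.0
    halted = False
    traded = []
    for r in returns:
        if halted:
            traded.append(0.0)
            continue
        traded.append(r)
        equity *= 1 + r
        peak = max(peak, equity)
        if 1 - equity / peak >= max_drawdown:
            halted = True  # stays off for the rest of the test
    return traded


print(round(fractional_kelly(0.08, 0.20), 6))  # 1.0 (half of full Kelly 2.0)
print(apply_kill_switch([0.10, -0.15, -0.10, 0.05]))  # [0.1, -0.15, -0.1, 0.0]
```

Running the kill-switch inside the backtest means the reported returns already reflect the periods the strategy would have been switched off, rather than assuming it trades through every drawdown.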

Phase 8–10

Governance, lineage & reproducibility

A result you can audit is a result you can trust.

6 controls

  • Pre-registration & sign-off

    The hypothesis is hashed and pre-registered before the run; a formal phase-gate sign-off records the GO / NO-GO decision.

  • Versioned data & code

    Every run records the git SHA, data hash, and engine version, so a result is tied to the exact inputs that produced it.

  • End-to-end data lineage

    A manifest plus feature-level lineage traces every number in the report back to its raw source.

  • Audit trail of decisions

    Scoping and validation decisions are captured in the run manifest — the reasoning is part of the deliverable, not lost in chat.

  • Model card

    An auto-generated model card documents the strategy, its assumptions, and its limitations alongside the results.

  • Reproducibility

    Immutable raw data, an immutable feature store, and a determinism check support independent re-running of the result.
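Hash-based pre-registration, run manifests, and determinism checks can be sketched with the standard library alone. This assumes the hypothesis and backtest outputs are JSON-serializable and that the git SHA and engine version are supplied by the caller. The function names and placeholder values below are illustrative only.

```python
import hashlib
import json


def canonical_hash(obj) -> str:
    """SHA-256 of sorted-key JSON, so dict ordering cannot change the hash."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def build_run_manifest(hypothesis, raw_data, git_sha, engine_version):
    """Tie a run to the exact hypothesis, data bytes, code revision, and engine."""
    return {
        "hypothesis_hash": canonical_hash(hypothesis),
        "data_hash": hashlib.sha256(raw_data).hexdigest(),
        "git_sha": git_sha,
        "engine_version": engine_version,
    }


def matches_preregistration(registered_hash, hypothesis):
    """True only if the hypothesis content is unchanged since registration."""
    return canonical_hash(hypothesis) == registered_hash


def is_deterministic(run, *args, repeats=2):
    """Re-run a backtest function and require identical hashed outputs."""
    return len({canonical_hash(run(*args)) for _ in range(repeats)}) == 1


hypothesis = {"statement": "example edge", "falsified_if": "net OOS Sharpe <= 0"}
registered = canonical_hash(hypothesis)  # recorded before any compute runs
manifest = build_run_manifest(
    hypothesis,
    raw_data=b"date,close\n2020-01-02,100.0\n",  # placeholder bytes
    git_sha="<git-sha>",  # placeholder, supplied by the caller
    engine_version="<engine-version>",  # placeholder
)
print(matches_preregistration(registered, hypothesis))  # True
```

Canonical JSON matters here: without sorted keys, two dicts with identical content could hash differently and a harmless reordering would look like a changed hypothesis.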

Where we draw the line

What a validation is not

Rigor includes being honest about scope. An engagement is an offline validation of your hypothesis — these things are deliberately outside it.

  • Live order execution or order routing — research and simulation only.
  • Managed paper-trading or live-vs-backtest parity monitoring on your behalf.
  • Capital allocation, ramp-up, or any management of real money.
  • Investment advice — every report is descriptive, never prescriptive.

Put your idea through all of it

Describe your hypothesis in a brief. We scope the right validation profile and return the full report — the same controls, run by humans.