RESEARCH PAPER · V1.0
Pairs Trading Implementation. Hedge Ratio Estimation, Trading Rules and Backtesting.
Seven hedge-ratio estimators, the closed-form Bertram optimal-stopping system, the four conventional responses to the hedge-ratio update continuity problem, and the formal backtest-validation toolkit.
ABSTRACT
This document is the companion to Statistical Arbitrage. A Technical Guide. It addresses three practical concerns of any pairs-trading implementation: hedge-ratio estimation, trading rule construction and backtest validation.
§2 reviews seven hedge-ratio estimators with their optimisation criteria and properties. §3 derives trading rules from heuristic z-score thresholds through closed-form optimal stopping under the OU model and the stochastic-control generalisation, and discusses the hedge-ratio update continuity problem. §4 covers backtest validation: walk-forward partitioning, block bootstrap, the Probabilistic and Deflated Sharpe ratios, Probability of Backtest Overfitting, transaction-cost realism, survivorship and multiple-testing correction.
§2
Hedge ratio estimation
§2.1 The estimation problem
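Let $X_t$ and $Y_t$ be two price series. The pairs-trading implementation requires an estimate $\hat{\beta}$ of the hedge ratio $\beta$ in the cointegrating relationship
$$Y_t = \alpha + \beta X_t + \varepsilon_t, \qquad \varepsilon_t \sim I(0).$$
The empirical spread is then $\hat{s}_t = Y_t - \hat{\beta} X_t$, and trading rules operate on $\hat{s}_t$.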
The estimation choice is non-trivial. Estimators differ in symmetry (whether the result depends on which variable is on the left-hand side), sensitivity to noise in each series, robustness to outliers, time-stability under recursive or rolling estimation, and generalisability to $n > 2$ assets.
§2.2 Ordinary Least Squares (OLS)
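The Engle-Granger (1987) two-step procedure uses OLS:
$$\hat{\beta}_{\text{OLS}} = \frac{\sum_t (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_t (X_t - \bar{X})^2},$$
minimising $\sum_t (Y_t - \alpha - \beta X_t)^2$. Asymmetric: $\hat{\beta}_{Y|X} \neq 1/\hat{\beta}_{X|Y}$ in finite samples. The estimator assumes X is observed without error, which is generally violated in financial data. Consistent under cointegration, but standard errors require the Phillips-Hansen (1990) correction because the cointegrating residual is autocorrelated.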
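The asymmetry can be checked numerically. The sketch below is illustrative only: it assumes a synthetic pair in which X is a random walk and Y is 1.5 times X plus stationary noise, and the helper name `ols_slope` is hypothetical. It regresses each series on the other and compares the slope with the reciprocal of the reverse slope.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope from the least-squares regression of y on x, with intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Synthetic pair (assumed for illustration): X is a random walk and
# Y = 1.5 * X + stationary noise, so the true hedge ratio is 1.5.
rng = random.Random(0)
x, level = [], 100.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    x.append(level)
y = [1.5 * xi + rng.gauss(0.0, 2.0) for xi in x]

beta_y_on_x = ols_slope(x, y)   # Y on the left-hand side
beta_x_on_y = ols_slope(y, x)   # X on the left-hand side
# The product of the two slopes is the squared sample correlation, so the
# forward slope equals the reciprocal of the reverse slope only for a perfect fit.
print(beta_y_on_x, 1.0 / beta_x_on_y)
```

Because the product of the two slopes equals the squared correlation, the gap between them widens as the fit deteriorates.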
§2.3 Total Least Squares (TLS) or Orthogonal Regression
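TLS minimises the sum of squared perpendicular distances from the data points to the regression line:
$$(\hat{\alpha}, \hat{\beta}_{\text{TLS}}) = \arg\min_{\alpha, \beta} \sum_t \frac{(Y_t - \alpha - \beta X_t)^2}{1 + \beta^2}.$$
Symmetric in X and Y. It is the maximum-likelihood estimator under the errors-in-variables model in which both observations are corrupted by iid Gaussian noise of equal variance. For series with unequal but known noise variances, weighted TLS is the appropriate generalisation. Orthogonal regression is the equal-variance special case of Deming regression, and the two names are sometimes used interchangeably in the literature.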
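For two series the orthogonal-regression slope has a well-known closed form in the centred sums of squares and cross-products. The sketch below uses that closed form on an assumed synthetic pair (the data and the helper name `tls_slope` are illustrative, not part of the method description) and checks the symmetry property: swapping the two series returns the reciprocal slope.

```python
import math
import random
from statistics import fmean


def tls_slope(x, y):
    """Orthogonal-regression slope of y on x (equal noise variance assumed)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    d = syy - sxx
    # Closed form; requires a non-zero cross-product sxy.
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


# Synthetic pair (assumed for illustration), true hedge ratio 1.5.
rng = random.Random(0)
x, level = [], 100.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    x.append(level)
y = [1.5 * xi + rng.gauss(0.0, 2.0) for xi in x]

beta_yx = tls_slope(x, y)
beta_xy = tls_slope(y, x)
print(beta_yx, 1.0 / beta_xy)  # identical up to floating-point rounding
```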
§2.4 Johansen eigenvector
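The Johansen (1988, 1991) maximum-likelihood procedure estimates the VECM $\Delta \mathbf{y}_t = \Pi \mathbf{y}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta \mathbf{y}_{t-i} + \boldsymbol{\varepsilon}_t$ and returns the cointegrating matrix as eigenvectors of $S_{11}^{-1} S_{10} S_{00}^{-1} S_{01}$, where the $S_{ij}$ are the product-moment matrices of the residuals from regressing $\Delta \mathbf{y}_t$ (index 0) and $\mathbf{y}_{t-1}$ (index 1) on the lagged differences. For the $n = 2$ case, this is a $2 \times 2$ eigenvector problem; the cointegrating vector is normalised to $(1, -\beta)^\top$ by convention. Symmetric, generalises to $n > 2$ and provides asymptotic test statistics for the number of cointegrating relationships. More sensitive to specification choices (lag length, treatment of deterministic terms) than the Engle-Granger approach.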
§2.5 Box-Tiao canonical decomposition
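Box and Tiao (1977) developed a procedure for decomposing a multivariate time series into components ranked by their predictability. Define a vector AR(1):
$$\mathbf{y}_t = A\,\mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t,$$
with $\Sigma = \operatorname{Cov}(\mathbf{y}_t)$. The most mean-reverting linear combination $\mathbf{w}^\top \mathbf{y}_t$ is the eigenvector of the matrix
$$\Sigma^{-1} A\, \Sigma\, A^\top$$
corresponding to its smallest eigenvalue, which minimises the predictability ratio $\mathbf{w}^\top A \Sigma A^\top \mathbf{w} \,/\, \mathbf{w}^\top \Sigma\, \mathbf{w}$. Asymmetric (selects the direction of minimum predictability, the combination closest to noise), more robust to noise than OLS, generalises to $n > 2$. Often used when mean-reversion speed is the property to optimise explicitly.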
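A minimal NumPy sketch of the decomposition follows. It assumes a column-vector VAR(1) fitted by least squares on demeaned prices and a synthetic two-asset example with true hedge ratio 1.5; the function name `box_tiao` and the data are illustrative assumptions.

```python
import numpy as np


def box_tiao(prices):
    """prices: array of shape (T, n). Returns eigenvalues in ascending
    order and the matching eigenvectors (as columns)."""
    y = prices - prices.mean(axis=0)
    y_lag, y_now = y[:-1], y[1:]
    # Least-squares VAR(1): rows satisfy y_now ~= y_lag @ A.T
    A = np.linalg.lstsq(y_lag, y_now, rcond=None)[0].T
    sigma = np.cov(y, rowvar=False)
    # Predictability matrix Sigma^{-1} A Sigma A^T
    M = np.linalg.solve(sigma, A @ sigma @ A.T)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)
    return eigvals.real[order], eigvecs.real[:, order]


# Synthetic pair (assumed): X is a random walk, Y = 1.5 * X + iid noise.
rng = np.random.default_rng(0)
x = 100.0 + np.cumsum(rng.normal(0.0, 1.0, 2000))
y = 1.5 * x + rng.normal(0.0, 2.0, 2000)

eigvals, eigvecs = box_tiao(np.column_stack([x, y]))
w = eigvecs[:, 0]            # least predictable (most mean-reverting) direction
hedge_ratio = -w[0] / w[1]   # rescale so the weight on Y equals 1
print(hedge_ratio)
```

The eigenvector is defined only up to scale, so the sketch rescales it to put unit weight on Y before reading off the hedge ratio.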
§2.6 Kalman filter
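Elliott, van der Hoek and Malcolm (2005) introduced the state-space formulation of pairs trading, modelling the spread as a latent mean-reverting state observed in noise. The same machinery applies to the hedge ratio, treated as a latent state variable in a Kalman filter (Kalman 1960):
$$\beta_t = \beta_{t-1} + w_t, \quad w_t \sim \mathcal{N}(0, Q); \qquad Y_t = \beta_t X_t + v_t, \quad v_t \sim \mathcal{N}(0, R).$$
The Kalman recursion produces $\hat{\beta}_{t|t}$, the minimum-mean-square-error estimator of $\beta_t$ given observations through time t:
$$\hat{\beta}_{t|t} = \hat{\beta}_{t|t-1} + K_t \left( Y_t - \hat{\beta}_{t|t-1} X_t \right),$$
where $K_t$ is the Kalman gain. The hyperparameter pair $(Q, R)$ jointly governs the responsiveness/smoothness tradeoff. Large Q relative to R produces a fast-adapting estimate; small Q produces a smooth, slowly-evolving estimate. Some implementations use EM (Dempster, Laird, Rubin 1977) to estimate $(Q, R)$ from data.
Continuous online estimation; no batch refit required; naturally handles time-varying $\beta$. Computational cost is constant per observation. Principal limitation: opacity of the latent-state representation.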
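The recursion is short enough to write out in full. The sketch below assumes the simplest scalar model, a random-walk hedge ratio with no intercept in the observation equation; the function name, default hyperparameters and synthetic data are illustrative assumptions. It runs the filter twice to show how the process variance q changes the smoothness of the estimate.

```python
import random


def kalman_hedge_ratio(x, y, q=1e-5, r=1.0, beta0=0.0, p0=1.0):
    """Scalar Kalman filter for Y_t = beta_t * X_t + v_t with a
    random-walk state. Returns the filtered path of beta."""
    beta, p = beta0, p0
    path = []
    for xt, yt in zip(x, y):
        p = p + q                          # predict: state variance grows by q
        innovation = yt - beta * xt        # one-step-ahead forecast error
        k = p * xt / (xt * xt * p + r)     # Kalman gain
        beta = beta + k * innovation       # update the state estimate
        p = (1.0 - k * xt) * p             # update the state variance
        path.append(beta)
    return path


def roughness(path, skip=100):
    """Mean absolute bar-to-bar change after an initial burn-in."""
    steps = [abs(b - a) for a, b in zip(path[skip:], path[skip + 1:])]
    return sum(steps) / len(steps)


# Synthetic pair (assumed): constant true hedge ratio of 1.5.
rng = random.Random(1)
x, level = [], 100.0
for _ in range(300):
    level += rng.gauss(0.0, 1.0)
    x.append(level)
y = [1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

fast = kalman_hedge_ratio(x, y, q=1e-3)   # responsive, noisier
slow = kalman_hedge_ratio(x, y, q=1e-7)   # smooth, slow to adapt
print(slow[-1], roughness(fast), roughness(slow))
```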
§2.7 Half-life minimisation
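The half-life of an AR(1) spread with autoregressive coefficient $\phi \in (0, 1)$ is $h = -\ln 2 / \ln \phi$. Writing $\hat{\phi}(\beta)$ for the AR(1) coefficient fitted to the spread $Y_t - \beta X_t$, half-life minimisation chooses
$$\hat{\beta}_{\text{HL}} = \arg\min_{\beta} \left( -\frac{\ln 2}{\ln \hat{\phi}(\beta)} \right)$$
via numerical optimisation. Directly targets a trading-relevant property. Can be unstable when the AR(1) fit is poorly identified (e.g., when the spread is near a unit root). Computational cost is higher than OLS.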
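As a rough illustration, the sketch below replaces the numerical optimiser with a plain grid search, which is an assumption made for brevity. The synthetic spread is an AR(1) process with coefficient 0.8 around a true hedge ratio of 1.5, and fits with a coefficient outside (0, 1) are treated as invalid; the helper names are hypothetical.

```python
import math
import random
from statistics import fmean


def ar1_phi(s):
    """Least-squares AR(1) coefficient of the series s (with intercept)."""
    lag, now = s[:-1], s[1:]
    ml, mn = fmean(lag), fmean(now)
    num = sum((a - ml) * (b - mn) for a, b in zip(lag, now))
    den = sum((a - ml) ** 2 for a in lag)
    return num / den


def half_life(s):
    """Half-life in bars; infinite when the AR(1) fit is outside (0, 1)."""
    phi = ar1_phi(s)
    if not 0.0 < phi < 1.0:
        return math.inf
    return -math.log(2.0) / math.log(phi)


# Synthetic pair (assumed): Y = 1.5 * X + AR(1) deviation with phi = 0.8.
rng = random.Random(2)
x, y, level, dev = [], [], 100.0, 0.0
for _ in range(1000):
    level += rng.gauss(0.0, 1.0)
    dev = 0.8 * dev + rng.gauss(0.0, 1.0)
    x.append(level)
    y.append(1.5 * level + dev)


def spread(beta):
    return [yi - beta * xi for xi, yi in zip(x, y)]


grid = [1.0 + 0.01 * i for i in range(101)]   # candidate betas in [1, 2]
best_beta = min(grid, key=lambda b: half_life(spread(b)))
print(best_beta, half_life(spread(best_beta)))
```

A grid search makes the identification problem visible: near a unit root the objective is flat, and the minimiser can move substantially between samples.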
§2.8 ADF-statistic optimisation
Choose $\beta$ to minimise the ADF test statistic of the resulting spread:
$$\hat{\beta}_{\text{ADF}} = \arg\min_{\beta}\, \operatorname{ADF}(Y_t - \beta X_t).$$
Directly targets statistical stationarity. Subject to the same identification concerns as half-life minimisation. The resulting estimate has a non-standard sampling distribution.
§2.9 Comparative summary
| Estimator | Symmetric? | Online? | n > 2? | Optimised for |
|---|---|---|---|---|
| OLS | No | Approx. | Pairwise | Squared Y-residuals |
| TLS | Yes | Approx. | Pairwise | Perpendicular residuals |
| Johansen | Yes | No (batch) | Yes | Cointegration likelihood |
| Box-Tiao | No | No (batch) | Yes | Predictability of combination |
| Kalman | Yes | Yes | Pairwise | Posterior under SSM |
| Half-life | No | No (batch) | Pairwise | Mean-reversion speed |
| ADF-stat | No | No (batch) | Pairwise | Stationarity test statistic |
Online estimation in the OLS, TLS, half-life, ADF and Box-Tiao cases is achieved by periodic refitting on a rolling window; this introduces the practical question of refit cadence vs. responsiveness, with a continuity cost discussed in §3.5.
§3
Trading rule construction
§3.1 Z-score thresholds (heuristic)
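The standard pairs-trading entry rule uses the rolling z-score:
$$z_t = \frac{\hat{s}_t - \hat{\mu}_t^{(W)}}{\hat{\sigma}_t^{(W)}},$$
with the mean $\hat{\mu}_t^{(W)}$ and standard deviation $\hat{\sigma}_t^{(W)}$ computed over a rolling window W. Enter long spread when $z_t < -z_{\text{entry}}$, short spread when $z_t > +z_{\text{entry}}$, exit when $|z_t|$ crosses $z_{\text{exit}}$. Standard heuristic values: $z_{\text{entry}} = 2$, $z_{\text{exit}} = 0$ or 0.5. These are conventions inherited from Gatev, Goetzmann and Rouwenhorst (2006) and have no theoretical basis: the optimal level depends on the process parameters and on the transaction cost.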
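The rule can be sketched as a small state machine. The version below is a minimal illustration under stated assumptions: the rolling standard deviation is the population form, positions are +1 (long spread), -1 (short spread) or 0, and the function names and default thresholds are illustrative choices.

```python
from statistics import fmean, pstdev


def rolling_zscore(spread, window):
    """z-score of each bar against the trailing window (inclusive).
    Bars without a full window are returned as None."""
    z = [None] * len(spread)
    for t in range(window - 1, len(spread)):
        w = spread[t - window + 1 : t + 1]
        sd = pstdev(w)
        z[t] = (spread[t] - fmean(w)) / sd if sd > 0 else 0.0
    return z


def threshold_positions(z, z_entry=2.0, z_exit=0.5):
    """+1 = long spread, -1 = short spread, 0 = flat."""
    pos, out = 0, []
    for zt in z:
        if zt is not None:
            if pos == 0:
                if zt < -z_entry:
                    pos = 1
                elif zt > z_entry:
                    pos = -1
            elif pos == 1 and zt >= -z_exit:
                pos = 0   # long spread has reverted far enough
            elif pos == -1 and zt <= z_exit:
                pos = 0   # short spread has reverted far enough
        out.append(pos)
    return out


z_example = [None, -2.5, -1.0, -0.3, 0.0, 2.2, 1.0, 0.4]
print(threshold_positions(z_example))  # [0, 1, 1, 0, 0, -1, -1, 0]
```

The signal at bar t uses data through bar t, so in a backtest the position would normally be applied from the next bar onward.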
§3.2 Bertram optimal stopping (closed form under OU)
Bertram (2010) derives analytically the entry and exit z-score thresholds that maximise the long-run expected profit per unit time under the OU model with transaction cost c per round trip.
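Denote the dimensionless entry and exit thresholds as a and b with $a < b$, both measured, like c, in units of the stationary standard deviation of the spread. The expected return per unit time is
$$\mu(a, b, c) = \frac{b - a - c}{\mathbb{E}[T]},$$
where the expected round-trip time $\mathbb{E}[T]$ has a closed-form expression: the two one-way passage times are integrals of Mills ratios, and their sum reduces to the imaginary error function:
$$\mathbb{E}[T] = \frac{\pi}{\theta} \left[ \operatorname{erfi}\!\left( \frac{b}{\sqrt{2}} \right) - \operatorname{erfi}\!\left( \frac{a}{\sqrt{2}} \right) \right],$$
with $\theta$ the OU mean-reversion speed.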
The first-order condition yields the optimal entry and exit thresholds as a transcendental system solvable numerically. Bertram (2010, eq. 17 to 18) gives the explicit form. Zeng and Lee (2014) extend the framework to handle asymmetric transaction costs.
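A numerical sketch of the optimisation follows, under several stated assumptions. Thresholds and cost are in units of the stationary standard deviation of the spread, the expected round-trip time is taken to be $(\pi/\theta)[\operatorname{erfi}(b/\sqrt{2}) - \operatorname{erfi}(a/\sqrt{2})]$ with $\theta$ the OU mean-reversion speed, and the search is restricted to symmetric thresholds $a = -b$ (for a fixed band width, the cycle time under this formula is shortest when the band is centred on the mean). A grid search stands in for solving the first-order condition, and erfi is evaluated from its Maclaurin series because the Python standard library does not provide it.

```python
import math


def erfi(x, terms=60):
    """Imaginary error function from its Maclaurin series:
    erfi(x) = 2/sqrt(pi) * sum_n x^(2n+1) / (n! * (2n+1))."""
    total, term = 0.0, x            # term holds x^(2n+1) / n!
    for n in range(terms):
        total += term / (2 * n + 1)
        term *= x * x / (n + 1)
    return 2.0 / math.sqrt(math.pi) * total


def return_rate(b, c, theta=1.0):
    """Expected profit per unit time for thresholds (-b, +b) and cost c."""
    cycle_time = (math.pi / theta) * 2.0 * erfi(b / math.sqrt(2.0))
    return (2.0 * b - c) / cycle_time


def optimal_threshold(c, theta=1.0, grid=2000, b_max=4.0):
    """Grid search for the symmetric threshold maximising return_rate."""
    candidates = [b_max * (i + 1) / grid for i in range(grid)]
    return max(candidates, key=lambda b: return_rate(b, c, theta))


for cost in (0.1, 0.2, 0.5):
    b_star = optimal_threshold(cost)
    print(cost, b_star, return_rate(b_star, cost))
```

Under these assumptions the optimal band widens as the round-trip cost rises, which is the qualitative behaviour a fixed heuristic threshold cannot capture.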
§3.3 Stochastic optimal control (continuous positioning)
Threshold strategies are discrete. The stochastic-control approach permits continuous positioning as a function of the spread's distance from equilibrium.
Mudchanatongsuk, Primbs and Wong (2008) formulate the problem under CRRA utility. The Hamilton-Jacobi-Bellman equation yields the optimal position $h_t^*$ as a smooth function of the spread $s_t$. Closed-form solutions exist under specific utility functions and finite horizons. Jurek (2007) provides an analogous treatment that includes intertemporal consumption decisions.
§3.4 Half-life timeouts and risk-management overlays
Independent of the entry rule, two overlays are common:
Half-life-based timeout. Force-exit positions held longer than $k \cdot h$ bars, where $h$ is the estimated spread half-life and $k$ is a chosen multiple. Avoids capital trapped in positions where the assumed mean-reversion has not materialised.
Stop-loss and take-profit. Bar-level evaluation of fixed-percent SL and TP thresholds, evaluated on the position PnL independent of the spread z-score. Provides tail-risk protection at the cost of cutting trades that would have eventually reverted.
§3.5 The hedge-ratio-update continuity problem
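Suppose $\hat{\beta}$ is updated at time $t_0$ from $\hat{\beta}_{\text{old}}$ to $\hat{\beta}_{\text{new}}$. The empirical spread then exhibits a discontinuity at $t_0$ of magnitude $|(\hat{\beta}_{\text{new}} - \hat{\beta}_{\text{old}})\, X_{t_0}|$. The rolling z-score window straddling $t_0$ averages the pre- and post-update spread distributions and produces spurious entry signals during a transition period of length W.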
Four conventional responses:
- Continuous online re-estimation (e.g., Kalman or every-bar OLS). Eliminates discrete jumps but introduces continuous estimator jitter from noise in recent observations.
- Static estimation (estimate once, hold). Eliminates all jumps but suffers from estimator staleness as the underlying relationship evolves.
- Periodic re-estimation with no further adjustment. Accept the discontinuity. Z-score window length W should be much shorter than the refit interval to limit the contamination period.
- Periodic re-estimation with spread adjustment. When $\hat{\beta}$ updates, adjust the spread series so that the displayed and statistically-evaluated $\hat{s}_t$ is continuous across the update. The adjustment is by a constant offset chosen to match the value of the new spread to the value of the old spread at the update time.
Each response is a tradeoff among continuity, estimator currency and implementation complexity. The choice depends on the use case (visualisation, signal generation, risk reporting) and on the cadence of the trading horizon relative to the hedge-ratio drift timescale.
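The spread-adjustment response can be illustrated with a toy series. The sketch below assumes a piecewise-constant hedge ratio supplied bar by bar and a spread defined as Y minus the hedge ratio times X; the function name and the four-bar data are illustrative. At each update it accumulates a constant offset equal to the old-ratio spread minus the new-ratio spread at the update bar.

```python
def spread_with_offset(x, y, betas):
    """betas[t] is the hedge ratio in force at bar t.
    Returns (raw spread, offset-adjusted spread)."""
    raw, adjusted, offset = [], [], 0.0
    for t, (xt, yt, bt) in enumerate(zip(x, y, betas)):
        s = yt - bt * xt
        if t > 0 and bt != betas[t - 1]:
            # Match the new spread to the old-ratio spread at the update bar.
            offset += (yt - betas[t - 1] * xt) - s
        raw.append(s)
        adjusted.append(s + offset)
    return raw, adjusted


# Toy data (assumed): the hedge ratio is refitted from 1.5 to 1.4 at bar 2.
x = [100.0, 101.0, 102.0, 103.0]
y = [150.0, 151.5, 153.0, 154.5]
betas = [1.5, 1.5, 1.4, 1.4]

raw, adjusted = spread_with_offset(x, y, betas)
print(raw)       # jumps by about 0.1 * 102 at the update bar
print(adjusted)  # continuous across the update
```

After the update the adjusted series evolves with the new hedge ratio, so only the level is preserved, not the subsequent dynamics.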
§4
Backtesting and validation
§4.1 Look-ahead bias
Any quantity used to generate a trading signal at time t must be a function only of data observable by time t. The cointegrating coefficient $\hat{\beta}$ estimated using data through time T must not be used to evaluate trades at $t < T$. One-period lags between observation and execution are typical to reflect implementation realism.
§4.2 Walk-forward partitioning
A strategy with tunable parameters must use distinct training and test data partitions. The standard methodology (Bailey, Borwein, Lopez de Prado and Zhu 2014) is walk-forward partitioning. Define a training window of length $T_{\text{train}}$ and a test window of length $T_{\text{test}}$, with the test window typically shorter than the training window. Fit on training, evaluate on the immediately following test, slide, repeat. The aggregated test-window PnL is the walk-forward out-of-sample backtest.
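The fit, evaluate, slide loop reduces to generating index ranges. The generator below is a minimal sketch; its name and the default of stepping forward by one test window (so that test windows tile the sample without overlap) are assumptions.

```python
def walk_forward_splits(n_obs, train_len, test_len, step=None):
    """Yield (train_indices, test_indices) pairs as range objects.
    Each test window starts immediately after its training window."""
    step = test_len if step is None else step
    start = 0
    while start + train_len + test_len <= n_obs:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += step


splits = list(walk_forward_splits(n_obs=10, train_len=4, test_len=2))
for train, test in splits:
    print(list(train), list(test))
```

With ten observations, a training length of four and a test length of two, this yields three folds whose test windows cover bars 4 to 9 exactly once.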
§4.3 Resampling
Block bootstrap. Naive iid bootstrap is inappropriate for serially correlated financial data. Block bootstrap (Künsch 1989) resamples contiguous blocks of length $b$ to preserve local autocorrelation structure. Politis and Romano (1994) introduced the stationary bootstrap, which uses random block lengths drawn from a geometric distribution.
Monte Carlo simulation. Simulate alternative paths under an explicit data-generating process. For example, a GARCH model (Bollerslev 1986) combined with a calibrated OU model for the spread.
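Both block resampling schemes fit in a few lines. The sketch below is illustrative: the function names are hypothetical, the moving-block version draws overlapping blocks uniformly and truncates the last one, and the stationary version restarts at a random index with probability equal to one over the mean block length, wrapping around the end of the sample.

```python
import random


def moving_block_bootstrap(returns, block_len, rng):
    """Concatenate randomly chosen contiguous blocks of fixed length."""
    n = len(returns)
    out = []
    while len(out) < n:
        start = rng.randrange(0, n - block_len + 1)
        out.extend(returns[start : start + block_len])
    return out[:n]


def stationary_bootstrap(returns, mean_block_len, rng):
    """Blocks of geometric random length with the given mean,
    treating the sample as circular."""
    n = len(returns)
    p = 1.0 / mean_block_len
    idx = rng.randrange(n)
    out = []
    for _ in range(n):
        out.append(returns[idx])
        # Start a new block with probability p, else continue the current one.
        idx = rng.randrange(n) if rng.random() < p else (idx + 1) % n
    return out


rng = random.Random(0)
data = list(range(100))   # stand-in for a return series
mbb = moving_block_bootstrap(data, block_len=10, rng=rng)
sb = stationary_bootstrap(data, mean_block_len=10, rng=rng)
print(mbb[:12], sb[:12])
```

Repeating either resampler many times and recomputing the performance statistic on each draw gives the resampled distribution of that statistic.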
§4.4 Probabilistic, deflated and overfitting Sharpe diagnostics
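Probabilistic Sharpe Ratio (PSR). Bailey and Lopez de Prado (2012) define PSR as the probability that the true Sharpe exceeds threshold $SR^*$, conditional on the observed Sharpe $\widehat{SR}$ and the higher moments (skewness $\hat{\gamma}_3$, kurtosis $\hat{\gamma}_4$) of the return distribution:
$$\operatorname{PSR}(SR^*) = \Phi\!\left( \frac{(\widehat{SR} - SR^*)\,\sqrt{n - 1}}{\sqrt{1 - \hat{\gamma}_3\, \widehat{SR} + \frac{\hat{\gamma}_4 - 1}{4}\, \widehat{SR}^2}} \right),$$
where n is the number of returns and $\Phi$ is the standard normal CDF.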
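The PSR can be computed directly from a return series. The sketch below makes a few labelled choices: the Sharpe ratio is per period (not annualised) and uses the sample standard deviation, skewness and kurtosis are the plain moment estimators, and kurtosis is the non-excess form (3 for a Gaussian). The function name and the example return series are illustrative.

```python
import math
from statistics import NormalDist, fmean


def probabilistic_sharpe(returns, sr_benchmark=0.0):
    """Probability that the true per-period Sharpe exceeds sr_benchmark."""
    n = len(returns)
    mu = fmean(returns)
    dev = [r - mu for r in returns]
    m2 = fmean([d ** 2 for d in dev])
    m3 = fmean([d ** 3 for d in dev])
    m4 = fmean([d ** 4 for d in dev])
    sr = mu / math.sqrt(m2 * n / (n - 1))   # sample standard deviation
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                     # non-excess kurtosis
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr_benchmark) * math.sqrt(n - 1) / denom
    return NormalDist().cdf(z)


# Deterministic toy returns (assumed), 200 observations.
rets = [0.02, -0.01, 0.03, -0.02, 0.01] * 40
print(probabilistic_sharpe(rets, 0.0), probabilistic_sharpe(rets, 0.3))
```

Raising the benchmark lowers the PSR, and a series whose observed Sharpe equals the benchmark returns exactly 0.5.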
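Deflated Sharpe Ratio (DSR). Bailey and Lopez de Prado (2014) extend the PSR to account for selection bias when N strategies are tested:
$$\operatorname{DSR} = \operatorname{PSR}(SR_0), \qquad SR_0 = \sqrt{\operatorname{Var}\big[\{\widehat{SR}_k\}\big]} \left( (1 - \gamma)\, \Phi^{-1}\!\left[ 1 - \frac{1}{N} \right] + \gamma\, \Phi^{-1}\!\left[ 1 - \frac{1}{N e} \right] \right),$$
where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant, $e$ is Euler's number and the variance is taken across the Sharpe estimates of the N trials. DSR below 0.5 indicates the reported Sharpe is consistent with selection-bias noise.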
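A sketch of the deflation step, assuming the usual expected-maximum approximation for the benchmark: the threshold is the expected maximum Sharpe across N independent trials with zero true Sharpe, and the DSR is the PSR evaluated at that threshold. Function names, argument names and the example inputs are illustrative; Sharpe values are per period and kurtosis is non-excess.

```python
import math
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def expected_max_sharpe(sr_variance, n_trials):
    """Approximate expected maximum Sharpe across n_trials skill-less
    strategies whose Sharpe estimates have variance sr_variance."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    nd = NormalDist()
    g = EULER_MASCHERONI
    return math.sqrt(sr_variance) * (
        (1.0 - g) * nd.inv_cdf(1.0 - 1.0 / n_trials)
        + g * nd.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_obs, skew, kurt, sr_variance, n_trials):
    """PSR evaluated at the selection-bias benchmark."""
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr0) * math.sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)


# Same observed Sharpe, different numbers of trials (inputs assumed).
few = deflated_sharpe(0.1, 1000, 0.0, 3.0, sr_variance=0.0025, n_trials=2)
many = deflated_sharpe(0.1, 1000, 0.0, 3.0, sr_variance=0.0025, n_trials=100)
print(few, many)
```

The same observed Sharpe is far less convincing when it is the best of a hundred trials than when it is the best of two.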
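Probability of Backtest Overfitting (PBO). Bailey, Borwein, Lopez de Prado and Zhu (2017) define PBO via combinatorially symmetric cross-validation as the probability that the configuration ranked best in-sample performs below the median out-of-sample. A PBO of 0.5 or above indicates the backtest selection process has no out-of-sample skill.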
§4.5 Transaction cost realism
Pairs trading is high-turnover. A defensible backtest models bid-ask spread (fills occur at the side of the book, not the midpoint), slippage (Almgren et al. 2005 for square-root or linear impact), exchange fees at the relevant tier, funding payments for perpetual futures at the venue cadence, borrow costs for the short leg in equity markets and latency between signal generation and exchange acknowledgement.
§4.6 Survivorship and selection biases
A pair universe constructed from currently-listed assets implicitly excludes assets that were delisted, merged or otherwise removed from the historical record. The resulting backtest overstates pair stability. Construct point-in-time universes from complete historical listing metadata.
The pairs evaluated must themselves be selected using only information available at the formation time. A backtest that uses 2024 cointegration tests to select pairs and then trades those pairs over 2018 to 2023 is selecting on the future.
§4.7 Multiple testing
A typical pair-selection process screens many candidate pairs against a cointegration test. Applying the standard ADF p-value threshold without a multiple-comparisons adjustment all but guarantees that a meaningful fraction of selected pairs are false positives: at threshold $\alpha$, roughly $\alpha m$ of $m$ non-cointegrated candidates pass by chance. Bonferroni and Holm-Bonferroni provide conservative corrections. The false-discovery-rate procedure of Benjamini and Hochberg (1995) is less conservative. For very large universes, the correction may render the surviving pair set empty.
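The two step-wise corrections can be sketched as follows. The p-values are made up for illustration and the function names are hypothetical; Holm steps down from the smallest p-value and stops at the first failure, while Benjamini-Hochberg steps up and rejects everything at or below the largest p-value that clears its rank-scaled threshold.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure. Returns a list of booleans."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # all larger p-values are retained
    return reject


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure. Returns a list of booleans."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank   # largest rank that clears its threshold
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Illustrative p-values for ten candidate pairs (assumed data).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
uncorrected = sum(p <= 0.05 for p in pvals)
print(uncorrected, sum(holm_reject(pvals)), sum(benjamini_hochberg_reject(pvals)))
```

On these ten p-values the uncorrected screen passes five pairs, Benjamini-Hochberg two and Holm one.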
§4.8 Reporting standards
A defensible pairs-trading backtest report should include:
- Universe description with point-in-time membership criteria.
- Method description sufficient to reproduce: pair selection, hedge-ratio estimator, trading rule, position sizing, exit conditions, transaction-cost model.
- Walk-forward partitioning specification: window lengths, step size, parameter optimisation procedure per window.
- Aggregate performance: walk-forward out-of-sample PnL, annualised Sharpe (and PSR / DSR), maximum drawdown, win rate per trade, turnover, transaction cost as fraction of gross PnL.
- Robustness diagnostics: block-bootstrap or Monte Carlo PnL distribution; PBO if a parameter grid was searched; sensitivity to principal parameter choices.
- Cost decomposition: PnL with and without each transaction-cost component.
Reports that omit walk-forward partitioning, that show a single equity curve without a resampled distribution, or that do not disclose the transaction-cost model are not interpretable as evidence of deployable performance.
REFERENCES
Cited works
- Almgren, R., Thum, C., Hauptmann, E., Li, H. (2005). Equity market impact. Risk 18(7), 57 to 62.
- Bailey, D.H., Borwein, J.M., Lopez de Prado, M., Zhu, Q.J. (2014). Pseudo-mathematics and financial charlatanism. Notices of the American Mathematical Society 61(5), 458 to 471.
- Bailey, D.H., Borwein, J.M., Lopez de Prado, M., Zhu, Q.J. (2017). The probability of backtest overfitting. Journal of Computational Finance 20(4), 39 to 69.
- Bailey, D.H., Lopez de Prado, M. (2012). The Sharpe ratio efficient frontier. Journal of Risk 15(2), 3 to 44.
- Bailey, D.H., Lopez de Prado, M. (2014). The deflated Sharpe ratio. Journal of Portfolio Management 40(5), 94 to 107.
- Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate. Journal of the Royal Statistical Society Series B 57(1), 289 to 300.
- Bertram, W.K. (2010). Analytic solutions for optimal statistical arbitrage trading. Physica A 389(11), 2234 to 2243.
- Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31(3), 307 to 327.
- Box, G.E.P., Tiao, G.C. (1977). A canonical analysis of multiple time series. Biometrika 64(2), 355 to 365.
- Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39(1), 1 to 38.
- Elliott, R.J., van der Hoek, J., Malcolm, W.P. (2005). Pairs trading. Quantitative Finance 5(3), 271 to 276.
- Engle, R.F., Granger, C.W.J. (1987). Co-integration and error correction. Econometrica 55(2), 251 to 276.
- Gatev, E., Goetzmann, W.N., Rouwenhorst, K.G. (2006). Pairs trading. Review of Financial Studies 19(3), 797 to 827.
- Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12(2-3), 231 to 254.
- Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors. Econometrica 59(6), 1551 to 1580.
- Jurek, J.W. (2007). Optimal long-run reversal trading. Working paper, Princeton University.
- Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82(1), 35 to 45.
- Künsch, H.R. (1989). The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17(3), 1217 to 1241.
- Lopez de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
- Mudchanatongsuk, S., Primbs, J.A., Wong, W. (2008). Optimal pairs trading. A stochastic control approach. Proceedings of the 2008 American Control Conference, 1035 to 1039.
- Phillips, P.C.B., Hansen, B.E. (1990). Statistical inference in instrumental variables regression with I(1) processes. Review of Economic Studies 57(1), 99 to 125.
- Politis, D.N., Romano, J.P. (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303 to 1313.
- Zeng, Z., Lee, C.G. (2014). Pairs trading. Optimal thresholds and profitability. Quantitative Finance 14(11), 1881 to 1893.
Citation. Bonton AI, Hedgicore Research (2026). Pairs Trading Implementation. Hedge Ratio Estimation, Trading Rules and Backtesting. v1.0. hedgicore.com/research
