RESEARCH PAPER · V1.0

Pairs Trading Implementation. Hedge Ratio Estimation, Trading Rules and Backtesting.


Seven hedge-ratio estimators, the closed-form Bertram optimal-stopping system, the four conventional responses to the hedge-ratio update continuity problem, and the formal backtest-validation toolkit.

Bonton AI, Hedgicore Research · May 23, 2026

ABSTRACT

This document is the companion to "Statistical Arbitrage. A Technical Guide". It addresses three practical concerns of any pairs-trading implementation: hedge-ratio estimation, trading rule construction and backtest validation.

§2 reviews seven hedge-ratio estimators with their optimisation criteria and properties. §3 derives trading rules from heuristic z-score thresholds through closed-form optimal stopping under the OU model and the stochastic-control generalisation, and discusses the hedge-ratio update continuity problem. §4 covers backtest validation: walk-forward partitioning, block bootstrap, the Probabilistic and Deflated Sharpe ratios, Probability of Backtest Overfitting, transaction-cost realism, survivorship and multiple-testing correction.

§2

Hedge ratio estimation

§2.1 The estimation problem

Let $\{X_t\}$ and $\{Y_t\}$ be two $I(1)$ price series. The pairs-trading implementation requires an estimate of $\beta$ in the cointegrating relationship

Y_t = \alpha + \beta X_t + Z_t, \quad Z_t \sim I(0).

The empirical spread is then $\hat{Z}_t = Y_t - \hat{\alpha} - \hat{\beta} X_t$ , and trading rules operate on $\hat{Z}_t$ .

The estimation choice is non-trivial. Estimators differ in symmetry (whether the result depends on which variable is on the left-hand side), sensitivity to noise in each series, robustness to outliers, time-stability under recursive or rolling estimation, and generalisability to $n > 2$ assets.

§2.2 Ordinary Least Squares (OLS)

The Engle-Granger (1987) two-step procedure uses OLS:

\hat{\beta}_{\mathrm{OLS}} = \frac{\widehat{\operatorname{Cov}}(X, Y)}{\widehat{\operatorname{Var}}(X)}

which minimises $\sum_t (Y_t - \alpha - \beta X_t)^2$ . The estimator is asymmetric: $\hat{\beta}_{Y \mid X} \neq 1 / \hat{\beta}_{X \mid Y}$ in finite samples. It assumes X is observed without error, which is generally violated in financial data. It is consistent (indeed super-consistent) under cointegration, but its limiting distribution is non-standard, so valid standard errors require a correction such as that of Phillips and Hansen (1990), because the cointegrating residual is autocorrelated.
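The asymmetry can be checked numerically. The following is a minimal sketch on synthetic data (the series, seed and the helper name `ols_hedge_ratio` are illustrative assumptions, not part of the original study); it computes the OLS slope in both directions and shows that the two are not reciprocals.

```python
import numpy as np

def ols_hedge_ratio(x, y):
    """Return (alpha, beta) from regressing y on x with an intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Synthetic cointegrated pair (illustrative only).
rng = np.random.default_rng(0)
x = 100.0 + np.cumsum(rng.normal(size=2000))           # I(1) driver
y = 5.0 + 1.5 * x + rng.normal(scale=2.0, size=2000)   # stationary residual

_, beta_y_on_x = ols_hedge_ratio(x, y)
_, beta_x_on_y = ols_hedge_ratio(y, x)

# The product of the two slopes equals the squared sample correlation,
# so the slopes are reciprocal only when the fit is perfect.
print(beta_y_on_x, 1.0 / beta_x_on_y)
```

The gap between the two printed values shrinks as the residual variance falls relative to the variance of X, which is why the asymmetry matters most for noisy pairs.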

§2.3 Total Least Squares (TLS) or Orthogonal Regression

TLS minimises the perpendicular distance from data points to the regression line:

\hat{\beta}_{\mathrm{TLS}} = \arg\min_{\beta, \alpha} \sum_t \frac{(Y_t - \alpha - \beta X_t)^2}{1 + \beta^2}

Symmetric in X and Y: regressing X on Y yields the reciprocal slope. TLS is the maximum-likelihood estimator under the errors-in-variables model where both observations are corrupted by iid Gaussian noise of equal variance. For series with unequal but known noise variances, weighted TLS is the appropriate generalisation; this weighted form is commonly called Deming regression, of which orthogonal regression is the equal-variance special case.
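For two series the TLS slope has a closed form in the sample variances and covariance. The sketch below uses that closed form on synthetic data where both legs carry equal-variance observation noise; the helper name `tls_hedge_ratio` and the data are assumptions made for illustration.

```python
import numpy as np

def tls_hedge_ratio(x, y):
    """Orthogonal-regression (alpha, beta) for y on x, equal noise variances assumed."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    if sxy == 0.0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    d = syy - sxx
    beta = (d + np.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Both observed series are a latent random walk plus unit-variance noise.
rng = np.random.default_rng(5)
latent = 100.0 + np.cumsum(rng.normal(size=2000))
x = latent + rng.normal(size=2000)
y = 2.0 + 1.5 * latent + rng.normal(size=2000)

_, beta_yx = tls_hedge_ratio(x, y)
_, beta_xy = tls_hedge_ratio(y, x)
print(beta_yx, 1.0 / beta_xy)  # the two values agree up to rounding
```

Note that the result depends on the units of X and Y, so in practice the series are often rescaled before fitting.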

§2.4 Johansen eigenvector

The Johansen (1988, 1991) maximum-likelihood procedure estimates the VECM and returns the cointegrating vectors as the columns of $\hat{\beta}$ in the reduced-rank factorisation $\hat{\Pi} = \hat{A} \hat{\beta}^\top$ (with $\hat{A}$ the adjustment loadings), obtained as eigenvectors of a generalised eigenvalue problem on the residual moment matrices. For the $n = 2$ case, this is a $2 \times 2$ eigenvalue problem; the cointegrating vector is identified only up to scale and is normalised by convention. Symmetric, generalises to $n > 2$ and provides asymptotic test statistics for the number of cointegrating relationships. More sensitive to specification choices (lag length, treatment of deterministic terms) than the Engle-Granger approach.

§2.5 Box-Tiao canonical decomposition

Box and Tiao (1977) developed a procedure for decomposing a multivariate time series into components ranked by their predictability. Define a vector AR(1):

\mathbf{Y}_t = \Phi \mathbf{Y}_{t-1} + \boldsymbol{\varepsilon}_t, \quad \operatorname{Cov}(\boldsymbol{\varepsilon}_t) = \Sigma.

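The most mean-reverting linear combination $\mathbf{w}^\top \mathbf{Y}_t$ is the eigenvector $\mathbf{w}$ of the matrix

M = \Gamma^{-1} \Phi \Gamma \Phi^\top, \quad \Gamma = \operatorname{Cov}(\mathbf{Y}_t),

corresponding to its smallest eigenvalue. Each eigenvalue is the predictability ratio $\mathbf{w}^\top \Phi \Gamma \Phi^\top \mathbf{w} / \mathbf{w}^\top \Gamma \mathbf{w}$ of the associated combination, so the smallest eigenvalue picks out the least predictable and therefore most mean-reverting direction. Note that $\Gamma$ is the covariance of the series itself, not the innovation covariance $\Sigma$ . Asymmetric, more robust to noise than OLS, and generalises to $n > 2$ . Often used when mean-reversion speed is the property to optimise explicitly.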
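As a rough illustration, the decomposition can be sketched in a few lines of NumPy. The sketch assumes the predictability ratio is computed with the sample covariance of the (centred) series, fits the VAR(1) matrix by least squares, and normalises the weight on the last series to 1; the function name `box_tiao_weights` and the synthetic pair are hypothetical.

```python
import numpy as np

def box_tiao_weights(Y):
    """Return (w, eigenvalues): w is the least-predictable combination,
    scaled so the last weight is 1; eigenvalues are sorted ascending."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)
    Y0, Y1 = Yc[:-1], Yc[1:]
    # Least-squares VAR(1): rows satisfy Y1 ~ Y0 @ A, hence Phi = A.T.
    A, *_ = np.linalg.lstsq(Y0, Y1, rcond=None)
    Phi = A.T
    Gamma = np.cov(Yc, rowvar=False)
    # M = Gamma^{-1} Phi Gamma Phi^T, computed without an explicit inverse.
    M = np.linalg.solve(Gamma, Phi @ Gamma @ Phi.T)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)
    w = eigvecs[:, order[0]].real
    return w / w[-1], eigvals.real[order]

# Synthetic pair: y = 1.5 x + stationary noise, x a random walk.
rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=2000))
y = 1.5 * x + rng.normal(size=2000)
w, predictability = box_tiao_weights(np.column_stack([x, y]))
hedge_ratio = -w[0]   # spread is y - hedge_ratio * x
print(hedge_ratio, predictability)
```

With real price data the VAR(1) fit is only an approximation, so the recovered weights should be read as a candidate spread to be tested rather than as a final estimate.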

§2.6 Kalman filter

Elliott, van der Hoek and Malcolm (2005) propose representing the hedge ratio as a latent state variable in a Kalman filter (Kalman 1960):

\beta_t = \beta_{t-1} + \eta_t, \quad \eta_t \sim \mathcal{N}(0, Q)

Y_t = \beta_t X_t + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, R)

The Kalman recursion produces $\hat{\beta}_{t \mid t}$ , the minimum mean-squared-error estimate of $\beta_t$ given observations through time t:

\hat{\beta}_{t \mid t} = \hat{\beta}_{t-1 \mid t-1} + K_t (Y_t - \hat{\beta}_{t-1 \mid t-1} X_t)

where $K_t = P_{t \mid t-1} X_t / (X_t^2 P_{t \mid t-1} + R)$ is the Kalman gain and $P_{t \mid t-1} = P_{t-1 \mid t-1} + Q$ is the predicted state variance. The hyperparameter pair $(Q, R)$ jointly governs the responsiveness/smoothness tradeoff: a large Q relative to R produces a fast-adapting estimate; a small Q produces a smooth, slowly evolving estimate. Some implementations use EM (Dempster, Laird, Rubin 1977) to estimate $(Q, R)$ from data.

Continuous online estimation; no batch refit required; naturally handles time-varying $\beta$ . Computational cost is constant per observation. Principal limitation: opacity of the latent-state representation.
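The scalar recursion above is short enough to write out directly. The following is a minimal sketch for the no-intercept model, with Q, R, the initial state and the synthetic drifting hedge ratio all chosen for illustration; the function name `kalman_hedge_ratio` is an assumption.

```python
import numpy as np

def kalman_hedge_ratio(x, y, q=1e-5, r=1.0, beta0=0.0, p0=1.0):
    """Filtered estimates beta_{t|t} for y_t = beta_t * x_t + noise,
    with beta_t following a random walk of variance q."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    betas = np.empty(len(x))
    beta, p = beta0, p0
    for t in range(len(x)):
        p_pred = p + q                        # predicted state variance
        innovation = y[t] - beta * x[t]       # one-step forecast error
        s = x[t] * x[t] * p_pred + r          # innovation variance
        k = p_pred * x[t] / s                 # Kalman gain
        beta = beta + k * innovation          # state update
        p = (1.0 - k * x[t]) * p_pred         # posterior state variance
        betas[t] = beta
    return betas

# Synthetic pair whose true hedge ratio drifts from 1.0 to 2.0.
rng = np.random.default_rng(1)
n = 2000
x = 100.0 + np.cumsum(rng.normal(scale=0.2, size=n))
beta_true = np.linspace(1.0, 2.0, n)
y = beta_true * x + rng.normal(scale=1.0, size=n)

betas = kalman_hedge_ratio(x, y)
print(betas[-1])  # close to the final true value of 2.0
```

Each update uses a fixed number of arithmetic operations, which is the constant per-observation cost noted above.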

§2.7 Half-life minimisation

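The half-life of an AR(1) spread is $\tau_{1/2} = -\ln 2 / \ln(1 + \phi)$ , where $\phi \in (-1, 0)$ is the coefficient in the regression $\Delta Z_t = c + \phi Z_{t-1} + \varepsilon_t$ (equivalently, $1 + \phi$ is the AR(1) coefficient). Half-life minimisation chooses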

\hat{\beta}_{\tau} = \arg\min_\beta \tau_{1/2}(Y - \beta X)

via numerical optimisation. Directly targets a trading-relevant property. Can be unstable when the AR(1) fit is poorly identified (e.g., when the spread is near a unit root). Computational cost is higher than OLS.
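A simple way to see the estimator at work is a grid search over candidate hedge ratios. The sketch below assumes the half-life is computed from a regression of the spread change on the lagged spread, and treats any fit outside the mean-reverting range as an infinite half-life; the function names, grid and synthetic data are illustrative choices.

```python
import numpy as np

def half_life(spread):
    """Half-life in bars from the regression dZ_t = c + phi * Z_{t-1} + e_t."""
    z = np.asarray(spread, dtype=float)
    lagged, delta = z[:-1], np.diff(z)
    design = np.column_stack([np.ones(len(lagged)), lagged])
    (_, phi), *_ = np.linalg.lstsq(design, delta, rcond=None)
    if not -1.0 < phi < 0.0:
        return np.inf            # no usable mean reversion in this fit
    return -np.log(2.0) / np.log1p(phi)

def min_half_life_beta(x, y, grid):
    """Return (beta, half_life) minimising the half-life over the grid."""
    lives = [half_life(y - b * x) for b in grid]
    i = int(np.argmin(lives))
    return grid[i], lives[i]

# Synthetic pair: AR(1) spread with coefficient 0.9 (half-life about 6.6 bars).
rng = np.random.default_rng(2)
n = 5000
x = 100.0 + np.cumsum(rng.normal(size=n))
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.9 * z[t - 1] + rng.normal()
y = 1.5 * x + z

best_beta, best_life = min_half_life_beta(x, y, np.linspace(1.0, 2.0, 101))
print(best_beta, best_life)
```

A grid is used here only for transparency; a scalar optimiser would serve equally well once the objective is known to be well behaved on the search interval.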

§2.8 ADF-statistic optimisation

Choose $\beta$ to minimise the ADF test statistic of the resulting spread, that is, to make it as negative as possible:

\hat{\beta}_{\mathrm{ADF}} = \arg\min_\beta t_{\mathrm{ADF}}(Y - \beta X)

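Directly targets statistical stationarity. Subject to the same identification concerns as half-life minimisation. Because $\beta$ is chosen to optimise the statistic, the resulting estimate has a non-standard sampling distribution and the usual ADF critical values no longer apply.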
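The idea can be sketched with a hand-rolled Dickey-Fuller statistic. As a simplification, the code below uses the zero-lag version (no augmentation terms) with an intercept, whereas a production implementation would normally use a library ADF routine with lag selection; the function names, grid and data are assumptions.

```python
import numpy as np

def df_tstat(spread):
    """Zero-lag Dickey-Fuller t-statistic for dZ_t = c + phi * Z_{t-1} + e_t."""
    z = np.asarray(spread, dtype=float)
    lagged, delta = z[:-1], np.diff(z)
    design = np.column_stack([np.ones(len(lagged)), lagged])
    coef, *_ = np.linalg.lstsq(design, delta, rcond=None)
    resid = delta - design @ coef
    sigma2 = resid @ resid / (len(delta) - 2)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[1] / np.sqrt(cov[1, 1])

def min_df_beta(x, y, grid):
    """Return (beta, t_stat) with the most negative statistic on the grid."""
    stats = [df_tstat(y - b * x) for b in grid]
    i = int(np.argmin(stats))
    return grid[i], stats[i]

# Synthetic pair: random-walk x, stationary residual around 1.5 * x.
rng = np.random.default_rng(6)
x = 100.0 + np.cumsum(rng.normal(size=2000))
y = 1.5 * x + rng.normal(size=2000)

adf_beta, adf_stat = min_df_beta(x, y, np.linspace(1.0, 2.0, 101))
print(adf_beta, adf_stat)
```

The minimised statistic is biased toward rejection by construction, so it should not be compared against tabulated critical values.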

§2.9 Comparative summary

Estimator	Symmetric?	Online?	n > 2?	Optimised for
OLS	No	Approx.	Pairwise	Squared Y-residuals
TLS	Yes	Approx.	Pairwise	Perpendicular residuals
Johansen	Yes	No (batch)	Yes	Cointegration likelihood
Box-Tiao	No	No (batch)	Yes	Predictability of combination
Kalman	No	Yes	Pairwise	Posterior under SSM
Half-life	No	No (batch)	Pairwise	Mean-reversion speed
ADF-stat	No	No (batch)	Pairwise	Stationarity test statistic

Online estimation in the OLS, TLS, half-life, ADF and Box-Tiao cases is achieved by periodic refitting on a rolling window; this introduces the practical question of refit cadence vs. responsiveness, with a continuity cost discussed in §3.5.

§3

Trading rule construction

§3.1 Z-score thresholds (heuristic)

The standard pairs-trading entry rule uses the rolling z-score:

z_t = \frac{Z_t - \hat{\mu}_W}{\hat{\sigma}_W}

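with rolling window W, where $\hat{\mu}_W$ and $\hat{\sigma}_W$ are the mean and standard deviation of the spread over the most recent W bars. Enter long spread when $z_t \leq -\tau_{\text{entry}}$ , short spread when $z_t \geq +\tau_{\text{entry}}$ , and exit when $z_t$ reverts through $\pm \tau_{\text{exit}}$ (upward through $-\tau_{\text{exit}}$ for a long, downward through $+\tau_{\text{exit}}$ for a short). Standard heuristic values: $\tau_{\text{entry}} = 2$ , $\tau_{\text{exit}} = 0$ or 0.5. These are conventions inherited from Gatev, Goetzmann and Rouwenhorst (2006) and have no theoretical basis: the optimal level depends on the process parameters and on the transaction cost.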
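A minimal sketch of the rule as a small state machine is shown below. It assumes the exit is triggered when the z-score reverts to the exit band on the side of the open position; the function names and the hand-made z-score sequence are illustrative.

```python
import numpy as np

def rolling_zscore(spread, window):
    """Z-score of each bar against the trailing window ending at that bar."""
    spread = np.asarray(spread, dtype=float)
    z = np.full(spread.shape, np.nan)
    for t in range(window - 1, len(spread)):
        w = spread[t - window + 1 : t + 1]
        sd = w.std(ddof=1)
        if sd > 0:
            z[t] = (spread[t] - w.mean()) / sd
    return z

def threshold_positions(z, entry=2.0, exit_=0.5):
    """+1 = long spread, -1 = short spread, 0 = flat, decided bar by bar."""
    pos = np.zeros(len(z), dtype=int)
    current = 0
    for t, zt in enumerate(z):
        if not np.isnan(zt):
            if current == 0:
                if zt <= -entry:
                    current = 1
                elif zt >= entry:
                    current = -1
            elif current == 1 and zt >= -exit_:
                current = 0
            elif current == -1 and zt <= exit_:
                current = 0
        pos[t] = current
    return pos

z_demo = np.array([0.0, -2.5, -1.0, -0.4, 0.0, 2.1, 1.0, 0.5, 0.0])
print(threshold_positions(z_demo))   # [ 0  1  1  0  0 -1 -1  0  0]
```

The position at bar t is a decision made with data through t, so a backtest would apply it to the spread change from t to t+1.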

§3.2 Bertram optimal stopping (closed form under OU)

Bertram (2010) derives analytically the entry and exit z-score thresholds that maximise the long-run expected profit per unit time under the OU model $dZ_t = \theta(\mu - Z_t)\, dt + \sigma\, dW_t$ with transaction cost c per round trip.

Denote the dimensionless entry and exit thresholds as a and b with $a > b$ . The expected return per unit time is

E[r] = \frac{(a - b - c)}{\mathbb{E}[T(a, b)]}

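where the expected duration of a full trade cycle (from entry at a to exit at b and back to a) has a closed form in the imaginary error function; the one-way passage times that compose it involve the Mills ratio. With a and b measured in units of the stationary standard deviation $\sigma / \sqrt{2\theta}$ :

\mathbb{E}[T(a, b)] = \frac{\pi}{\theta} \left[ \mathrm{erfi}\left( \frac{a}{\sqrt{2}} \right) - \mathrm{erfi}\left( \frac{b}{\sqrt{2}} \right) \right]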

The first-order condition $\partial E[r] / \partial a = \partial E[r] / \partial b = 0$ yields the optimal entry and exit thresholds as a transcendental system solvable numerically. Bertram (2010, eq. 17 to 18) gives the explicit form. Zeng and Lee (2014) extend the framework to handle asymmetric transaction costs.
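The optimisation can be illustrated numerically. The sketch below makes several assumptions that should be read as such: thresholds and cost are expressed in units of the stationary standard deviation, the cycle time is taken as $(\pi/\theta)[\mathrm{erfi}(a/\sqrt{2}) - \mathrm{erfi}(b/\sqrt{2})]$ , the search is restricted to the symmetric case $b = -a$ , and a plain grid search stands in for solving the transcendental system. The function names are hypothetical.

```python
import math

def erfi(x, terms=80):
    """Imaginary error function via its Maclaurin series."""
    total, term = 0.0, x          # term holds x^(2n+1) / n!
    for n in range(terms):
        total += term / (2 * n + 1)
        term *= x * x / (n + 1)
    return 2.0 / math.sqrt(math.pi) * total

def cycle_time(a, b, theta):
    """Expected time for entry at a, exit at b, and return to a (a > b)."""
    root2 = math.sqrt(2.0)
    return math.pi / theta * (erfi(a / root2) - erfi(b / root2))

def profit_rate(a, b, c, theta):
    """Expected profit per unit time, net of round-trip cost c."""
    return (a - b - c) / cycle_time(a, b, theta)

def optimal_symmetric_entry(c, theta=1.0, a_max=3.0, step=1e-3):
    """Grid-search the entry level a that maximises the rate with b = -a."""
    best_a, best_rate = None, -math.inf
    a = c / 2.0 + step
    while a <= a_max:
        rate = profit_rate(a, -a, c, theta)
        if rate > best_rate:
            best_a, best_rate = a, rate
        a += step
    return best_a, best_rate

a_low_cost, rate_low_cost = optimal_symmetric_entry(c=0.1)
a_high_cost, rate_high_cost = optimal_symmetric_entry(c=0.5)
print(a_low_cost, a_high_cost)   # the optimal entry widens as cost rises
```

Under these assumptions the optimal entry level moves outward as the cost grows, which is the qualitative behaviour the fixed $\tau_{\text{entry}} = 2$ heuristic cannot capture.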

§3.3 Stochastic optimal control (continuous positioning)

Threshold strategies are discrete. The stochastic-control approach permits continuous positioning $\pi_t \in [-1, 1]$ as a function of the spread's distance from equilibrium.

Mudchanatongsuk, Primbs and Wong (2008) formulate the problem under CRRA utility. The Hamilton-Jacobi-Bellman equation yields the optimal $\pi_t^*$ as a smooth function of $(Z_t, t)$ . Closed-form solutions exist under specific utility functions and finite horizons. Jurek (2007) provides an analogous treatment that includes intertemporal consumption decisions.

§3.4 Half-life timeouts and risk-management overlays

Independent of the entry rule, two overlays are common:

Half-life-based timeout. Force-exit positions held longer than $k \cdot \tau_{1/2}$ bars with $k \in [2, 4]$ . Avoids capital trapped in positions where the assumed mean-reversion has not materialised.

Stop-loss and take-profit. Bar-level evaluation of fixed-percent SL and TP thresholds, evaluated on the position PnL independent of the spread z-score. Provides tail-risk protection at the cost of cutting trades that would have eventually reverted.

§3.5 The hedge-ratio-update continuity problem

Suppose $\hat{\beta}$ is updated at time $t_k$ from $\hat{\beta}_{k-1}$ to $\hat{\beta}_k$ . The empirical spread then exhibits a discontinuity at $t_k$ of magnitude $(\hat{\beta}_{k-1} - \hat{\beta}_k) X_{t_k}$ (plus any change in $\hat{\alpha}$ ). A rolling z-score window straddling $t_k$ mixes the pre- and post-update spread distributions and can produce spurious entry signals during a transition period of up to W bars.

Four conventional responses:

Continuous online re-estimation (e.g., Kalman or every-bar OLS). Eliminates discrete jumps but introduces continuous estimator jitter from noise in recent observations.
Static estimation (estimate once, hold). Eliminates all jumps but suffers from estimator staleness as the underlying relationship evolves.
Periodic re-estimation with no further adjustment. Accept the discontinuity. Z-score window length W should be much shorter than the refit interval to limit the contamination period.
Periodic re-estimation with spread adjustment. When $\hat{\beta}$ updates, adjust the spread series so that the displayed and statistically-evaluated $\hat{Z}_t$ is continuous across the update. The adjustment is by a constant offset chosen to match the value of the new spread to the value of the old spread at the update time.
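The fourth response can be sketched as a running offset that is recomputed at each refit. In the example below the function name, the argument layout (one hedge ratio per refit segment, first refit at bar 0) and the toy data are assumptions made for illustration.

```python
import numpy as np

def adjusted_spread(x, y, betas, refit_times):
    """Spread y - beta_k * x with a constant offset per segment so the
    series is continuous at every refit. refit_times[0] must be 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    spread = np.empty(len(x))
    offset = 0.0
    for k, start in enumerate(refit_times):
        end = refit_times[k + 1] if k + 1 < len(refit_times) else len(x)
        raw = y[start:end] - betas[k] * x[start:end]
        if k > 0:
            # Value the old (already adjusted) spread would take at the
            # update bar; the new offset makes the new spread match it.
            old_at_update = y[start] - betas[k - 1] * x[start] + offset
            offset = old_at_update - raw[0]
        spread[start:end] = raw + offset
    return spread

x_demo = [10.0, 10.0, 10.0, 10.0]
y_demo = [20.0, 21.0, 22.0, 23.0]
print(adjusted_spread(x_demo, y_demo, betas=[2.0, 1.5], refit_times=[0, 2]))
# Without the offset the series would jump from 1.0 to 7.0 at bar 2.
```

The offset restores continuity in level only; the volatility of the spread still changes with the hedge ratio, so the z-score window is not fully homogeneous after a refit.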

Each response is a tradeoff among continuity, estimator currency and implementation complexity. The choice depends on the use case (visualisation, signal generation, risk reporting) and on the cadence of the trading horizon relative to the hedge-ratio drift timescale.

§4

Backtesting and validation

§4.1 Look-ahead bias

Any quantity used to generate a trading signal at time t must be a function only of data observable by time t. The cointegrating coefficient $\hat{\beta}$ estimated using data through time T may not be used to evaluate trades at $t < T$ . One-period lags between observation and execution are typical to reflect implementation realism.
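One common way to enforce the one-period lag is to apply the position decided at bar t-1 to the spread change over bar t. The helper below is a minimal sketch of that convention; its name and the toy inputs are illustrative.

```python
import numpy as np

def lagged_pnl(positions, spread):
    """Per-bar PnL where the position decided at bar t-1 earns the
    spread change from t-1 to t. The first bar earns nothing."""
    positions = np.asarray(positions, dtype=float)
    spread = np.asarray(spread, dtype=float)
    pnl = np.zeros(len(spread))
    pnl[1:] = positions[:-1] * np.diff(spread)
    return pnl

# Position signals computed at each bar's close, and the spread path.
print(lagged_pnl([0, 1, 1, 0], [0.0, 1.0, 3.0, 2.0]))   # [ 0.  0.  2. -1.]
```

Multiplying the same-bar position by the same-bar change instead would credit the strategy with a move it could only have known about after the fact.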

§4.2 Walk-forward partitioning

A strategy with tunable parameters must use distinct training and test data partitions. The standard methodology (Bailey, Borwein, Lopez de Prado and Zhu 2014) is walk-forward partitioning. Define a training window of length $T_{\text{train}}$ and a test window of length $T_{\text{test}}$ with $T_{\text{train}} \gg T_{\text{test}}$ . Fit on training, evaluate on the immediately following test, slide, repeat. The aggregated test-window PnL is the walk-forward out-of-sample backtest.
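The fit, evaluate, slide loop reduces to an index generator. The following sketch assumes non-overlapping test windows by default (step equal to the test length); the function name and parameters are illustrative.

```python
def walk_forward_splits(n_obs, train_len, test_len, step=None):
    """Yield (train_range, test_range) index pairs; each test window
    immediately follows its training window."""
    step = test_len if step is None else step
    start = 0
    while start + train_len + test_len <= n_obs:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += step

splits = list(walk_forward_splits(n_obs=10, train_len=4, test_len=2))
for train, test in splits:
    print(list(train), list(test))
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
```

Parameters are re-fitted on each training range and frozen for the matching test range; concatenating the test-range PnL gives the out-of-sample curve.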

§4.3 Resampling

Block bootstrap. Naive iid bootstrap is inappropriate for serially correlated financial data. Block bootstrap (Künsch 1989) resamples contiguous blocks of length $\ell$ to preserve local autocorrelation structure. Politis and Romano (1994) introduced the stationary bootstrap, which uses random block lengths drawn from a geometric distribution.
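A minimal sketch of the stationary bootstrap index generator is given below. It assumes circular wrapping at the end of the sample and a user-chosen mean block length; the function name and the example parameters are illustrative.

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Resampling indices with geometric block lengths of the given mean.
    Blocks wrap around the end of the sample."""
    p = 1.0 / mean_block               # probability of starting a new block
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:
            idx[t] = rng.integers(n)           # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % n      # continue the current block
    return idx

rng = np.random.default_rng(3)
idx = stationary_bootstrap_indices(n=5000, mean_block=20, rng=rng)

# Example use: one resampled path of a return series.
returns = rng.normal(loc=0.001, scale=0.01, size=5000)
resampled = returns[idx]
print(resampled.mean())
```

Repeating the draw many times and recomputing a statistic (for example the Sharpe ratio) on each resampled path gives its bootstrap distribution.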

Monte Carlo simulation. Simulate alternative paths under an explicit data-generating process, for example a GARCH model (Bollerslev 1986) for volatility combined with a calibrated OU model for the spread.

§4.4 Probabilistic, deflated and overfitting Sharpe diagnostics

Probabilistic Sharpe Ratio (PSR). Bailey and Lopez de Prado (2012) define PSR as the probability that the true Sharpe exceeds threshold $\mathrm{SR}^*$ , conditional on the observed Sharpe and the higher moments (skewness $\hat{\gamma}_3$ , kurtosis $\hat{\gamma}_4$ ) of the return distribution:

\mathrm{PSR}(\mathrm{SR}^*) = \Phi\left( \frac{(\widehat{\mathrm{SR}} - \mathrm{SR}^*) \sqrt{n - 1}}{\sqrt{1 - \hat{\gamma}_3 \widehat{\mathrm{SR}} + \frac{\hat{\gamma}_4 - 1}{4} \widehat{\mathrm{SR}}^2}} \right)

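where n is the number of returns, $\hat{\gamma}_4$ is the raw (non-excess) kurtosis, equal to 3 for Gaussian returns, and $\Phi$ is the standard normal CDF. $\widehat{\mathrm{SR}}$ and $\mathrm{SR}^*$ are expressed at the frequency of the returns, not annualised.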
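The formula translates directly into code. The sketch below assumes per-period (non-annualised) Sharpe ratios and raw kurtosis (3 for a normal distribution); the function names and the synthetic return series are illustrative.

```python
import numpy as np
from statistics import NormalDist

def sharpe_ratio(returns):
    """Per-period Sharpe ratio with zero risk-free rate."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / r.std(ddof=1)

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """P(true Sharpe > sr_benchmark) given the sample moments."""
    r = np.asarray(returns, dtype=float)
    n = len(r)
    sr = sharpe_ratio(r)
    centred = r - r.mean()
    sd = r.std(ddof=0)
    skew = np.mean(centred ** 3) / sd ** 3
    kurt = np.mean(centred ** 4) / sd ** 4      # raw kurtosis
    denom = np.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr_benchmark) * np.sqrt(n - 1) / denom
    return NormalDist().cdf(float(z))

rng = np.random.default_rng(7)
daily = rng.normal(loc=0.002, scale=0.01, size=1000)
psr_zero = probabilistic_sharpe_ratio(daily, sr_benchmark=0.0)
psr_self = probabilistic_sharpe_ratio(daily, sr_benchmark=sharpe_ratio(daily))
print(psr_zero, psr_self)   # psr_self is 0.5 by construction
```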

Deflated Sharpe Ratio (DSR). Bailey and Lopez de Prado (2014) extend the PSR to account for selection bias when N strategies are tested:

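\mathrm{SR}^*_{\mathrm{DSR}} = \sqrt{V[\mathrm{SR}]} \cdot \left( (1 - \gamma) \Phi^{-1}\left(1 - \frac{1}{N}\right) + \gamma \Phi^{-1}\left(1 - \frac{1}{N e}\right) \right)

where $\gamma$ is the Euler-Mascheroni constant and $V[\mathrm{SR}]$ is the variance of the Sharpe estimates across the N trials. The DSR is the PSR evaluated at this threshold, $\mathrm{DSR} = \mathrm{PSR}(\mathrm{SR}^*_{\mathrm{DSR}})$ . DSR below 0.5 indicates the reported Sharpe is consistent with selection-bias noise.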
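The threshold is simple to compute with the standard library. The sketch below evaluates only the deflation threshold; the function name and the example variance and trial counts are assumptions, and the DSR itself would be obtained by passing the result as the benchmark of a PSR calculation.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329

def expected_max_sharpe(sr_variance, n_trials):
    """Approximate expected maximum Sharpe across n_trials unskilled
    strategies whose Sharpe estimates have variance sr_variance."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    nd = NormalDist()
    q1 = nd.inv_cdf(1.0 - 1.0 / n_trials)
    q2 = nd.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * q1 + EULER_GAMMA * q2)

threshold_10 = expected_max_sharpe(sr_variance=0.01, n_trials=10)
threshold_100 = expected_max_sharpe(sr_variance=0.01, n_trials=100)
print(threshold_10, threshold_100)   # the hurdle rises with the trial count
```

The number of trials should count every configuration that was evaluated, not only those reported, otherwise the hurdle is understated.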

Probability of Backtest Overfitting (PBO). Bailey, Borwein, Lopez de Prado and Zhu (2017) define PBO via combinatorially symmetric cross-validation as the probability that the configuration ranked best in sample falls below the median out of sample. PBO $\geq 0.5$ indicates the backtest selection process has no out-of-sample skill.

§4.5 Transaction cost realism

Pairs trading is high-turnover. A defensible backtest models bid-ask spread (fills occur at the side of the book, not the midpoint), slippage (Almgren et al. 2005 for square-root or linear impact), exchange fees at the relevant tier, funding payments for perpetual futures at the venue cadence, borrow costs for the short leg in equity markets and latency between signal generation and exchange acknowledgement.

§4.6 Survivorship and selection biases

A pair universe constructed from currently-listed assets implicitly excludes assets that were delisted, merged or otherwise removed from the historical record. The resulting backtest overstates pair stability. Construct point-in-time universes from complete historical listing metadata.

The pairs evaluated must themselves be selected using only information available at the formation time. A backtest that uses 2024 cointegration tests to select pairs and then trades those pairs over 2018 to 2023 is selecting on the future.

§4.7 Multiple testing

A typical pair-selection process screens many candidate pairs against a cointegration test. Applying the standard ADF p-value threshold without a multiple-comparisons adjustment makes it near-certain that a meaningful fraction of selected pairs are false positives: screening m non-cointegrated pairs at the 0.05 level yields about 0.05m spurious passes in expectation. Bonferroni and Holm-Bonferroni provide conservative corrections. The false-discovery-rate procedure of Benjamini and Hochberg (1995) is less conservative. For very large universes, the correction may render the surviving pair set empty.
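The difference between the corrections is easy to see on a small example. The following sketch implements the Bonferroni and Benjamini-Hochberg rules on a list of p-values; the function names and the p-values themselves are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Step-up FDR procedure: find the largest rank k with
    p_(k) <= k * q / m and reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical cointegration-test p-values for ten candidate pairs.
p_demo = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(sum(p <= 0.05 for p in p_demo))        # 5 pass the unadjusted screen
print(sum(bonferroni(p_demo)))               # 1 survives Bonferroni
print(sum(benjamini_hochberg(p_demo)))       # 2 survive Benjamini-Hochberg
```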

§4.8 Reporting standards

A defensible pairs-trading backtest report should include:

Universe description with point-in-time membership criteria.
Method description sufficient to reproduce: pair selection, hedge-ratio estimator, trading rule, position sizing, exit conditions, transaction-cost model.
Walk-forward partitioning specification: window lengths, step size, parameter optimisation procedure per window.
Aggregate performance: walk-forward out-of-sample PnL, annualised Sharpe (and PSR / DSR), maximum drawdown, win rate per trade, turnover, transaction cost as fraction of gross PnL.
Robustness diagnostics: block-bootstrap or Monte Carlo PnL distribution; PBO if a parameter grid was searched; sensitivity to principal parameter choices.
Cost decomposition: PnL with and without each transaction-cost component.

Reports that omit walk-forward partitioning, that show a single equity curve without a resampled distribution, or that do not disclose the transaction-cost model are not interpretable as evidence of deployable performance.

REFERENCES

Cited works

Almgren, R., Thum, C., Hauptmann, E., Li, H. (2005). Equity market impact. Risk 18(7), 57 to 62.
Bailey, D.H., Borwein, J.M., Lopez de Prado, M., Zhu, Q.J. (2014). Pseudo-mathematics and financial charlatanism. Notices of the American Mathematical Society 61(5), 458 to 471.
Bailey, D.H., Borwein, J.M., Lopez de Prado, M., Zhu, Q.J. (2017). The probability of backtest overfitting. Journal of Computational Finance 20(4), 39 to 69.
Bailey, D.H., Lopez de Prado, M. (2012). The Sharpe ratio efficient frontier. Journal of Risk 15(2), 3 to 44.
Bailey, D.H., Lopez de Prado, M. (2014). The deflated Sharpe ratio. Journal of Portfolio Management 40(5), 94 to 107.
Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate. Journal of the Royal Statistical Society Series B 57(1), 289 to 300.
Bertram, W.K. (2010). Analytic solutions for optimal statistical arbitrage trading. Physica A 389(11), 2234 to 2243.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31(3), 307 to 327.
Box, G.E.P., Tiao, G.C. (1977). A canonical analysis of multiple time series. Biometrika 64(2), 355 to 365.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39(1), 1 to 38.
Elliott, R.J., van der Hoek, J., Malcolm, W.P. (2005). Pairs trading. Quantitative Finance 5(3), 271 to 276.
Engle, R.F., Granger, C.W.J. (1987). Co-integration and error correction. Econometrica 55(2), 251 to 276.
Gatev, E., Goetzmann, W.N., Rouwenhorst, K.G. (2006). Pairs trading. Review of Financial Studies 19(3), 797 to 827.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12(2-3), 231 to 254.
Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors. Econometrica 59(6), 1551 to 1580.
Jurek, J.W. (2007). Optimal long-run reversal trading. Working paper, Princeton University.
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82(1), 35 to 45.
Künsch, H.R. (1989). The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17(3), 1217 to 1241.
Lopez de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
Mudchanatongsuk, S., Primbs, J.A., Wong, W. (2008). Optimal pairs trading. A stochastic control approach. Proceedings of the 2008 American Control Conference, 1035 to 1039.
Phillips, P.C.B., Hansen, B.E. (1990). Statistical inference in instrumental variables regression with I(1) processes. Review of Economic Studies 57(1), 99 to 125.
Politis, D.N., Romano, J.P. (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303 to 1313.
Zeng, Z., Lee, C.G. (2014). Pairs trading. Optimal thresholds and profitability. Quantitative Finance 14(11), 1881 to 1893.

Citation. Bonton AI, Hedgicore Research (2026). Pairs Trading Implementation. Hedge Ratio Estimation, Trading Rules and Backtesting. v1.0. hedgicore.com/research
