RESEARCH PAPER · V1.0
Statistical Arbitrage. A Technical Guide.
Stationarity, cointegration, VECM, OU processes, the five method families and crypto-specific substrate considerations.
ABSTRACT
Statistical arbitrage refers to a class of quantitative trading strategies that exploit relative mispricings between financial instruments by means of statistical models of their joint dynamics. Unlike classical arbitrage, which exploits riskless, deterministic price discrepancies, statistical arbitrage involves market risk and is conditional on the continued validity of a statistical relationship that may break down out of sample.
This document is a technical guide to the statistical and methodological foundations of statistical arbitrage. §2 develops the mathematical foundation. §3 surveys the five academic method families. §4 discusses estimation of the cointegrating coefficient. §5 discusses backtesting and validation. §6 covers multi-asset extensions. §7 discusses considerations specific to cryptocurrency perpetual futures.
§1
Introduction
The strategy was developed at Morgan Stanley in the mid-to-late 1980s by a quantitative group led by Nunzio Tartaglia, whose alumni went on to found or join PDT Partners, D.E. Shaw and Two Sigma. The earliest published academic treatment is Gatev, Goetzmann and Rouwenhorst (2006), who document a positive-Sharpe distance-based strategy on US equities over 1962 to 2002. The mathematical foundation, cointegration, was formalised in Granger (1981) and Engle and Granger (1987), the latter contributing the two-step cointegration estimation procedure that underlies most modern implementations. Pole (2007), Vidyamurthy (2004) and Krauss (2017) are the canonical practitioner and review references.
A companion document covers practical implementation details, trading rules, position sizing and transaction-cost modelling in greater depth: Pairs Trading Implementation.
§2
Mathematical foundations
§2.1 Stationarity
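A stochastic process $X_t$ is weak-sense stationary (or covariance-stationary) if its first two moments are time-invariant:
$$E[X_t] = \mu, \qquad \mathrm{Var}(X_t) = \sigma^2 < \infty, \qquad \mathrm{Cov}(X_t, X_{t+h}) = \gamma(h) \quad \text{for all } t, h.$$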
For statistical arbitrage purposes, weak-sense stationarity is sufficient: the process has a well-defined long-run mean to which it tends to return. Strict stationarity, invariance of the full joint distribution under time shifts, is rarely required and rarely tested.
Tests for stationarity in finite samples include the Augmented Dickey-Fuller test (Dickey and Fuller 1979; Said and Dickey 1984), the Phillips-Perron test (Phillips and Perron 1988) and the KPSS test (Kwiatkowski, Phillips, Schmidt and Shin 1992). The first two test the null of a unit root (non-stationarity); the third tests the null of stationarity. Pairs-trading implementations typically employ ADF as the primary qualifying test.
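The unit-root null can be illustrated with a minimal sketch of the plain Dickey-Fuller regression (no augmentation lags), using only the Python standard library. The simulated series, seed and function names are assumptions made for illustration. A production implementation would use a full ADF routine with lag selection and proper critical-value tables.

```python
import math
import random


def ols_slope_tstat(x, y):
    """Fit y = a + b*x by OLS; return (b, t-statistic of b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se_b


def dickey_fuller_tstat(series):
    """t-statistic of rho in: diff(x)_t = a + rho * x_{t-1} + e_t."""
    lagged = series[:-1]
    diffs = [series[i + 1] - series[i] for i in range(len(series) - 1)]
    return ols_slope_tstat(lagged, diffs)[1]


# Illustrative data: the same shocks drive a unit-root and a stationary series.
rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(1000)]

random_walk, ar1 = [0.0], [0.0]
for e in shocks:
    random_walk.append(random_walk[-1] + e)   # unit root
    ar1.append(0.5 * ar1[-1] + e)             # stationary, phi = 0.5

t_rw = dickey_fuller_tstat(random_walk)
t_ar1 = dickey_fuller_tstat(ar1)
print(f"random walk t = {t_rw:.2f}, AR(1) t = {t_ar1:.2f}")
```

The statistic is compared against Dickey-Fuller critical values rather than Student-t values. For the regression with a constant, the asymptotic 5% critical value is approximately -2.86, and a statistic below it rejects the unit-root null.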
§2.2 Order of integration
A process is integrated of order d, written $I(d)$, if its d-th difference is stationary and no lower-order difference is. Most financial price series are well-modelled as $I(1)$: the levels are non-stationary (random-walk like) but the first differences (returns) are approximately stationary.
The implication for trading: if $X_t$ is $I(1)$, then $X_t$ has no time-invariant mean to revert to. Strategies that bet on $X_t$ returning to a normal level without further qualification are statistically misspecified.
§2.3 Cointegration
Cointegration was introduced by Granger (1981) and formalised by Engle and Granger (1987). Two $I(1)$ series $X_t$ and $Y_t$ are cointegrated if there exists $\beta$ such that the linear combination
$$S_t = Y_t - \beta X_t$$
is stationary ($S_t \sim I(0)$). The coefficient $\beta$ is the cointegrating coefficient (in trading terminology, the hedge ratio). The combination $S_t$ is the cointegrating residual (in trading terminology, the spread).
The economic interpretation: although neither price has a stable mean, a specific linear combination does. The spread fluctuates around its long-run mean and exhibits the mean-reversion that pairs trading exploits.
Granger was awarded the 2003 Nobel Memorial Prize in Economic Sciences for methods of analysing economic time series with common trends (Royal Swedish Academy of Sciences 2003).
§2.4 The Granger Representation Theorem
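The Granger Representation Theorem (Engle and Granger 1987, Theorem 1) establishes that if $X_t$ and $Y_t$ are cointegrated, the system admits a Vector Error Correction Model (VECM) representation, shown here with lagged-difference terms omitted:
$$\Delta X_t = \alpha_X (Y_{t-1} - \beta X_{t-1}) + \varepsilon_{X,t}$$
$$\Delta Y_t = \alpha_Y (Y_{t-1} - \beta X_{t-1}) + \varepsilon_{Y,t}$$
The term $Y_{t-1} - \beta X_{t-1}$ is the error-correction term, the previous period's deviation from equilibrium. The coefficients $\alpha_X, \alpha_Y$ are the adjustment speeds governing how each series responds to disequilibrium. At least one of $\alpha_X, \alpha_Y$ is nonzero by the cointegration assumption.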
For pairs trading, the practical significance is that the VECM provides predictability. Deviations from equilibrium contain information about future short-term returns of both legs. The adjustment-speed coefficients determine which leg is expected to do most of the correcting.
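To make the role of the adjustment speeds concrete, the following sketch simulates a pair in which only one leg corrects and then recovers both adjustment-speed coefficients by regressing each leg's change on the lagged deviation from equilibrium. The cointegrating coefficient is assumed known, lagged differences are ignored, and all parameter values and names are illustrative assumptions.

```python
import random


def slope_through_origin(x, y):
    """OLS slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


rng = random.Random(1)
beta = 1.0            # assumed known cointegrating coefficient
alpha_y_true = -0.2   # only Y corrects in this simulation
x, y = [0.0], [0.0]
for _ in range(5000):
    spread_prev = y[-1] - beta * x[-1]
    x_next = x[-1] + rng.gauss(0.0, 1.0)   # X: pure random walk
    y_next = y[-1] + alpha_y_true * spread_prev + rng.gauss(0.0, 1.0)
    x.append(x_next)
    y.append(y_next)

# Lagged error-correction term and one-step changes of each leg.
spread_lag = [y[t] - beta * x[t] for t in range(len(x) - 1)]
dx = [x[t + 1] - x[t] for t in range(len(x) - 1)]
dy = [y[t + 1] - y[t] for t in range(len(y) - 1)]

alpha_x_hat = slope_through_origin(spread_lag, dx)
alpha_y_hat = slope_through_origin(spread_lag, dy)
print(f"alpha_X = {alpha_x_hat:.3f}, alpha_Y = {alpha_y_hat:.3f}")
```

In this setup the estimate for the Y leg should sit near -0.2 and the estimate for the X leg near zero, indicating that Y is the leg expected to do the correcting.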
§2.5 Multivariate cointegration
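For an n-dimensional process $\mathbf{X}_t$, Johansen (1988, 1991) and Johansen and Juselius (1990) developed the maximum-likelihood framework. The VECM is
$$\Delta \mathbf{X}_t = \Pi \mathbf{X}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta \mathbf{X}_{t-i} + \varepsilon_t$$
where the rank of the long-run impact matrix $\Pi$ equals the number of cointegrating relationships. If $\mathrm{rank}(\Pi) = r$ with $0 < r < n$, then $\Pi = \alpha \beta^\top$ with $\alpha, \beta \in \mathbb{R}^{n \times r}$, where the columns of $\beta$ are the cointegrating vectors and the columns of $\alpha$ are the corresponding adjustment-speed vectors.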
The Johansen trace statistic and maximum-eigenvalue statistic provide asymptotic tests for the rank r. Critical values were tabulated by Osterwald-Lenum (1992) and refined in subsequent work.
§2.6 Mean-reversion as a continuous-time process
The cointegrating residual $S_t$ is often modelled as an Ornstein-Uhlenbeck process (Uhlenbeck and Ornstein 1930):
$$dS_t = \theta (\mu - S_t)\,dt + \sigma\,dW_t$$
where $\theta > 0$ is the mean-reversion rate, $\mu$ is the long-run mean, $\sigma$ is the diffusion coefficient and $W_t$ is standard Brownian motion. The OU process has stationary distribution $\mathcal{N}(\mu, \sigma^2 / (2\theta))$.
The half-life of mean reversion is the expected time for a shock to decay to half its initial magnitude:
$$t_{1/2} = \frac{\ln 2}{\theta}$$
For a trading horizon to be commensurate with the mean-reversion timescale, holding periods should be on the order of one to several half-lives. Positions held substantially longer suffer from cumulative risk without commensurate edge. Conventional practice computes the mean-reversion rate $\theta$ from a discrete-time AR(1) fit on the empirical spread:
$$S_t = c + \phi S_{t-1} + \varepsilon_t$$
with $\theta = -\ln(\phi) / \Delta t$ in the discretised form, where $\Delta t$ is the sampling interval.
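The AR(1) route to the half-life can be sketched as follows. The example simulates an OU process by its exact discretisation, fits the AR(1) coefficient by OLS and converts it to a mean-reversion rate and half-life. Parameter values, the seed and the function names are illustrative assumptions.

```python
import math
import random


def fit_ar1(series):
    """OLS fit of s_t = c + phi * s_{t-1} + e_t; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    return my - phi * mx, phi


def half_life(phi, dt=1.0):
    """Half-life in the same time units as dt, valid for 0 < phi < 1."""
    theta = -math.log(phi) / dt
    return math.log(2.0) / theta


# Exact discretisation of an OU process with illustrative parameters.
theta_true, mu, sigma, dt = 0.1, 0.0, 1.0, 1.0
phi_true = math.exp(-theta_true * dt)
noise_sd = sigma * math.sqrt((1.0 - phi_true ** 2) / (2.0 * theta_true))

rng = random.Random(2)
spread = [mu]
for _ in range(20000):
    spread.append(mu + phi_true * (spread[-1] - mu) + rng.gauss(0.0, noise_sd))

c_hat, phi_hat = fit_ar1(spread)
theta_hat = -math.log(phi_hat) / dt
hl_hat = half_life(phi_hat, dt)
print(f"theta = {theta_hat:.3f}, half-life = {hl_hat:.2f} bars")
```

With a true rate of 0.1 per bar the theoretical half-life is ln 2 / 0.1, roughly 6.9 bars. An estimated coefficient at or above 1 (or at or below 0) has no valid half-life and should disqualify the spread rather than be passed to the logarithm.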
§2.7 The Hurst exponent
The Hurst exponent $H$ characterises the long-range dependence of a time series (Hurst 1951; Mandelbrot and Van Ness 1968). For financial-time-series applications:
- $H < 0.5$ indicates anti-persistence (mean-reverting behaviour)
- $H = 0.5$ indicates a random walk (no long-range dependence)
- $H > 0.5$ indicates persistence (trending behaviour)
Hurst-exponent screening is a useful auxiliary filter for candidate cointegrated pairs: a pair whose ADF test passes but whose spread has Hurst exponent $H \geq 0.5$ is likely a false positive. Estimation methods include rescaled range analysis (R/S), the periodogram method and Detrended Fluctuation Analysis (Peng et al. 1994).
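As a rough illustration of the screen, the sketch below estimates the exponent from the scaling of lagged differences: the standard deviation of the lag-k difference grows approximately like k to the power H. This is a simpler estimator than the R/S, periodogram and DFA methods named above and is swapped in here only for brevity. The lag range, seed and simulated series are assumptions.

```python
import math
import random


def hurst_lagged_diff(series, lags=range(2, 50)):
    """Slope of log std(x[t+lag] - x[t]) against log lag."""
    log_lag, log_sd = [], []
    for lag in lags:
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        mean = sum(diffs) / len(diffs)
        var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
        log_lag.append(math.log(lag))
        log_sd.append(0.5 * math.log(var))
    n = len(log_lag)
    mx, my = sum(log_lag) / n, sum(log_sd) / n
    return (sum((a - mx) * (b - my) for a, b in zip(log_lag, log_sd))
            / sum((a - mx) ** 2 for a in log_lag))


# Illustrative series: a random walk and a strongly mean-reverting AR(1).
rng = random.Random(3)
walk, reverting = [0.0], [0.0]
for _ in range(5000):
    e = rng.gauss(0.0, 1.0)
    walk.append(walk[-1] + e)
    reverting.append(0.5 * reverting[-1] + e)

h_walk = hurst_lagged_diff(walk)
h_reverting = hurst_lagged_diff(reverting)
print(f"H(random walk) = {h_walk:.2f}, H(mean-reverting) = {h_reverting:.2f}")
```

The random walk should produce an estimate near 0.5 and the mean-reverting series an estimate well below it. The estimate is sensitive to the chosen lag range, so it is better used as a coarse filter than as a point estimate.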
§3
Method families
The pairs-trading literature is conventionally organised into five method families based on the choice of pair selection criterion and signal generation. The taxonomy below follows Krauss (2017), the canonical literature review.
§3.1 Distance method
The first published academic study of pairs trading on a large scale was Gatev, Goetzmann and Rouwenhorst (2006). Their methodology: over a 12-month formation window, normalise each asset's cumulative total return to start at 1.0. For each pair $(i, j)$ in the universe, compute the sum of squared differences of the normalised price paths $\tilde{P}_{i,t}$ and $\tilde{P}_{j,t}$:
$$\mathrm{SSD}_{ij} = \sum_{t=1}^{T} \left( \tilde{P}_{i,t} - \tilde{P}_{j,t} \right)^2$$
Sort pairs in ascending order of SSD; select the top k pairs (the original study used the top 20). Over a 6-month trading window, open a divergence position whenever the normalised paths differ by more than 2 historical standard deviations of their formation-period distance. Close when the paths re-converge.
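The formation step can be sketched in a few lines. The example assumes the inputs are already total-return index levels over the formation window. The tickers and prices are hypothetical placeholders, and the function names are illustrative.

```python
from itertools import combinations


def normalise(prices):
    """Rescale a price path so that it starts at 1.0."""
    return [p / prices[0] for p in prices]


def ssd(path_a, path_b):
    """Sum of squared differences between two normalised paths."""
    return sum((a - b) ** 2 for a, b in zip(path_a, path_b))


def rank_pairs(price_paths, top_k):
    """Return the top_k (pair, SSD) tuples with the smallest SSD."""
    norm = {name: normalise(p) for name, p in price_paths.items()}
    scored = [((a, b), ssd(norm[a], norm[b]))
              for a, b in combinations(sorted(norm), 2)]
    return sorted(scored, key=lambda item: item[1])[:top_k]


# Hypothetical formation-window prices (tickers are placeholders).
formation = {
    "AAA": [100.0, 101.0, 102.0, 103.0],
    "BBB": [50.0, 50.5, 51.0, 51.5],
    "CCC": [10.0, 12.0, 9.0, 11.0],
}
ranked = rank_pairs(formation, top_k=2)
for pair, distance in ranked:
    print(pair, round(distance, 4))
```

Here AAA and BBB follow the same normalised path, so they rank first with an SSD of essentially zero. The number of candidate pairs grows quadratically with the universe size, which matters for large universes.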
Empirical results on 1962 to 2002 US equities: about 11% annualised excess return, declining over time. Do and Faff (2010, 2012) document substantial post-publication decline in distance-method profitability, consistent with the strategy being arbitraged away as it became widely known.
§3.2 Cointegration method
The cointegration approach replaces the distance criterion with a formal statistical test for a stable long-run relationship. The two main estimation procedures are due to Engle and Granger (1987) and Johansen (1988, 1991).
Engle-Granger two-step procedure. Step 1: estimate the cointegrating regression by OLS,
$$Y_t = a + \beta X_t + u_t$$
Step 2: apply the ADF test to the residuals $\hat{u}_t$ using the Engle-Granger critical values (Engle and Yoo 1987; MacKinnon 1991, 2010), which differ from the standard ADF values because the residuals are estimated rather than observed. If the null of unit root is rejected, $X_t$ and $Y_t$ are cointegrated with hedge ratio $\hat{\beta}$.
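The two steps can be sketched with the standard library alone. The second step below uses a plain Dickey-Fuller regression without augmentation lags and without a constant, which is a simplification of the ADF test described above. The simulated pair, the true hedge ratio of 1.5 and all function names are assumptions for illustration.

```python
import math
import random


def ols_fit(x, y):
    """Fit y = a + b*x by OLS; return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def df_tstat_no_const(u):
    """t-statistic of rho in diff(u)_t = rho * u_{t-1} + e_t."""
    lag = u[:-1]
    diff = [u[i + 1] - u[i] for i in range(len(u) - 1)]
    sxx = sum(v * v for v in lag)
    rho = sum(v * d for v, d in zip(lag, diff)) / sxx
    rss = sum((d - rho * v) ** 2 for v, d in zip(lag, diff))
    return rho / math.sqrt(rss / (len(lag) - 1) / sxx)


def engle_granger(x, y):
    """Step 1: hedge ratio by OLS. Step 2: DF t-statistic on residuals."""
    a, beta = ols_fit(x, y)
    resid = [yi - a - beta * xi for xi, yi in zip(x, y)]
    return beta, df_tstat_no_const(resid)


# Illustrative cointegrated pair: X is a random walk, the spread is AR(1).
rng = random.Random(4)
x, spread = [50.0], [0.0]
for _ in range(2000):
    x.append(x[-1] + rng.gauss(0.0, 1.0))
    spread.append(0.8 * spread[-1] + rng.gauss(0.0, 1.0))
y = [2.0 + 1.5 * xi + si for xi, si in zip(x, spread)]

beta_hat, t_stat = engle_granger(x, y)
print(f"hedge ratio = {beta_hat:.3f}, residual DF t = {t_stat:.2f}")
```

For two series with a constant in the cointegrating regression, the asymptotic 5% critical value is roughly -3.34 (MacKinnon 2010), more negative than the standard Dickey-Fuller value. The procedure is not symmetric: regressing X on Y generally yields a hedge ratio that is not the reciprocal of the one above.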
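Johansen procedure. For an $n$-dimensional system, estimate the VECM by maximum likelihood under Gaussian innovations. Test $H_0: \mathrm{rank}(\Pi) \leq r$ using the trace statistic:
$$\lambda_{\mathrm{trace}}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i)$$
or the maximum-eigenvalue statistic, where $T$ is the sample size and $\hat{\lambda}_1 \geq \dots \geq \hat{\lambda}_n$ are the estimated eigenvalues. The Johansen procedure is symmetric, handles $n > 2$ assets and provides a formal test for the number of cointegrating relationships, but is more sensitive to model specification (lag length, deterministic terms).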
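The arithmetic of the two rank statistics is shown below for a set of hypothetical eigenvalues. Estimating the eigenvalues themselves requires the reduced-rank regression step, which is not shown. The eigenvalues, sample size and function names are assumptions.

```python
import math


def trace_statistic(eigenvalues, r, n_obs):
    """-T * sum of ln(1 - lambda_i) over the n - r smallest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return -n_obs * sum(math.log(1.0 - lam) for lam in ordered[r:])


def max_eigen_statistic(eigenvalues, r, n_obs):
    """-T * ln(1 - lambda_{r+1}), testing rank r against rank r + 1."""
    ordered = sorted(eigenvalues, reverse=True)
    return -n_obs * math.log(1.0 - ordered[r])


# Hypothetical eigenvalues for a two-asset system with T = 500.
eigs, T = [0.10, 0.02], 500
trace_r0 = trace_statistic(eigs, 0, T)
trace_r1 = trace_statistic(eigs, 1, T)
max_r0 = max_eigen_statistic(eigs, 0, T)
print(f"trace(0) = {trace_r0:.2f}, trace(1) = {trace_r1:.2f}, max(0) = {max_r0:.2f}")
```

Testing proceeds sequentially from r = 0 upward, stopping at the first rank whose statistic does not exceed its tabulated critical value. The trace statistic at rank r equals the maximum-eigenvalue statistic at r plus the trace statistic at r + 1.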
§3.3 Time-series approach
The time-series approach models the spread directly as a stochastic process, most commonly the Ornstein-Uhlenbeck process, and derives trading rules from its properties. Common variants include exponential OU for spreads with multiplicative mean reversion, Cox-Ingersoll-Ross (Cox, Ingersoll, Ross 1985) for non-negative spreads and Lévy-driven OU (Endres and Stübinger 2019) for spreads with non-Gaussian jump behaviour.
Optimal entry/exit thresholds. Bertram (2010) derives analytically the entry and exit z-score thresholds that maximise the long-run expected profit per unit time under the OU model. For symmetric thresholds a (entry) and b (exit), the expected return per unit time admits a closed-form expression involving the imaginary error function and the Mills ratio. Bertram (2010, eq. 17 to 18) provides the first-order conditions for the optimum. Zeng and Lee (2014) extend the framework to incorporate transaction costs explicitly.
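Whatever procedure supplies the thresholds, the resulting rule is a small state machine over the spread z-score. The sketch below uses hypothetical threshold values of 2.0 (entry) and 0.5 (exit). These are not the Bertram-optimal values, which depend on the fitted OU parameters and costs, and the function name is illustrative.

```python
def threshold_positions(z_scores, entry=2.0, exit_level=0.5):
    """Map spread z-scores to positions: +1 long spread, -1 short, 0 flat."""
    position, out = 0, []
    for z in z_scores:
        if position == 0:
            if z >= entry:
                position = -1      # spread rich: short it
            elif z <= -entry:
                position = 1       # spread cheap: long it
        elif position == -1 and z <= exit_level:
            position = 0           # short closed once the spread has reverted
        elif position == 1 and z >= -exit_level:
            position = 0           # long closed once the spread has reverted
        out.append(position)
    return out


# Hypothetical z-score path.
z_path = [0.1, 1.2, 2.3, 1.8, 0.9, 0.4, -0.3, -2.1, -1.0, -0.2, 0.0]
positions = threshold_positions(z_path)
print(positions)
```

The gap between the entry and exit levels acts as hysteresis: a position is held through the zone between them rather than flickering on small moves around a single threshold.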
Stochastic optimal control. The time-series approach generalises to the stochastic optimal control problem: given the OU spread, find the optimal portfolio weight that maximises expected utility. Mudchanatongsuk, Primbs and Wong (2008) solve this under constant relative risk aversion (CRRA); Jurek (2007) provides a similar treatment with consumption decisions. The optimal weight is typically a smooth function of the spread's distance from equilibrium rather than the discrete in/out trading rules of threshold approaches.
§3.4 Copula approach
The cointegration approach assumes the dependence between two assets is linear and stationary. The copula approach (Liew and Wu 2013; Stander, Marais and Botha 2013; Krauss and Stübinger 2017) relaxes the linearity assumption by modelling the joint distribution of returns directly using copula functions.
By Sklar's theorem (Sklar 1959), any continuous bivariate distribution $F$ admits the unique decomposition
$$F(x, y) = C(F_X(x), F_Y(y))$$
where $F_X, F_Y$ are the marginal distributions and $C$ is the copula, a function fully characterising the dependence structure independently of the marginals.
The trading signal is constructed from the conditional copula probability (the mispricing index of Liew and Wu 2013):
$$\mathrm{MPI}_{X|Y} = P(U \leq u \mid V = v) = \frac{\partial C(u, v)}{\partial v}$$
where $u = F_X(x)$, $v = F_Y(y)$. Extreme values of MPI indicate that the joint observation is unusual under the historical dependence structure, interpreted as a signal that asset X is mispriced relative to asset Y given the latter's value.
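For the Gaussian copula the conditional probability has a closed form, which makes it a convenient illustration. The sketch below assumes the marginal ranks u and v have already been computed and that the copula correlation is known. The correlation value of 0.8 and the function name are assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_copula_mpi(u, v, rho):
    """P(U <= u | V = v) under a Gaussian copula with correlation rho."""
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    return _STD_NORMAL.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))


# Both assets at their medians: no relative mispricing.
mpi_median = gaussian_copula_mpi(0.5, 0.5, 0.8)
# X in its 5th percentile while Y sits at its 60th: X looks cheap given Y.
mpi_low = gaussian_copula_mpi(0.05, 0.60, 0.8)
print(f"MPI at medians = {mpi_median:.3f}, MPI for low X = {mpi_low:.4f}")
```

An index close to 0 suggests X is low relative to Y and an index close to 1 suggests the opposite. The Gaussian copula has no tail dependence, which is one reason the Student-t and Archimedean families are often preferred in practice.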
Common copula families include elliptical (Gaussian, Student-t), Archimedean (Clayton, Gumbel, Frank, Joe) and vine constructions (Bedford and Cooke 2002). Selection is typically by AIC, BIC or hypothesis tests (Genest and Rivest 1993; Chen and Fan 2006). The principal limitation is opacity of fitted copulas and the higher dimensionality of the parameter space.
§3.5 Machine-learning approaches
The role of machine learning in pairs trading is principally in pair selection rather than signal generation. Sarmento and Horta (2020) provide the canonical treatment: principal component analysis on returns to reduce dimensionality, followed by density-based clustering (OPTICS, Ankerst et al. 1999, or DBSCAN, Ester et al. 1996) to group assets with similar return profiles. Candidate pairs are formed within clusters and subjected to cointegration testing as a final filter.
Avellaneda and Lee (2010) use PCA differently: they extract synthetic industry factors from the return covariance matrix, then model the idiosyncratic residual of each stock as an OU process. The strategy goes long stocks whose residual is below its equilibrium and short stocks whose residual is above. This is a multi-asset extension of pairs trading implemented at the residual level.
Neural-network forecasting of spread returns (Krauss, Do and Huck 2017; Fischer and Krauss 2018) is highly sensitive to the training window and exhibits poor out-of-sample stability relative to the structural cointegration approach.
§4
Hedge ratio estimation
The cointegrating coefficient is the central nuisance parameter of the cointegration approach. The choice of estimator materially affects the behaviour of the resulting spread, including its empirical mean-reversion speed, its statistical stationarity and its sensitivity to outliers in the calibration window. The implementation-companion paper covers the six common estimators in depth: Pairs Trading Implementation §2.
§5
Backtesting and validation
The principal source for rigorous backtesting methodology is Lopez de Prado (2018), Advances in Financial Machine Learning, particularly Chapters 11 to 14. The companion paper treats this in depth in §4. Briefly: walk-forward partitioning to separate training and test windows; block bootstrap (Künsch 1989) and Monte Carlo simulation to estimate distributional outcomes; Probabilistic Sharpe Ratio (Bailey and Lopez de Prado 2012), Deflated Sharpe Ratio (Bailey and Lopez de Prado 2014) and Probability of Backtest Overfitting (Bailey, Borwein, Lopez de Prado and Zhu 2017) to quantify the statistical credibility of reported performance. Transaction-cost realism is non-negotiable. Survivorship-bias-free universe construction is required.
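As one concrete instance of these diagnostics, the Probabilistic Sharpe Ratio of Bailey and Lopez de Prado (2012) adjusts the observed Sharpe ratio for sample length, skewness and kurtosis. The sketch below works with the per-period (non-annualised) Sharpe ratio. The return series is hypothetical and the function name is illustrative.

```python
import math
from statistics import NormalDist


def probabilistic_sharpe_ratio(returns, benchmark_sr=0.0):
    """PSR: probability that the true per-period Sharpe exceeds benchmark_sr."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    sr = mean / math.sqrt(m2)
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    # Standard error of the Sharpe estimate under non-normal returns.
    denom = math.sqrt(1.0 - skew * sr + 0.25 * (kurt - 1.0) * sr ** 2)
    return NormalDist().cdf((sr - benchmark_sr) * math.sqrt(n - 1) / denom)


# Hypothetical per-period strategy returns (24 observations).
returns = [0.01, -0.005, 0.007, 0.002, -0.003, 0.006] * 4
psr = probabilistic_sharpe_ratio(returns)
print(f"PSR against a zero benchmark = {psr:.3f}")
```

A value near 1 indicates that the observed Sharpe ratio is unlikely to be a sampling artefact relative to the benchmark. It does not correct for the number of strategies tried, which is the role of the Deflated Sharpe Ratio.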
§6
Multi-asset extensions
Statistical arbitrage extends naturally from two-asset pairs to portfolios of cointegrated assets. The Johansen multivariate procedure (§2.5, §3.2) provides the estimation machinery. For trading, the construction is
$$S_t = \beta^\top \mathbf{X}_t = \sum_{i=1}^{n} \beta_i X_{i,t}$$
where $\beta$ is a cointegrating vector. $S_t$ is the multi-leg spread, modelled and traded as in the two-asset case.
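Constructing the multi-leg spread from a given cointegrating vector is a weighted sum at each time step, as the sketch below shows. The weights and prices are hypothetical, and in practice the vector would come from the Johansen estimation rather than being chosen by hand.

```python
def multi_leg_spread(weights, prices_by_asset):
    """Spread at each time step: sum over assets of weight * price."""
    names = list(weights)
    length = len(prices_by_asset[names[0]])
    return [sum(weights[n] * prices_by_asset[n][t] for n in names)
            for t in range(length)]


# Hypothetical three-asset cointegrating vector and price paths.
weights = {"A": 1.0, "B": -0.5, "C": -0.5}
prices = {
    "A": [10.0, 11.0, 10.5],
    "B": [8.0, 9.0, 9.0],
    "C": [4.0, 4.0, 6.0],
}
spread = multi_leg_spread(weights, prices)
print(spread)
```

A cointegrating vector is identified only up to scale, so a common convention is to normalise one weight to 1 and read the others as hedge ratios against that leg.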
Sparse cointegration. For large n, dense weight vectors imply trading every asset in the portfolio at every rebalance, which is prohibitively expensive. Sparse formulations (d'Aspremont 2011; Cuturi and d'Aspremont 2013) seek $\beta$ with at most k nonzero entries, formulating the problem as an $\ell_0$-constrained optimisation solved via convex relaxation.
Galenko formulation. Galenko, Popova and Popova (2012) formulate the n-asset stat-arb problem as a convex optimisation that maximises expected return subject to risk and transaction-cost constraints.
Avellaneda-Lee statistical arbitrage. Avellaneda and Lee (2010) extend the framework to the full US equity universe by first removing common factor exposure via PCA, then trading each asset's idiosyncratic OU residual against its equilibrium. The strategy is implemented as a market-neutral long-short portfolio with positions in dozens to hundreds of names simultaneously.
§7
Cryptocurrency-specific considerations
Cryptocurrency perpetual futures markets differ from the US equity markets on which pairs trading was developed in five material respects. Each modifies the conventional methodology.
Continuous trading. Crypto markets trade 24/7/365. There is no closing auction, opening rotation or overnight gap. Daily-bar models developed for equity markets are mis-specified: information arrival is continuous and bar boundaries are arbitrary clock conventions. Practitioners typically work with shorter bar intervals (1-minute, 5-minute, 1-hour) adapted to the relevant trading horizon.
Faster non-stationarity. Empirical evidence (Fil and Kristoufek 2020; Lintilhac and Tourin 2017) indicates that crypto-pair relationships drift on substantially shorter timescales than equity-pair relationships, weeks to months rather than quarters to years. Backtest horizons and re-estimation cadences must be adjusted accordingly.
Perpetual funding payments. Crypto perpetuals fund every eight hours on most major exchanges (Binance, Bybit, OKX) and continuously on some (dYdX, GMX). The funding rate transfers a structural cash flow from one side of the market to the other and is a first-order PnL contributor for positions held across funding timestamps.
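The funding contribution to a pair position can be sketched as simple bookkeeping. The example assumes the common convention that a positive funding rate means longs pay shorts, and that each payment equals signed notional times the rate at that timestamp. The leg names, notionals and rates are hypothetical.

```python
def funding_cashflow(position_notional, funding_rate):
    """Cash received (+) or paid (-) at one funding timestamp.

    Convention assumed here: a positive rate means longs pay shorts.
    position_notional is signed: > 0 for a long, < 0 for a short.
    """
    return -position_notional * funding_rate


def pair_funding_pnl(legs, funding_by_leg):
    """Total funding over all timestamps for a multi-leg position."""
    return sum(funding_cashflow(legs[name], rate)
               for name, rates in funding_by_leg.items()
               for rate in rates)


# Hypothetical pair: long 10,000 of leg "AAA", short 10,000 of leg "BBB",
# held across three funding timestamps (rates are illustrative).
legs = {"AAA": 10_000.0, "BBB": -10_000.0}
funding = {"AAA": [0.0001, 0.0001, 0.0003],
           "BBB": [0.0002, 0.0001, 0.0001]}
total = pair_funding_pnl(legs, funding)
print(f"net funding PnL = {total:.2f}")
```

In this example the long leg pays 5.00 and the short leg receives 4.00, for a net cost of 1.00 over three timestamps. Because the two legs fund at different rates, a dollar-neutral pair is not funding-neutral, and the differential accumulates over the holding period.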
Liquidity asymmetry. Major pairs (BTC, ETH, SOL) have institutional-scale depth on both legs; second- and third-tier altcoins have retail-scale depth and exhibit different microstructure dynamics including thinner books, wider spreads and higher liquidation cascade risk. A strategy calibrated on majors does not in general port to alts.
Cross-venue structure. The same notional pair traded on two exchanges (BTC perpetual on Binance vs. Bybit) is in practice two distinct instruments with different fee tiers, funding rates, liquidation engines and order book structures. Cross-venue cointegration introduces additional sources of non-stationarity from exchange-specific events (listings, delistings, fee changes, regulatory actions).
REFERENCES
Cited works
- Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, 49 to 60.
- Avellaneda, M., Lee, J.H. (2010). Statistical arbitrage in the U.S. equities market. Quantitative Finance 10(7), 761 to 782.
- Bailey, D.H., Borwein, J.M., Lopez de Prado, M., Zhu, Q.J. (2014). Pseudo-mathematics and financial charlatanism. The effects of backtest overfitting on out-of-sample performance. Notices of the American Mathematical Society 61(5), 458 to 471.
- Bailey, D.H., Borwein, J.M., Lopez de Prado, M., Zhu, Q.J. (2017). The probability of backtest overfitting. Journal of Computational Finance 20(4), 39 to 69.
- Bailey, D.H., Lopez de Prado, M. (2012). The Sharpe ratio efficient frontier. Journal of Risk 15(2), 3 to 44.
- Bailey, D.H., Lopez de Prado, M. (2014). The deflated Sharpe ratio. Correcting for selection bias, backtest overfitting and non-normality. Journal of Portfolio Management 40(5), 94 to 107.
- Bedford, T., Cooke, R.M. (2002). Vines. A new graphical model for dependent random variables. Annals of Statistics 30(4), 1031 to 1068.
- Bertram, W.K. (2010). Analytic solutions for optimal statistical arbitrage trading. Physica A 389(11), 2234 to 2243.
- Box, G.E.P., Tiao, G.C. (1977). A canonical analysis of multiple time series. Biometrika 64(2), 355 to 365.
- Chan, E. (2013). Algorithmic Trading. Winning Strategies and Their Rationale. John Wiley & Sons.
- Chen, X., Fan, Y. (2006). Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. Journal of Econometrics 135(1-2), 125 to 154.
- Cox, J.C., Ingersoll, J.E., Ross, S.A. (1985). A theory of the term structure of interest rates. Econometrica 53(2), 385 to 407.
- Cuturi, M., d'Aspremont, A. (2013). Mean reversion with a variance threshold. Proceedings of the 30th International Conference on Machine Learning (ICML), 271 to 279.
- d'Aspremont, A. (2011). Identifying small mean-reverting portfolios. Quantitative Finance 11(3), 351 to 364.
- Dickey, D.A., Fuller, W.A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74(366a), 427 to 431.
- Do, B., Faff, R. (2010). Does simple pairs trading still work? Financial Analysts Journal 66(4), 83 to 95.
- Do, B., Faff, R. (2012). Are pairs trading profits robust to trading costs? Journal of Financial Research 35(2), 261 to 287.
- Elliott, R.J., van der Hoek, J., Malcolm, W.P. (2005). Pairs trading. Quantitative Finance 5(3), 271 to 276.
- Endres, S., Stübinger, J. (2019). Optimal trading strategies for Lévy-driven Ornstein-Uhlenbeck processes. Applied Economics 51(29), 3153 to 3169.
- Engle, R.F., Granger, C.W.J. (1987). Co-integration and error correction. Representation, estimation and testing. Econometrica 55(2), 251 to 276.
- Engle, R.F., Yoo, B.S. (1987). Forecasting and testing in co-integrated systems. Journal of Econometrics 35(1), 143 to 159.
- Ester, M., Kriegel, H.-P., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), 226 to 231.
- Fil, M., Kristoufek, L. (2020). Pairs trading in cryptocurrency markets. IEEE Access 8, 172644 to 172651.
- Fischer, T., Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research 270(2), 654 to 669.
- Galenko, A., Popova, E., Popova, I. (2012). Trading in the presence of cointegration. Journal of Alternative Investments 15(1), 85 to 97.
- Gatev, E., Goetzmann, W.N., Rouwenhorst, K.G. (2006). Pairs trading. Performance of a relative-value arbitrage rule. Review of Financial Studies 19(3), 797 to 827.
- Genest, C., Rivest, L.-P. (1993). Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association 88(423), 1034 to 1043.
- Granger, C.W.J. (1981). Some properties of time series data and their use in econometric model specification. Journal of Econometrics 16(1), 121 to 130.
- Hurst, H.E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770 to 808.
- Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12(2-3), 231 to 254.
- Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59(6), 1551 to 1580.
- Johansen, S., Juselius, K. (1990). Maximum likelihood estimation and inference on cointegration with applications to the demand for money. Oxford Bulletin of Economics and Statistics 52(2), 169 to 210.
- Jurek, J.W. (2007). Optimal long-run reversal trading. Working paper, Princeton University.
- Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82(1), 35 to 45.
- Krauss, C. (2017). Statistical arbitrage pairs trading strategies. Review and outlook. Journal of Economic Surveys 31(2), 513 to 545.
- Krauss, C., Do, X.A., Huck, N. (2017). Deep neural networks, gradient-boosted trees, random forests. Statistical arbitrage on the S&P 500. European Journal of Operational Research 259(2), 689 to 702.
- Krauss, C., Stübinger, J. (2017). Non-linear dependence modeling with bivariate copulas. Statistical arbitrage pairs trading on the S&P 100. Applied Economics 49(52), 5352 to 5369.
- Künsch, H.R. (1989). The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17(3), 1217 to 1241.
- Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics 54(1-3), 159 to 178.
- Liew, R.Q., Wu, Y. (2013). Pairs trading. A copula approach. Journal of Derivatives & Hedge Funds 19(1), 12 to 30.
- Lintilhac, P.S., Tourin, A. (2017). Model-based pairs trading in the bitcoin markets. Quantitative Finance 17(5), 703 to 716.
- Lopez de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
- MacKinnon, J.G. (1991). Critical values for cointegration tests. In R.F. Engle and C.W.J. Granger, eds., Long-Run Economic Relationships, Oxford University Press, 267 to 276.
- MacKinnon, J.G. (2010). Critical values for cointegration tests. Queen's Economics Department Working Paper No. 1227.
- Mandelbrot, B.B., Van Ness, J.W. (1968). Fractional Brownian motions, fractional noises and applications. SIAM Review 10(4), 422 to 437.
- Mudchanatongsuk, S., Primbs, J.A., Wong, W. (2008). Optimal pairs trading. A stochastic control approach. Proceedings of the 2008 American Control Conference, 1035 to 1039.
- Osterwald-Lenum, M. (1992). A note with quantiles of the asymptotic distribution of the maximum likelihood cointegration rank test statistics. Oxford Bulletin of Economics and Statistics 54(3), 461 to 472.
- Peng, C.-K., Buldyrev, S.V., Havlin, S., Simons, M., Stanley, H.E., Goldberger, A.L. (1994). Mosaic organization of DNA nucleotides. Physical Review E 49(2), 1685 to 1689.
- Phillips, P.C.B., Perron, P. (1988). Testing for a unit root in time series regression. Biometrika 75(2), 335 to 346.
- Pole, A. (2007). Statistical Arbitrage. Algorithmic Trading Insights and Techniques. John Wiley & Sons.
- Politis, D.N., Romano, J.P. (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303 to 1313.
- Royal Swedish Academy of Sciences (2003). The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2003. Press release, 8 October 2003.
- Said, S.E., Dickey, D.A. (1984). Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71(3), 599 to 607.
- Sarmento, S.M., Horta, N. (2020). A Machine Learning based Pairs Trading Investment Strategy. Springer.
- Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 8, 229 to 231.
- Stander, Y., Marais, D., Botha, I. (2013). Trading strategies with copulas. Journal of Economic and Financial Sciences 6(1), 83 to 107.
- Uhlenbeck, G.E., Ornstein, L.S. (1930). On the theory of the Brownian motion. Physical Review 36(5), 823 to 841.
- Vidyamurthy, G. (2004). Pairs Trading. Quantitative Methods and Analysis. John Wiley & Sons.
- Zeng, Z., Lee, C.G. (2014). Pairs trading. Optimal thresholds and profitability. Quantitative Finance 14(11), 1881 to 1893.
Citation. Bonton AI, Hedgicore Research (2026). Statistical Arbitrage. A Technical Guide. v1.0. hedgicore.com/research
Hedgicore is a real-time pairs analytics platform powered by the Hedgicore Engine. Built by the team at Bonton AI.
Risk disclaimer: Hedgicore is an analytics platform. It does not execute trades or provide financial advice. All trading carries risk of loss.
