Pairs trading with partial cointegration

March 15, 2023

This article is based on the paper ‘Pairs trading with partial cointegration’ (Clegg & Krauss, 2016). I am going to describe the Partial Cointegration (PCI) model presented in the paper, explain how to estimate its parameters, and implement a trading strategy based on it. I will backtest the strategy on both synthetic and real data and compare its performance to a basic cointegration-based strategy.

First we need to understand the partial cointegration model. Its definition is shown below.
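For reference, the model can be written roughly as follows. This is a sketch of the definition as I understand it from the paper. The intercept alpha is included because the estimation procedure later in this article treats alpha and beta as parameters; treat its presence here as an assumption rather than a quote from the paper.

```latex
\begin{aligned}
X_{2,t} &= \alpha + \beta X_{1,t} + W_t \\
W_t &= M_t + R_t \\
M_t &= \rho M_{t-1} + \varepsilon_{M,t}, \qquad \varepsilon_{M,t} \sim \mathcal{N}(0, \sigma_M^2), \quad |\rho| < 1 \\
R_t &= R_{t-1} + \varepsilon_{R,t}, \qquad \varepsilon_{R,t} \sim \mathcal{N}(0, \sigma_R^2)
\end{aligned}
```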

Basically, two time series are partially cointegrated if there exists a linear combination of them that is partially autoregressive (PAR). I described the PAR model in my previous article, so I’m assuming that the reader is familiar with it. In the definition above, W_t is partially autoregressive: it consists of a mean-reverting component M_t and a random walk component R_t.

Let’s generate and plot a sample of partially cointegrated time series. Code for generating them is shown below. First we generate X_1 as a geometric random walk starting at 100. Then we generate the PAR residual and calculate X_2 according to the model definition.
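A minimal sketch of the generation step just described, written with the standard library only. The function names, the default drift and volatility of X_1, and the parameter values in the example call are my own assumptions for illustration, not values taken from the article.

```python
import math
import random


def generate_par(n, rho, sigma_m, sigma_r, rng):
    """Simulate a partially autoregressive series W_t = M_t + R_t."""
    m, r = 0.0, 0.0
    w = []
    for _ in range(n):
        m = rho * m + rng.gauss(0.0, sigma_m)  # mean-reverting AR(1) component
        r = r + rng.gauss(0.0, sigma_r)        # random walk component
        w.append(m + r)
    return w


def generate_pci(n, alpha, beta, rho, sigma_m, sigma_r,
                 ret_mu=0.0001, ret_sigma=0.01, x0=100.0, seed=None):
    """Return (x1, x2, w) where x2 = alpha + beta * x1 + w."""
    rng = random.Random(seed)
    # X_1: geometric random walk starting at x0 (hypothetical drift/volatility)
    x1, log_price = [], math.log(x0)
    for _ in range(n):
        x1.append(math.exp(log_price))
        log_price += rng.gauss(ret_mu, ret_sigma)
    w = generate_par(n, rho, sigma_m, sigma_r, rng)
    x2 = [alpha + beta * a + b for a, b in zip(x1, w)]
    return x1, x2, w


# Example call with made-up parameter values
x1, x2, w = generate_pci(n=1000, alpha=0.0, beta=1.0, rho=0.9,
                         sigma_m=1.0, sigma_r=1.0, seed=42)
```

Passing a seed keeps the sample reproducible, which is convenient when the same series is reused for the estimation check later on.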

Partially cointegrated time series sample

The time series on the plot above seem to closely follow each other and look like they might be cointegrated. In the screenshot below I show the results of two cointegration tests (Johansen and Engle-Granger). As you can see, both tests failed to reject the null hypothesis of no cointegration. This is expected: the residual contains a random walk component, so it is not stationary.

Cointegration tests applied to partially cointegrated series

Now we need to implement a procedure to estimate the parameters of the model. The estimation procedure is similar to the one for the PAR model; we just need to add two parameters: alpha and beta. First we need to calculate the initial estimates of the parameters. Formulas for the initial estimates of alpha and beta are provided in the paper (you can find them in the code below). To calculate the initial estimates of the PAR parameters, we fit the PAR model on the residuals W_t (calculated using the initial estimates of alpha and beta). Then we optimize all parameters together using scipy minimize. Code for estimating the PCI model is provided below.

Recall that a pure mean-reverting process and a pure random walk process are special cases of the PAR model. We are going to use these different modes later, when we try to determine whether the PAR model is a good fit for the data.

To check that the estimation procedure works correctly, I apply it to the time series generated earlier. Results are shown below. Our estimates are close to the true values of the parameters.

Now we can try to backtest trading strategies on synthetic data. In the paper four different strategies are tested, but I am going to test only two of them: one based on standard cointegration (CI1 in the paper) and another based on partial cointegration (PCI in the paper). Note that I am going to test a very basic version of the strategies, without stop-losses, transaction costs, etc.

First we prepare the data as follows. I use a 1000-day training period and a 125-day trading period.

For the cointegration-based strategy we run an OLS regression on the training data to find the parameters alpha and beta and calculate the spread.
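A small sketch of this step, assuming the spread is defined as X_2 - alpha - beta * X_1 (consistent with the model definition, but the regression direction is my assumption). The helper names and the toy data are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def spread(x1, x2, alpha, beta):
    """Spread of the pair for given (alpha, beta)."""
    return [b - alpha - beta * a for a, b in zip(x1, x2)]


# Toy training data constructed so that x2 = 2 + 1.5 * x1 exactly
x1_train = [10.0, 11.0, 12.0, 13.0]
x2_train = [17.0, 18.5, 20.0, 21.5]
alpha, beta = ols_fit(x1_train, x2_train)

# The parameters fitted on training data are reused, unchanged, on trading data
x1_trade = [14.0, 15.0]
x2_trade = [23.5, 24.0]
trade_spread = spread(x1_trade, x2_trade, alpha, beta)
```

Fitting on the training window only and then freezing alpha and beta is what keeps the trading-period spread free of look-ahead.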

The plot of the spread during the training period is shown below. Dotted lines mark distances of 1 and 2 standard deviations from the mean.

Now let’s calculate and plot the spread during the trading period.

We can see from the plot that a traditional cointegration-based trading strategy won’t be profitable here. The spread never converges back to its historical mean during the trading period. But let’s try to backtest the strategy anyway.

Cumulative returns (cointegration-based algo)

As expected, the cointegration-based strategy is not very profitable here.

Note that we hold dollar-neutral positions in each pair: an equal amount of capital is assigned to the long and short legs. We do it here since we know that beta is close to 1 and therefore such a position follows the spread closely. If beta were far from 1, we’d need to adjust our positions and assign beta units of capital to the first stock and 1 unit of capital to the second stock.
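To make the dollar-neutral bookkeeping concrete, here is a hedged sketch of how daily strategy returns could be computed from a position series. The convention that "long the spread" means long stock 2 and short stock 1, the half-and-half capital split, and the one-day lag are my assumptions, not details confirmed by the article.

```python
def pair_returns(prices1, prices2, positions):
    """Daily returns of a dollar-neutral pair position.

    positions[t] is +1 (long spread: long stock 2, short stock 1),
    -1 (short spread) or 0 (flat), decided at the close of day t.
    Half of the capital is allocated to each leg.
    """
    rets = [0.0]
    for t in range(1, len(prices1)):
        r1 = prices1[t] / prices1[t - 1] - 1.0
        r2 = prices2[t] / prices2[t - 1] - 1.0
        # Yesterday's position earns today's return, which avoids look-ahead
        rets.append(positions[t - 1] * 0.5 * (r2 - r1))
    return rets


# Hypothetical prices and positions
rets = pair_returns([100.0, 101.0, 102.0], [100.0, 103.0, 102.0], [1, 1, 0])
```

With a beta far from 1, the fixed 0.5 weights would have to be replaced by weights proportional to the hedge ratio.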

Now we will try to backtest the PCI-based trading strategy on the same data. The trading algorithm based on partial cointegration works as follows. First we estimate model parameters using training data and calculate the spread. Then we apply a Kalman filter to estimate the mean-reverting component of the spread. Trading rules are based on the values of the mean-reverting component:

Long position is opened when it is below -1*sigma_M.
Long position is closed when it is above 0.5*sigma_M.
Short position is opened when it is above 1*sigma_M.
Short position is closed when it is below -0.5*sigma_M.
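The four rules above can be sketched as a small state machine. The function name and the choice to allow closing one position and opening the opposite one on the same bar are my assumptions.

```python
def pci_positions(m, sigma_m, open_thr=1.0, close_thr=0.5):
    """Turn the mean-reverting component into positions (+1 long, -1 short, 0 flat)."""
    positions = []
    pos = 0
    for value in m:
        # Close an existing position once the component has crossed past zero
        if pos == 1 and value > close_thr * sigma_m:
            pos = 0
        elif pos == -1 and value < -close_thr * sigma_m:
            pos = 0
        # Open a new position only when flat
        if pos == 0:
            if value < -open_thr * sigma_m:
                pos = 1
            elif value > open_thr * sigma_m:
                pos = -1
        positions.append(pos)
    return positions


# Hypothetical path of the mean-reverting component, with sigma_M = 1
m_path = [0.0, -1.2, -0.3, 0.4, 0.6, 1.5, 0.2, -0.6]
positions = pci_positions(m_path, sigma_m=1.0)
```

Note that the closing thresholds sit on the far side of zero, so a position is held until the component has overshot its mean rather than merely returned to it.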

Code for estimating parameters and calculating the spread is shown below.

Now we separate and plot the mean reverting component.
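One way to separate the mean-reverting component is a two-state Kalman filter over (M_t, R_t) with the observation W_t = M_t + R_t and no measurement noise. The sketch below assumes the parameters rho, sigma_M and sigma_R are already estimated. The initialization (M_0 = 0, R_0 = W_0, zero initial uncertainty) is my own assumption and may differ from the article's code.

```python
def par_kalman_filter(w, rho, sigma_m, sigma_r):
    """Split W_t into filtered estimates of M_t and R_t.

    State: (M, R). Transition: M_t = rho * M_{t-1} + eps_M, R_t = R_{t-1} + eps_R.
    Observation: W_t = M_t + R_t (observation matrix H = [1, 1], no noise).
    """
    m, r = 0.0, w[0]            # assumed initial state
    p_mm = p_mr = p_rr = 0.0    # assumed initial covariance
    m_est, r_est = [m], [r]
    for obs in w[1:]:
        # Predict step
        m_pred = rho * m
        r_pred = r
        p_mm = rho * rho * p_mm + sigma_m ** 2
        p_mr = rho * p_mr
        p_rr = p_rr + sigma_r ** 2
        # Innovation and its variance
        innov = obs - (m_pred + r_pred)
        hp_m = p_mm + p_mr      # first entry of H P
        hp_r = p_mr + p_rr      # second entry of H P
        s = hp_m + hp_r
        k_m, k_r = hp_m / s, hp_r / s   # Kalman gains, they sum to 1
        # Update step
        m = m_pred + k_m * innov
        r = r_pred + k_r * innov
        p_mm, p_mr, p_rr = (p_mm - k_m * hp_m,
                            p_mr - k_m * hp_r,
                            p_rr - k_r * hp_r)
        m_est.append(m)
        r_est.append(r)
    return m_est, r_est


# Tiny hypothetical spread, just to show the call
w_obs = [0.0, 2.0, 1.5, 3.0]
m_est, r_est = par_kalman_filter(w_obs, rho=0.9, sigma_m=1.0, sigma_r=1.0)
```

Because there is no measurement noise, the two filtered components always add up to the observed spread; the filter only decides how each surprise is split between them.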

Plot of mean reverting component (trading period)

Next we backtest the strategy.

Cumulative returns (partial cointegration strategy)

As you can see above, the trading strategy based on partial cointegration performs significantly better than the simple cointegration-based strategy. But these results are based on only 1 sample of synthetic data. Let’s perform many simulations and compare the average performance of these two strategies.

I’m going to perform 5000 simulations. For each simulation, I generate partially cointegrated time series with the same parameters, backtest both strategies and save their performance metrics (total return and Sharpe ratio). Code for doing it is shown below.
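The two metrics saved in each simulation could be computed along these lines. This is a sketch under my own assumptions: returns are compounded, the risk-free rate is zero, and the Sharpe ratio is annualized with 252 trading days.

```python
import math


def total_return(daily_returns):
    """Compounded return over the whole period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    n = len(daily_returns)
    if n < 2:
        return float("nan")
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    if var == 0.0:
        return float("nan")  # undefined when returns never vary
    return math.sqrt(periods_per_year) * mean / math.sqrt(var)


# Hypothetical daily returns
example_total = total_return([0.1, -0.1])
example_sharpe = sharpe_ratio([0.01, 0.03])
```

A strategy that never trades produces constant zero returns, so the undefined-Sharpe case is worth handling explicitly when aggregating thousands of simulations.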

The functions ci_backtest and pci_backtest perform the same backtests as I demonstrated above. Next I remove outliers and plot histograms of total returns and Sharpe ratios for each strategy.
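The article does not say which outlier rule is used before plotting. A common choice for this purpose is a modified z-score based on the median absolute deviation, sketched below; both the rule and the threshold of 3.5 are assumptions on my part.

```python
import statistics


def is_outlier(values, thresh=3.5):
    """Flag points whose MAD-based modified z-score exceeds thresh."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0.0:
        return [False] * len(values)  # no spread, nothing can be flagged
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    return [0.6745 * d / mad > thresh for d in abs_dev]


# Hypothetical total returns with one extreme value
sample = [0.10, 0.12, 0.08, 0.11, 0.09, 5.00]
filtered = [v for v, out in zip(sample, is_outlier(sample)) if not out]
```

A median-based rule is less distorted by the extreme values themselves than a rule based on the mean and standard deviation.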

We can see above that the PCI-based strategy performs better than the simple cointegration-based strategy. Most of its returns are greater than zero and the average Sharpe ratio is around 2. Now we are ready to backtest it on some real data.

As my stock universe I’m going to use 100 small cap stocks selected randomly from the Vanguard Small-Cap Value Index Fund ETF. The training period is 4 years, from 2009-07-01 to 2013-06-30. The trading period is 6 months, from 2013-07-01 to 2013-12-31.

Code for loading and preparing data is shown below.

We start with the basic cointegration-based trading strategy. Pairs for trading are selected based on two tests: the Augmented Dickey-Fuller test and the Johansen test. We choose only those pairs for which both tests reject the null hypothesis at the 5% significance level (95% confidence).

I test each pair and save results to a dataframe along with spread parameters alpha and beta.

In the screenshot above we see that there are 232 potential pairs to trade.

Note that we get a whole range of different values of beta. So, as I explained earlier, we should not use dollar-neutral positions for such pairs. To simplify things a little bit I am going to select pairs with a hedge ratio beta close to 1, so that we can allocate an equal amount of capital to each stock in a pair.

After applying this additional condition we are left with the following pairs.

The next selection step is based on the in-sample Sharpe ratio. So let’s calculate in-sample Sharpe ratios for the pairs selected above. Results are shown in the screenshot below.

I am going to further limit potential pairs and select only those that have a Sharpe ratio bigger than 0 (if a pair is not profitable in-sample, we can’t expect it to be profitable out-of-sample).

Then I select the top 5 pairs consisting of different stocks (so that we trade in 10 different stocks). In the paper the top 20 pairs are chosen, but my dataset is a lot smaller (100 stocks vs. 500 stocks), so I’m using only 5. The final selection is shown below.
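Picking the top pairs so that no stock appears twice can be done greedily: walk down the list sorted by in-sample Sharpe ratio and skip any pair that reuses a stock. This is a sketch of one possible approach; the function name and the ticker symbols are hypothetical.

```python
def select_top_pairs(pairs, n_pairs=5, min_sharpe=0.0):
    """Greedy selection of non-overlapping pairs.

    pairs: iterable of (stock_a, stock_b, in_sample_sharpe).
    """
    selected, used = [], set()
    for a, b, sharpe in sorted(pairs, key=lambda p: p[2], reverse=True):
        if sharpe <= min_sharpe:
            break  # the list is sorted, so everything after is no better
        if a in used or b in used:
            continue  # one of the stocks is already traded in another pair
        selected.append((a, b, sharpe))
        used.update((a, b))
        if len(selected) == n_pairs:
            break
    return selected


# Made-up candidates
candidates = [
    ("AAA", "BBB", 2.1),
    ("AAA", "CCC", 1.9),
    ("DDD", "EEE", 1.5),
    ("CCC", "EEE", 1.2),
    ("FFF", "GGG", -0.3),
]
top_pairs = select_top_pairs(candidates, n_pairs=5)
```

In this toy example only two pairs survive: two candidates reuse a stock that is already taken, and the last one has a negative in-sample Sharpe ratio.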

In-sample Sharpe ratios of selected pairs

Now we backtest these pairs out-of-sample. Code for calculating returns is shown below.

Let’s compute and plot cumulative returns for each pair individually.

Cumulative returns of individual pairs (cointegration algo)

As we expected, the results are not very impressive, similar to what we had with synthetic data. Let’s calculate the cumulative return and performance metrics of the strategy.

Now let’s backtest our second strategy, which is based on partial cointegration. Pair selection here will be based on the parameters of the PCI model, so first we need to estimate those parameters for each pair. Code for doing it is shown below. Note that for each pair I’m trying to fit the model in three different modes (PAR, AR and RW). Then I’m going to select only those pairs for which the PAR mode is the best fit (based on AIC). This is different from the paper, where a likelihood ratio test is used for pair selection.

Here’s a quote from my previous article about PAR model explaining why I’m not using likelihood ratio test:

I have failed to replicate the simulation studies with the likelihood ratio tests. For some reason my critical values are too different from the ones, provided in the paper. I believe it has something to do with particular implementation of the numerical optimization procedure. In some cases it might not converge to the true global minimum, resulting in outliers that shift the quantiles in wrong direction. Even in the paper, quantiles from the simulation are different from the ones predicted by Wilks’ theorem. I will leave the code for simulation studies in the Jupyter notebook, but I won’t post it here.

Instead of relying on the likelihood ratio test, I’m going to use the Akaike Information Criterion (AIC) to determine which model better describes the data. Some statisticians even suggest that this approach is better than using statistical tests (e.g. here).
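As a reminder, AIC = 2k - 2 ln(L), where k is the number of free parameters and L is the maximized likelihood; the model with the smallest AIC is preferred. A sketch of the comparison is below. The parameter counts per mode (5 for PAR, 4 for AR, 3 for RW, counting alpha and beta) and the log-likelihood values are my assumptions for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_mode(fits):
    """fits maps mode name -> (maximized log-likelihood, number of free parameters)."""
    scores = {mode: aic(ll, k) for mode, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fit results for one pair
fits = {
    "PAR": (-1400.0, 5),  # alpha, beta, rho, sigma_M, sigma_R
    "AR": (-1410.0, 4),   # alpha, beta, rho, sigma_M
    "RW": (-1405.5, 3),   # alpha, beta, sigma_R
}
mode, scores = best_mode(fits)
```

The 2k term penalizes the extra parameters of the PAR mode, so it only wins when its likelihood gain over the nested AR and RW modes is large enough.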

This process takes quite some time. I’m going to post the pci_params.csv file on my github page, so that you can work with it straight away.

Note that the AR and RW modes are only fitted if the conditions rho > 0.5 and R² > 0.5 are satisfied. This is done to save time, because later we are going to limit potential pairs to the ones satisfying these conditions.

After estimating parameters of the model, we are going to further filter them based on the following criteria:

AIC of the PAR mode must be smaller than AIC of other models.
Coefficient of mean reversion (rho) must be bigger than 0.5
Proportion of variance attributable to mean reversion (R²) must be bigger than 0.5
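The three criteria above could be checked as sketched below. I am assuming that R² is defined on first differences of the spread, which gives R² = 2 sigma_M² / (2 sigma_M² + (1 + rho) sigma_R²), since Var(M_t - M_{t-1}) = 2 sigma_M² / (1 + rho) and Var(R_t - R_{t-1}) = sigma_R². The dictionary keys are hypothetical names.

```python
def r2_mr(rho, sigma_m, sigma_r):
    """Share of the variance of the differenced spread due to mean reversion."""
    num = 2.0 * sigma_m ** 2
    return num / (num + (1.0 + rho) * sigma_r ** 2)


def passes_filters(fit, min_rho=0.5, min_r2=0.5):
    """Apply the AIC, rho and R^2 criteria to one fitted pair."""
    par_is_best = fit["aic_par"] < min(fit["aic_ar"], fit["aic_rw"])
    return (par_is_best
            and fit["rho"] > min_rho
            and r2_mr(fit["rho"], fit["sigma_m"], fit["sigma_r"]) > min_r2)


# Hypothetical fit results
good_fit = {"aic_par": 2810.0, "aic_ar": 2828.0, "aic_rw": 2817.0,
            "rho": 0.9, "sigma_m": 1.0, "sigma_r": 1.0}
weak_fit = dict(good_fit, rho=0.4)
```

For example, with rho = 0.9 and sigma_M = sigma_R = 1 the formula gives R² = 2 / 3.9, which is just above the 0.5 threshold.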

After applying these conditions we get the following results.

We get 415 potentially tradable pairs.

I am going to apply two more conditions: one on beta (the same as we used above) and another on rho. I limit rho to be less than 0.8 since pairs with smaller values of rho have better performance (see figure 1 in the paper).

We are left with the following pairs.

Next we calculate the in-sample Sharpe ratio and select the top 5 pairs for trading.

Now we are ready for backtesting. Code for performing it is shown below. At each time step we calculate the spread using historical parameters, separate the mean-reverting component and calculate its z-score. Rules for opening and closing positions are the same as we used before when trading on synthetic data.

Let’s look at some plots and performance metrics.

On the plot above we can note that 4 out of 5 pairs have positive total returns and 1 pair has negative total return.

Cumulative return (partial cointegration algo)

We can see that the performance of the PCI-based strategy is significantly better than the performance of the CI-based strategy. In my opinion it even looks too good. Probably we just got lucky with this particular selection of stocks. Clearly another backtest with a bigger stock universe and a longer time frame is needed to confirm the results.

Ideas for further research:

Allow non-equal capital allocation to stocks in a pair to remove the constraint on hedge ratio beta.
Test how trading period returns depend on historical values of rho, R², and other parameters.
Use some other metrics instead of in-sample Sharpe ratio for pair selection (e.g. Sortino ratio).
Use BIC instead of AIC for model selection.

Jupyter notebook with source code is available here.

If you have any questions, suggestions or corrections please post them in the comments. Thanks for reading.

References

[1] Pairs trading with partial cointegration

[2] Introduction to partial autoregressive (PAR) model

[3] https://github.com/matthewclegg/partialCI

[4] https://github.com/matthewclegg/partialAR

[5] https://stackoverflow.com/questions/11882393/matplotlib-disregard-outliers-when-plotting