financialnoob.me

Blog about quantitative finance

Pairs trading. Pair selection. Distance (Part 2)

In the previous post I tested the distance method of pair selection (you can read it here). We found that it was not very good at finding pairs of stocks that do not diverge too much during the trading period. In this short article I would like to implement several possible improvements to the distance method and test them on the same dataset.


Instead of relying only on the distance between the cumulative returns of the two stocks in a pair, I’m now going to use several other tests to determine whether the pair is eligible for trading. For each of the possible pairs I will test:

  • If stocks in the pair are cointegrated (CADF p-value < 0.01)
  • Hurst exponent of the spread < 0.5
  • Half-life of mean reversion of the spread more than 1 day and less than 30 days
  • Number of zero crossings of the spread > 12 per year

I will continue working only with pairs that satisfy all of the above criteria.
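To make the last three criteria more concrete, here is a minimal sketch of how they could be computed with NumPy. It is not the code from the notebook: the function names, the `max_lag` parameter, the 252 trading days per year, and the specific estimators (Hurst exponent from the scaling of lagged differences, half-life from an AR(1) regression) are my assumptions for illustration. The CADF p-value is assumed to come from a separate cointegration test and is simply passed in.

```python
import numpy as np


def hurst_exponent(spread, max_lag=100):
    """Estimate the Hurst exponent from how the std of lagged differences scales with the lag."""
    spread = np.asarray(spread, dtype=float)
    lags = np.arange(2, max_lag)
    # For a series with Hurst exponent H, std(x[t+lag] - x[t]) grows roughly like lag**H
    tau = [np.std(spread[lag:] - spread[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return float(slope)


def half_life(spread):
    """Half-life of mean reversion (in observations) from the regression
    delta_t = a + lam * spread_{t-1} + noise."""
    spread = np.asarray(spread, dtype=float)
    lam, _ = np.polyfit(spread[:-1], np.diff(spread), 1)
    if lam >= 0:
        return float("inf")  # no mean reversion detected
    return float(-np.log(2) / lam)


def zero_crossings(spread):
    """Count how many times the demeaned spread changes sign."""
    spread = np.asarray(spread, dtype=float)
    signs = np.sign(spread - spread.mean())
    signs = signs[signs != 0]  # ignore exact zeros
    return int(np.sum(signs[1:] != signs[:-1]))


def is_eligible(cadf_pvalue, spread, days_per_year=252):
    """Apply all four criteria; cadf_pvalue is computed elsewhere (assumption)."""
    years = len(spread) / days_per_year
    return bool(
        cadf_pvalue < 0.01
        and hurst_exponent(spread) < 0.5
        and 1 < half_life(spread) < 30
        and zero_crossings(spread) / years > 12
    )
```

The spread passed to these functions should be a daily series from the formation period with more than `max_lag` observations; how exactly the spread is constructed is left open here.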

After that I will test several methods of selecting the best pairs for trading:

  • Pairs with the smallest Euclidean distance
  • Pairs with the highest number of zero crossings of the spread
  • Pairs with the highest Pearson correlation coefficient
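The ranking step can be sketched as follows. This is only an illustration: the helper names and the structure of the `metrics` dictionary are hypothetical, cumulative returns are approximated by prices normalized to start at 1, and the correlation is computed on price levels (the notebook may well use returns instead).

```python
import numpy as np


def cumulative_returns(prices):
    """Normalize a price series so that it starts at 1."""
    prices = np.asarray(prices, dtype=float)
    return prices / prices[0]


def euclidean_distance(prices_a, prices_b):
    """Euclidean distance between the normalized price paths of two stocks."""
    diff = cumulative_returns(prices_a) - cumulative_returns(prices_b)
    return float(np.sqrt(np.sum(diff ** 2)))


def pearson_correlation(prices_a, prices_b):
    """Pearson correlation coefficient between two series."""
    return float(np.corrcoef(prices_a, prices_b)[0, 1])


def top_pairs(metrics, key, n=5, ascending=True):
    """Return the n best pair names.

    metrics maps a pair name to a dict of metric values (hypothetical layout).
    """
    ranked = sorted(metrics, key=lambda pair: metrics[pair][key],
                    reverse=not ascending)
    return ranked[:n]
```

With such a dictionary, the three selections would be `top_pairs(metrics, "distance")`, `top_pairs(metrics, "zero_crossings", ascending=False)` and `top_pairs(metrics, "correlation", ascending=False)`, where the key names are again placeholders.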

I will omit the steps of downloading and preparing the data here, since I explained it in the previous post.


12 months formation period / 6 months trading period

I have tested all 263901 potential pairs and found 1703 pairs that satisfy the criteria described above. Below you can see the dataframe of selected pairs sorted by Euclidean distance between the cumulative returns of the two constituent stocks.
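As a quick sanity check on these numbers: 263901 is exactly the number of unordered pairs that can be formed from 727 stocks. The universe size is not stated in this post, so 727 is my inference from the pair count, and the ticker names below are placeholders.

```python
from itertools import combinations
from math import comb

n_stocks = 727  # inferred from the pair count, not stated in the post
tickers = [f"S{i:03d}" for i in range(n_stocks)]  # placeholder ticker names

# Enumerate every unordered pair once, as a pair-testing loop would
n_pairs = sum(1 for _ in combinations(tickers, 2))
assert n_pairs == comb(n_stocks, 2) == 263901

# Share of pairs that pass all four filters with a 12-month formation period
share_selected = 1703 / n_pairs
print(f"{share_selected:.2%}")  # prints 0.65%
```

So well under 1% of the candidate pairs survive the filters.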

Pairs sorted by Euclidean distance (ascending)

Let’s see how the spreads of the top 5 pairs behave during the trading period by looking at the plots and some metrics.

Metrics of top 5 pairs during trading period

We can see that the majority of the pairs diverge too much from the historical equilibrium calculated during the formation period. Some of them seem to converge back, but in any case we don’t see many trading opportunities. Overall there is not much improvement compared to the tests in the previous post, where we used just Euclidean distance as a metric.


Now let’s try to select the top 5 pairs with the highest number of zero crossings.

Pairs sorted by the number of zero crossings (descending)
Metrics of top 5 pairs during trading period

Here again we get similar results. All of the pairs diverge too much from their historical equilibrium: they go far above or below the 2-standard-deviation band.
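One simple way to quantify "diverging too much" is to measure how often the trading-period spread sits outside a band built from the formation-period mean and standard deviation. The sketch below assumes exactly that definition; the function name and the `n_sd` parameter are mine, and the metrics shown in the tables may be computed differently.

```python
import numpy as np


def band_breach_fraction(formation_spread, trading_spread, n_sd=2.0):
    """Fraction of trading-period observations outside the n_sd band
    around the formation-period mean."""
    formation_spread = np.asarray(formation_spread, dtype=float)
    trading_spread = np.asarray(trading_spread, dtype=float)
    mean = formation_spread.mean()
    sd = formation_spread.std(ddof=1)  # sample standard deviation
    zscores = (trading_spread - mean) / sd
    return float(np.mean(np.abs(zscores) > n_sd))


# Toy example: formation spread has mean 0 and sample std 1
print(band_breach_fraction([-1.0, 0.0, 1.0], [0.0, 3.0, -3.0, 1.0]))  # prints 0.5
```

A value close to 0 means the spread stays near its historical equilibrium, while a value close to 1 means the equilibrium estimated during the formation period no longer describes the pair.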


The last thing I want to try is to select the top 5 pairs with the highest Pearson correlation coefficient.

Pairs sorted by Pearson correlation coefficient (descending)
Metrics of top 5 pairs during trading period

These pairs are also not very good. All three methods give very similar results: most of the selected pairs are not suitable for trading (at least with simple trading rules such as those described in Gatev et al. 2006).
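For reference, a simple rule in the spirit of Gatev et al. (2006) opens a position when the spread moves more than two historical standard deviations away from its mean and closes it when the spread crosses back through the mean. The sketch below is my simplified version of such a rule applied to a z-scored spread; the function name and the `entry` parameter are assumptions, and it ignores costs, delays and forced closing at the end of the trading period.

```python
import numpy as np


def threshold_positions(zscores, entry=2.0):
    """Return positions in the spread: +1 long, -1 short, 0 flat.

    Open when |z| exceeds the entry threshold, close when z crosses zero.
    """
    positions = np.zeros(len(zscores), dtype=int)
    current = 0
    for t, z in enumerate(zscores):
        if current == 0:
            if z > entry:
                current = -1  # spread too high: short the spread
            elif z < -entry:
                current = 1   # spread too low: long the spread
        elif (current == 1 and z >= 0) or (current == -1 and z <= 0):
            current = 0       # spread reverted to its historical mean
        positions[t] = current
    return positions


z = [0.0, 2.5, 1.0, -0.1, -2.5, -1.0, 0.2]
print(threshold_positions(z).tolist())  # prints [0, -1, -1, 0, 1, 1, 0]
```

A pair whose spread leaves the band and never returns generates one trade that is never closed by the rule, which is why such pairs are a poor fit for it.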

Now I would like to try using longer formation periods. In the first part I tested 12-, 24- and 36-month formation periods, but here the tests I perform to determine eligible pairs take too much time to run, so I will skip the 24-month formation period and move straight to the 36-month period.


36 months formation period / 6 months trading period

When we increase the formation period to 36 months, only 236 out of 263901 potential pairs satisfy all four conditions. Now I will test the three methods of pair selection, starting with Euclidean distance.

Pairs sorted by Euclidean distance (ascending)
Metrics of top 5 pairs during trading period

Now it seems that we have at least a small improvement: the majority of the selected pairs don’t diverge too much and stay within the 2-SD band for most of the trading period. The number of trading opportunities is not large, but this is still the best set of pairs we’ve seen so far.


Let’s try sorting pairs by the number of zero crossings.

Pairs sorted by the number of zero crossings (descending)
Metrics of top 5 pairs during trading period

Only one pair, TII-TPH, behaves reasonably well. All the others diverge too far from the historical mean.

Now I’ll try to select the top 5 pairs with the highest Pearson correlation coefficients.

Pairs sorted by Pearson correlation coefficient (descending)
Metrics of top 5 pairs during trading period

Here we see two pairs that stay within the 2-SD band for most of the trading period. The other three pairs diverge too much from the mean.


In this article I implemented and tested some improvements to the distance method of pair selection. Although we don’t see tremendous improvements in the quality of the selected pairs (most pairs still diverge too much during the trading period), the results from combining these methods with longer formation periods look promising. I think it could be interesting to test some of the techniques implemented here together with the cointegration approach to pair selection. I will write about it in the next post.


Jupyter notebook with source code is available here.

Note: if you want to run this code on your laptop, you might want to use a smaller stock universe; otherwise it might take a very long time.

If you have any questions, suggestions or corrections, please post them in the comments. Thanks for reading.


Yuan Di prepared a Chinese adaptation of this article, which is available here.
