financialnoob.me

Blog about quantitative finance

Pairs trading. Pair selection. Distance (Part 1)

In the previous post I described the basic principles of pairs trading strategies. Now I’d like to explore one of the most famous methods of pair selection: the distance method. It was described in the paper ‘Pairs Trading: Performance of a Relative Value Arbitrage Rule’ (Gatev et al. 2006). I’m going to implement this method in Python, apply it to a dataset of real-world stock prices and analyse its advantages and disadvantages.


The idea is very simple: for each pair of stocks, calculate the Euclidean distance between their normalized cumulative returns and select the pairs with the smallest distance for trading. The time period used for pair selection is called the formation period. It is immediately followed by the trading period, in which we trade the selected pairs. In the paper the authors use 12-month formation periods and 6-month trading periods. I would like to test formation periods of several lengths: 12, 24 and 36 months. The trading period will always be 6 months.
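To make the selection rule concrete, here is a minimal sketch of the distance calculation. It assumes that "normalized cumulative returns" means each price series divided by its first value in the formation period; the function names and the toy prices are made up for illustration and are not taken from the notebook.

```python
from itertools import combinations

import numpy as np


def normalize(prices):
    """Scale each price series (column) so that it starts at 1.0."""
    prices = np.asarray(prices, dtype=float)
    return prices / prices[0]


def closest_pairs(prices, names, top_n=5):
    """Return the top_n pairs with the smallest Euclidean distance.

    prices: 2-D array, one row per day and one column per stock.
    names:  column labels, in the same order as the columns.
    """
    norm = normalize(prices)
    distances = []
    for i, j in combinations(range(norm.shape[1]), 2):
        dist = np.sqrt(np.sum((norm[:, i] - norm[:, j]) ** 2))
        distances.append((dist, names[i], names[j]))
    distances.sort(key=lambda item: item[0])
    return distances[:top_n]


# Toy formation period: A and B move together, C drifts away.
toy_prices = np.array([
    [10.0, 20.0, 5.0],
    [11.0, 22.0, 5.0],
    [12.0, 24.5, 4.0],
    [11.0, 22.0, 3.0],
])
top = closest_pairs(toy_prices, ["A", "B", "C"], top_n=1)
print(top)  # the A-B pair has the smallest distance
```

A plain loop over all pairs is slow but easy to read; with a few hundred stocks it is still manageable.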

I am going to use the constituents of Vanguard Small Cap Value ETF (VBR) as my stock universe. I believe that using small cap stocks will allow me to uncover more potential trading opportunities (because too many people trade highly liquid large cap stocks, that part of the market is more crowded and offers fewer trading opportunities).

The first step is getting the data. To perform the tests I want, I need 36+6=42 months of data. I will use the period from 2016-07-01 to 2019-12-31. At the time of writing VBR contains shares of 967 companies. I am going to choose only those for which:

  • there is data available on Yahoo Finance for the whole time period;
  • trading volume is bigger than 1000 on each trading day.

Applying the rules above leaves 727 stocks, which gives 727*726/2 = 263901 potential pairs. Below is the code I used to download the data (or you can just download it in CSV format from here). Note: I’m using Adj Close prices to account for any corporate actions.
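As an illustration of the two filtering rules only (not the download itself), here is a minimal sketch. It assumes the prices and volumes are already loaded into two pandas DataFrames with one column per ticker; the tickers, numbers and function names are invented for the example.

```python
import pandas as pd


def filter_universe(prices, volumes, min_volume=1000):
    """Keep tickers with complete prices and volume above min_volume every day."""
    has_all_prices = prices.notna().all(axis=0)
    liquid = (volumes > min_volume).all(axis=0)
    keep = has_all_prices & liquid
    return prices.loc[:, keep]


def count_pairs(n_stocks):
    """Number of unordered pairs among n_stocks."""
    return n_stocks * (n_stocks - 1) // 2


dates = pd.date_range("2016-07-01", periods=3, freq="B")
prices = pd.DataFrame(
    {"AAA": [10.0, 10.5, 10.2], "BBB": [5.0, None, 5.1], "CCC": [7.0, 7.1, 7.2]},
    index=dates,
)
volumes = pd.DataFrame(
    {"AAA": [5000, 4000, 6000], "BBB": [9000, 9000, 9000], "CCC": [2000, 800, 3000]},
    index=dates,
)

universe = filter_universe(prices, volumes)
print(list(universe.columns))  # BBB has a missing price, CCC has a low-volume day
print(count_pairs(727))
```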

Now we need to transform prices into cumulative returns and we are ready to start testing. (A link to the Jupyter notebook with the source code is provided at the end of the post.)
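A minimal sketch of that transformation, assuming a pandas DataFrame of adjusted close prices with one column per ticker. The cumulative return series is built so that it starts at 1.0 on the first day; the notebook may use a different but equivalent convention.

```python
import pandas as pd


def cumulative_returns(prices):
    """Turn prices into a cumulative return index that starts at 1.0."""
    daily = prices / prices.shift(1) - 1.0  # simple daily returns
    daily = daily.fillna(0.0)               # no return on the first day
    return (1.0 + daily).cumprod()


example = pd.DataFrame({"AAA": [100.0, 110.0, 99.0], "BBB": [20.0, 20.0, 25.0]})
cum = cumulative_returns(example)
print(cum)
```

For prices without gaps this is the same as dividing every column by its first value, which is why series with very different price levels become comparable.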

For each combination of formation and trading periods I’m going to choose the top 5 pairs with the smallest Euclidean distance. For each of those pairs I’m going to construct a portfolio (spread) consisting of two positions of equal size (in terms of allocated capital, not in terms of the number of shares): a long position in one stock and a short position in the other. I’m going to provide plots of the spread during both formation and trading periods as well as the following metrics:

  1. Euclidean distance between cumulative returns of the two stocks
  2. Cointegrated Augmented Dickey-Fuller (CADF) test p-value (if low -> cointegrated)
  3. Augmented Dickey-Fuller (ADF) test p-value (if low -> stationary)
  4. Standard deviation of the spread
  5. Number of zero crossings of the spread
  6. Hurst exponent (should be less than 0.5 for mean-reverting time series)
  7. Half-life of mean reversion
  8. Percentage of days within the 2-SD band (if too much time is spent outside the band -> the pair has diverged too much for too long)
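Some of these metrics are easy to sketch with numpy alone. The code below is an illustration under stated assumptions, not the notebook's implementation: the spread is taken as the difference between the two cumulative return series, zero crossings are counted around the spread's own mean, the half-life comes from regressing daily changes on the lagged spread, and the Hurst exponent is estimated from how the standard deviation of lagged differences grows with the lag. The ADF and CADF p-values are left out; in statsmodels the usual entry points for them are `adfuller` and `coint` in `statsmodels.tsa.stattools`.

```python
import numpy as np


def pair_spread(cum_long, cum_short):
    """Equal-capital long/short spread (assumed: difference of cumulative returns)."""
    return np.asarray(cum_long, dtype=float) - np.asarray(cum_short, dtype=float)


def zero_crossings(spread):
    """Number of times the spread crosses its own mean."""
    signs = np.sign(spread - spread.mean())
    signs = signs[signs != 0]  # ignore points exactly on the mean
    return int(np.sum(signs[1:] != signs[:-1]))


def half_life(spread):
    """Half-life of mean reversion from the regression d(s_t) = a + b * s_(t-1)."""
    slope, _ = np.polyfit(spread[:-1], np.diff(spread), 1)
    return -np.log(2) / slope  # only meaningful when slope < 0


def hurst_exponent(spread, max_lag=20):
    """Slope of log(std of lagged differences) against log(lag)."""
    lags = np.arange(2, max_lag)
    tau = [np.std(spread[lag:] - spread[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return slope


def share_within_band(spread, n_sd=2.0):
    """Fraction of days on which the spread is within n_sd standard deviations."""
    z = (spread - spread.mean()) / spread.std()
    return float(np.mean(np.abs(z) <= n_sd))


# Synthetic mean-reverting spread: AR(1) process with coefficient 0.9.
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
spread = np.zeros(2000)
for t in range(1, 2000):
    spread[t] = 0.9 * spread[t - 1] + noise[t]

print("zero crossings:", zero_crossings(spread))
print("half-life (days):", half_life(spread))
print("Hurst exponent:", hurst_exponent(spread))
print("share within 2-SD band:", share_within_band(spread))
```

On a synthetic series like this one the half-life should come out at several days and the Hurst exponent below 0.5, which is the pattern we hope to see in a tradable pair.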

Let’s start.


12 months formation period / 6 months trading period

Below you can see the plots of 5 pairs with the smallest Euclidean distance.

As you can see, although all pairs demonstrate nice mean-reverting behaviour in the formation period, most of them behave differently in the trading period. Out of five pairs only one (CBSH-NBTB) looks more or less mean-reverting in the trading period. The other four deviate too much from their historical mean and don’t converge back (at least during the 6-month trading period).

Top 5 pairs (12m formation period)

In the table above we can see that during the formation period all pairs demonstrate strong mean-reverting properties: the p-values of the CADF and ADF tests are low, the Hurst exponent is less than 0.5, and there are many zero crossings. Let’s look at the same metrics during the trading period.

Top 5 pairs (6m trading period)

The table above confirms what we’ve already seen on the plots. Most pairs do not show mean-reverting properties. So, although the distance method works very well in selecting mean-reverting pairs in-sample, the behaviour of the chosen pairs out-of-sample is very different. But what if a 12-month formation period is not long enough? What if using a longer formation period helps us uncover pairs of stocks with more stable long-term relationships? Let’s try.


24 months formation period / 6 months trading period

Again let’s look at the plots first.

Three out of five pairs (AGNC-MFA, MTG-ESNT, BANF-STBA) are already far from their historical mean at the beginning of the trading period. They do not return to the mean during the six-month trading period. If we opened positions in those spreads at the beginning of the trading period, we would have to close them with losses. The other two pairs do not diverge that much from the mean, but they do not provide a lot of trading opportunities. Let’s look at the metrics.

Top 5 pairs (24m formation period)
Top 5 pairs (6m trading period)

No big improvement compared to the previous results. What if we increase the formation period to 36 months?


36 months formation period / 6 months trading period

Plots:

A similar picture again. Three pairs diverge too much (although one of them seems to converge back to the mean by the end of the trading period). The other two pairs do not provide many trading opportunities.

Top 5 pairs (36m formation period)

Top 5 pairs (6m trading period)

Conclusions

It seems that the distance method of pair selection cannot be used for profitable trading, at least in this implementation and with this dataset. It does a good job of selecting pairs of stocks that are mean-reverting in-sample, but most of the selected pairs diverge during the trading period (out-of-sample).

There are possible improvements described in different sources online that recommend additional conditions for pair selection, such as:

  • selecting pairs with the highest number of zero crossings
  • selecting only pairs that are cointegrated
  • selecting pairs with high spread volatility
  • selecting only pairs with Hurst exponent less than 0.5
  • selecting pairs with low enough half-life of mean reversion

But if we look at the metrics of the top pairs during the formation period (in-sample) we can see that for the majority of selected pairs most of those conditions are already satisfied. So it is unlikely that introducing these conditions will help. I will probably try to test it anyway and describe the results in another post.

Another idea is to use longer trading periods. What if diverged pairs do converge back to their historical mean, but it just takes more than 6 months? It is possible, but it doesn’t help. Such pairs will have a long half-life of mean reversion and won’t provide enough trading opportunities. Moreover, the longer we hold the pair portfolio, the more likely it is that it will never converge (for example when the companies’ risk profiles change). We are interested in pairs with a short half-life of mean reversion that provide many trading opportunities, therefore increasing the trading period won’t help.

There is one more potential improvement (which I think is the most promising): using higher-frequency data. Since pairs trading strategies are very well known, any trading opportunities that arise probably do not last very long. I will test it and report the results in another post, if I can find high-frequency data.

Another potential improvement has to do with my testing methodology. I didn’t do any quantitative comparison between the pairs selected using different lengths of formation period. It is probably not very important now, but at some point we will have to compare different pair selection techniques and decide which one is the best. Of course it is possible to just backtest all possible combinations of pair selection techniques with all possible strategies and parameters, but that would greatly increase the probability of backtest overfitting. I think it should be possible to design statistical tests which can be used for quantitative comparison. I will research this question and write another post about it.


The Jupyter notebook with the source code is available here.

If you have any questions, suggestions or corrections please post them in the comments. Thanks for reading.


Yuan Di prepared a Chinese adaptation of this article, which is available here.
