Pairs trading. Pair selection. Cointegration (Part 3)

I believe that the main drawback of all the pair selection methods we have tested so far is the assumption that market conditions don’t change during the whole trading period. We analyzed pricing data and assessed which pairs are suitable for trading only once, at the beginning of the period. We then assumed that the relationship between the two stocks in the selected pairs would continue to behave in the same manner as before. I think this assumption is unrealistic in modern financial markets, which tend to be very dynamic.

In this article I would like to test how our methods of pair selection will perform if we limit the trading period to one month. Here is what I want to do:

  • Use a 12-month formation period to select potential pairs for trading (using the same conditions as in the previous article).
  • Out of all potential pairs, select only those whose spread is more than 2 and less than 3 standard deviations away from its historical mean on the last day of the formation period.
  • Implement and assess the performance of several machine learning algorithms in predicting which of the selected pairs are likely to converge in the next 30 days.
  • Estimate if the achieved performance is enough for creating a profitable trading strategy.

I will use historical price data for stocks from the Vanguard Small-Cap Value Index Fund (VBR) from 01.07.2016 to 31.12.2019. The formation period is 12 months long, and I roll it forward one month at a time to get a total of 24 periods.
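A minimal sketch of this setup (not the exact notebook code): `prices` is assumed to be a DataFrame of daily closing prices indexed by date, and `select_pairs` is a hypothetical stand-in for the cointegration screen from the previous article.

```python
import pandas as pd

def last_day_zscore(spread: pd.Series) -> float:
    """Distance of the spread from its historical mean, in standard
    deviations, measured on the last day of the formation period."""
    return (spread.iloc[-1] - spread.mean()) / spread.std()

def candidate_pairs(prices: pd.DataFrame, start: pd.Timestamp, select_pairs) -> list:
    """Screen one 12-month formation window for tradable candidates."""
    window = prices.loc[start:start + pd.DateOffset(months=12)]
    candidates = []
    for s1, s2, spread in select_pairs(window):   # hypothetical screen
        if 2 < abs(last_day_zscore(spread)) < 3:  # 2-3 std devs away
            candidates.append((s1, s2))
    return candidates
```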


After processing all 24 periods we are left with the dataset below.

The last column (TTC) indicates the number of calendar days it took for the spread to return to its historical mean. It equals 1000 if the spread didn’t converge during the next 6 months.
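The exact convergence rule is in the notebook, but one plausible implementation is sketched below: record the first calendar day on which the spread crosses its formation-period mean, and return 1000 if that doesn’t happen within roughly 6 months (183 days here).

```python
import numpy as np
import pandas as pd

def time_to_convergence(future_spread: pd.Series, hist_mean: float) -> int:
    """Calendar days until the spread first crosses `hist_mean`;
    1000 if it doesn't converge within ~6 months."""
    start_sign = np.sign(future_spread.iloc[0] - hist_mean)
    crossed = np.sign(future_spread - hist_mean) != start_sign
    if not crossed.any():
        return 1000
    first_cross = future_spread.index[int(crossed.to_numpy().argmax())]
    days = (first_cross - future_spread.index[0]).days
    return days if days <= 183 else 1000
```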

Before proceeding further, I remove samples with an extreme hedge ratio: larger than 3 or smaller than 0.33.
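Assuming the dataset lives in a DataFrame with a `hedge_ratio` column (the column name is my assumption), the filter is a one-liner:

```python
import pandas as pd

def filter_hedge_ratio(df: pd.DataFrame, lo: float = 1/3, hi: float = 3.0) -> pd.DataFrame:
    """Keep only samples whose hedge ratio lies in [1/3, 3]."""
    return df[df['hedge_ratio'].between(lo, hi)]
```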

After that we are left with 3391 samples. Summary statistics are provided below.

Now let’s look at the correlation matrix.
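Something along these lines produces the feature-vs-TTC correlations, assuming the dataset is a DataFrame `df` with a `TTC` column:

```python
import pandas as pd

def ttc_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each feature with TTC, largest magnitude first."""
    corr = df.corr(numeric_only=True)['TTC'].drop('TTC')
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```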

Again, as in the previous part, we can notice that none of the features are correlated with the dependent variable (TTC). The highest correlation coefficient is 0.07. This is not a good sign. But correlation measures only linear relationships, so it could still be possible to achieve satisfactory results.

Now we will look at the plots of the data.

Histograms of the variables
Density plots of the variables

We can notice that distributions of some features are skewed, so we might want to use power transformations to make them look more Gaussian.
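For example, scikit-learn’s PowerTransformer (Yeo-Johnson by default) both reduces skew and standardizes the output; a toy demonstration:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X_skewed = rng.lognormal(size=(100, 3))                  # skewed toy features
X_gaussian = PowerTransformer().fit_transform(X_skewed)  # Yeo-Johnson, standardized
```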

I use the following code to prepare the data for the machine learning algorithms. The dependent (outcome) variable is binarized so that samples that took more than 30 days to converge belong to class 1 and samples that converged within 30 days belong to class 0. Then I split the data into train and test datasets (70–30), using the stratify parameter to preserve the class distribution of the original dataset.
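The original code is in the notebook; a sketch of the same steps, with `df` and the `random_state` value as assumptions, could look like this:

```python
from sklearn.model_selection import train_test_split

def prepare(df):
    """Binarize TTC at 30 days and make a stratified 70-30 split."""
    X = df.drop(columns=['TTC']).to_numpy()
    y = (df['TTC'] > 30).astype(int).to_numpy()  # 1: slow or non-converging
    return train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# X_train, X_test, y_train, y_test = prepare(df)
```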

Our task is then to predict which samples belong to class 0. I’m going to use the same custom accuracy score function that I used in the previous part, but with a small modification. I’ve noticed that many ML algorithms assign a probability of belonging to class 0 above 50% to very few samples, or to none at all. Such algorithms are useless for our end goal because we need to choose several pairs for trading. So now I give a score of 0 to any algorithm that fails to provide at least 10 samples with a greater than 50% chance of belonging to class 0. The code for this metric is provided below.
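(Reconstructed here from the description; the notebook has the original. `proba_class0` is assumed to come from `model.predict_proba(X)[:, 0]`.)

```python
import numpy as np

def class0_score(y_true, proba_class0, min_picks=10):
    """Accuracy over samples assigned >50% probability of class 0;
    0 if fewer than `min_picks` such samples exist."""
    picked = np.asarray(proba_class0) > 0.5
    if picked.sum() < min_picks:
        return 0.0
    return float(np.mean(np.asarray(y_true)[picked] == 0))
```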

We also need to calculate a baseline accuracy that we would achieve if we were selecting pairs at random. It is approximately 23%.

Benchmark score
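For reference, the baseline is just the share of class-0 samples in the data; continuing from the split sketched earlier:

```python
# Picking pairs at random, the expected hit rate equals the class-0 share.
baseline = (y_train == 0).mean()   # ≈ 0.23 per the text
```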

I’m going to test the following machine learning algorithms (a sketch of the test harness follows the list):

  • Logistic Regression
  • Linear Discriminant Analysis
  • Quadratic Discriminant Analysis
  • K Neighbors
  • Decision Tree
  • Naive Bayes
  • Gaussian Process
  • Multi Layer Perceptron
  • Support Vector Machine
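A sketch of that harness, reusing `class0_score` from above; `X` and `y` are the NumPy arrays from the preparation step, and the hyperparameters shown are my assumptions, not tuned values:

```python
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    'LR': LogisticRegression(max_iter=1000),
    'LDA': LinearDiscriminantAnalysis(),
    'QDA': QuadraticDiscriminantAnalysis(),
    'KNN': KNeighborsClassifier(),
    'CART': DecisionTreeClassifier(),
    'NB': GaussianNB(),
    'GP': GaussianProcessClassifier(),
    'MLP': MLPClassifier(max_iter=1000),
    'SVM': SVC(probability=True),  # probability=True enables predict_proba
}

def evaluate(model, X, y, n_splits=5):
    """Mean custom score across stratified folds."""
    scores = []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    for tr, te in cv.split(X, y):
        model.fit(X[tr], y[tr])
        proba0 = model.predict_proba(X[te])[:, 0]
        scores.append(class0_score(y[te], proba0))
    return sum(scores) / len(scores)
```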

First, I test them on the original data without any transformations. The results of these tests are provided below.

The accuracy of most algorithms is comparable to our baseline accuracy of 23%. We can also notice that almost half of the algorithms have a score of zero, which means that they failed to provide 10 samples with more than 50% probability of belonging to class 0. Now let’s perform the same tests, but first transforming the data using PowerTransformer.

Most algorithms perform approximately the same as before. The performance of Logistic Regression improves a little bit. What if we also try to apply PCA to reduce the number of features?
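One way to wire this up is a scikit-learn Pipeline, so the transformer (and, for the PCA experiment, the decomposition) is fit only on the training folds; the component count below is a guess:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

def with_preprocessing(model, use_pca=False, n_components=5):
    """Wrap a model in PowerTransformer (and optionally PCA)."""
    steps = [('power', PowerTransformer())]
    if use_pca:
        steps.append(('pca', PCA(n_components=n_components)))
    steps.append(('model', model))
    return Pipeline(steps)
```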

The accuracy of most algorithms decreases. Let’s try to apply several ensemble algorithms.
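The exact ensemble set is in the notebook; typical scikit-learn candidates would be:

```python
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)

ensembles = {
    'AdaBoost': AdaBoostClassifier(),
    'ExtraTrees': ExtraTreesClassifier(),
    'GBM': GradientBoostingClassifier(),
    'RF': RandomForestClassifier(),
}
```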

Ensemble models do not work well on this dataset.

The best algorithms are able to correctly identify only about 30–35% of the converging pairs, which is a little better than our baseline metric. Let’s try to assess the performance of one of our models. I think that logistic regression is the best candidate.
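A sketch of this final check, reusing the helpers above; the “top 10 by predicted probability” rule is my reading of how the pairs would be picked:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit on the training set, rank test pairs by predicted class-0 probability.
pipe = with_preprocessing(LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
proba0 = pipe.predict_proba(X_test)[:, 0]
top10 = np.argsort(proba0)[::-1][:10]                 # ten most confident picks
precision = (np.asarray(y_test)[top10] == 0).mean()   # fraction that converge
```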

We achieve a score of 0.4, which means that only 4 out of 10 predicted pairs actually converge during the next 30 days. If we assume that our gains from correctly identified pairs equal our losses from incorrectly identified pairs, such performance is not enough for a profitable trading strategy.


In the previous articles I was trying to find stock pairs which should exhibit mean reverting behaviour during the next 6 months. All my attempts failed. In this article I tried to simplify the task and shorten the forecast time horizon to 1 month. And again I wasn’t able to achieve satisfactory results.

There are some other improvements we could make:

  • try other data transformations to improve the performance of the ML algorithms
  • use feature engineering to add some new features to the data
  • gather more data
  • find a strategy that is profitable even if less than 50% of predicted pairs do in fact converge (for example, creating our trading rules in such a way that we gain more on converging pairs than we lose on non-converging pairs)

I think that the biggest disadvantage of the different approaches I have tested so far is not using fresh market data. We should incorporate new data into our models as quickly as possible (at least daily), use it to update our beliefs about the market, and adjust our positions accordingly. Another possible disadvantage is that these approaches are very well known and very simple, which means a lot of people are using them and trading opportunities quickly disappear. In the next articles I will test more advanced methods and techniques for pairs trading.


Jupyter notebook with source code is available here.

Note: if you want to run this code on your laptop, you might want to use a smaller stock universe; otherwise it might take a very long time. You can download the dataset I used for the machine learning part here.

If you have any questions, suggestions or corrections please post them in the comments. Thanks for reading.
