financialnoob.me

Blog about quantitative finance

Pairs trading. Pair selection. Cointegration (Part 2)

In the previous article we discovered that although the cointegration method provides a lot more potentially tradable pairs, our methods for selecting the best pairs do not work as expected. Most of the chosen pairs diverge too much during the trading period. In this post I would like to test several machine learning techniques that combine several metrics to predict which pairs will behave ‘nicely’ during the trading period.


The first step is preparing the data for the machine learning algorithms. I will have two datasets: one for training and model selection, and another one for a pure out-of-sample test. The stock universe in both datasets is the same (stocks from the VBR small-cap ETF), but I will use different, non-overlapping time periods:

  • 01.01.2013–30.06.2016 for the train data (3-year formation period + 6-month trading period)
  • 01.07.2016–31.12.2019 for the test data (also a 3-year formation period + 6-month trading period)

Then for each dataset I do the following:

  1. Use stock prices from formation period to select potential pairs for trading, applying the same criteria as in the previous part:
  • CADF p-value < 0.01
  • Hurst Exponent < 0.5
  • 1 < Half life of mean reversion < 30
  • number of zero crossings > 12 per year

2. For each of the selected pairs I create a portfolio (spread) and calculate the following metrics (using only prices from the formation period):

  • Euclidean distance from the mean
  • CADF p-value
  • ADF p-value
  • Spread standard deviation
  • Pearson correlation coefficient
  • Number of zero crossings
  • Hurst exponent
  • Half life of mean reversion
  • Percentage of days spent within historical 2-SD band
  • Hedge ratio

These are our independent variables.
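A few of these metrics can be sketched as follows (a simplified illustration, assuming both price series come in as pandas Series; `pair_features` and its keys are hypothetical names, not the notebook's actual code):

```python
import numpy as np
import pandas as pd

def pair_features(p1, p2):
    """Compute a subset of the formation-period metrics for one candidate pair."""
    hedge = np.polyfit(p2, p1, 1)[0]        # hedge ratio from an OLS fit of p1 on p2
    spread = p1 - hedge * p2                # portfolio (spread) series
    mu, sd = spread.mean(), spread.std()
    return {
        'hedge_ratio': float(hedge),
        'spread_std': float(sd),
        'pearson_corr': float(np.corrcoef(p1, p2)[0, 1]),
        'pct_within_2sd': float((np.abs(spread - mu) < 2 * sd).mean()),
    }
```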

3. For each pair, calculate the number of zero crossings using prices from the trading period. This is our dependent variable, which we will try to predict.

I decided to use the number of zero crossings because, of all the metrics, it is the one we truly need for successful trading: we close our position when the portfolio price crosses its historical mean. No crossings -> no closed positions -> no profit.
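Two of the quantities used above can be computed roughly as follows (a sketch, not the notebook's exact implementation): the half-life comes from regressing spread changes on the lagged spread, and zero crossings are sign changes of the demeaned spread.

```python
import numpy as np
import pandas as pd

def half_life(spread):
    """Half-life of mean reversion: regress spread changes on the lagged spread."""
    lag = spread.shift(1).dropna()
    delta = spread.diff().dropna()
    beta = np.polyfit(lag, delta, 1)[0]     # speed of mean reversion (negative)
    return -np.log(2) / beta

def zero_crossings(spread):
    """Count sign changes of the spread around its historical mean."""
    signs = np.sign(np.asarray(spread) - np.mean(spread))
    return int((np.diff(signs) != 0).sum())
```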

After performing all the procedures described above we have the following dataset:

Let’s start exploring the data. First we look at descriptive statistics.
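In code, the descriptive statistics are a single `describe` call on the pairs DataFrame (a toy stand-in below; the column names are assumptions, not the notebook's actual ones):

```python
import numpy as np
import pandas as pd

# stand-in for the real pairs dataset: one row per pair, one column per metric
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'hedge_ratio': rng.uniform(0.01, 6, 50),
    'spread_std': rng.uniform(0.1, 2, 50),
    'num_zero_crossings': rng.integers(0, 20, 50),
})
summary = df.describe()   # count, mean, std, min, quartiles, max per column
```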

Here I notice several things:

  • The minimum value of the hedge ratio is too low. It basically means that we need to allocate 99% of the capital to one stock and 1% to the other, which I think is not a good idea. So we need to remove the pairs with such small hedge ratios.
  • The most interesting pairs (those with the highest number of zero crossings in the trading period) are basically outliers. 75% of the pairs have fewer than 8 zero crossings.
  • Not all features are on the same scale, which means that we might need to apply some transformations to our data.

Before proceeding further I am going to remove all pairs with hedge ratios less than 0.2 and more than 5.
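With pandas this filter is a one-liner (assuming the metrics live in a DataFrame with a `hedge_ratio` column; the toy values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'hedge_ratio': [0.05, 0.8, 1.3, 7.2]})   # toy example
filtered = df[df['hedge_ratio'].between(0.2, 5)]            # keep 0.2 <= h <= 5
```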

Now let’s look at correlations.

Pairwise correlations of the variables

A couple of things to notice:

  • Some of our independent variables have a high correlation coefficient, which might affect the performance of several machine learning algorithms.
  • Independent variables have almost zero correlation with the dependent variable, which is not a good sign (according to this).

Now we will try to visualize our data.

Histograms of the variables
Density plots of the variables

We see that a lot of distributions are skewed, so we might want to apply power transformation to our features.


Before testing different ML algorithms we need to be clear about our end goal. We don’t really need to determine how many zero crossings each pair has during the trading period. What we need is to find several pairs with a high number of zero crossings. Therefore I think we should formulate the problem as a classification task.

When we were selecting potential pairs we had a requirement for a pair to have more than 12 zero crossings per year. Since our trading period is 6 months, let’s try to predict which pairs will have more than 6 zero crossings.

First we prepare our train and test datasets:

Now let’s determine our baseline. How many pairs will have more than 6 zero crossings if we selected them randomly?

Approximately 30%. But to have a successful trading algorithm we need to correctly identify more than 50% (assuming that the amount of money we lose on losing positions equals the amount we gain on winning positions).

Another thing is that we don’t really need to accurately classify all the pairs that have more than 6 zero crossings. We only need to find a few such pairs in which we are most confident. To select the algorithm best suited for this task, I’ve created a custom metric:

It uses the provided estimator to predict the probability of each pair belonging to class 1 (having more than 6 zero crossings in the trading period), selects the 10 pairs with the highest probabilities and calculates the fraction of those pairs that actually belong to class 1 (i.e., that actually have more than 6 zero crossings in the trading period).
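A scorer with this behavior can be written as a plain function with the `(estimator, X, y)` signature that sklearn accepts for `scoring` (a sketch; the notebook's actual implementation may differ):

```python
import numpy as np

def top10_score(estimator, X, y):
    """Fraction of the 10 pairs with the highest predicted P(class 1)
    that actually belong to class 1."""
    proba = estimator.predict_proba(X)[:, 1]
    top = np.argsort(proba)[-10:]           # indices of the 10 most confident pairs
    return float(np.asarray(y)[top].mean())
```

Because it has the `(estimator, X, y)` signature, it can be passed directly as `scoring=top10_score` to `cross_val_score` or `GridSearchCV`.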

Now we need to binarize the outcome variable and we are ready to start.
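Binarization is one line (threshold of 6 zero crossings, as above; the values below are toy data):

```python
import pandas as pd

y = pd.Series([0, 3, 7, 12, 5, 9])   # zero crossings in the trading period (toy values)
y_bin = (y > 6).astype(int)          # class 1: more than 6 zero crossings
```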

I’ll use the following function to automate the model checking process.
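A minimal version of such a helper might look like this (the `scoring` argument accepts sklearn's built-in metric names or a custom scorer function):

```python
from sklearn.model_selection import cross_val_score, StratifiedKFold

def check_model(model, X, y, scoring=None):
    """Cross-validate one model and report the mean and std of its scores."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring=scoring)
    return scores.mean(), scores.std()
```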

First I will test several ML algorithms without any tuning. I will only use the class_weight parameter where it is applicable, because we have many more samples in class 0 than in class 1. This is what I get:

It is basically the same as choosing pairs at random. Maybe a little bit better. Now I will try to apply Box-Cox power transformation to the data to remove the skewness we saw on the plots.
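With sklearn this is `PowerTransformer(method='box-cox')`. Note that Box-Cox requires strictly positive inputs, so features that can be exactly zero (such as p-values) would need a small offset or the Yeo-Johnson variant instead:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(0, 1, size=(100, 3))    # skewed, strictly positive toy data
pt = PowerTransformer(method='box-cox')   # also standardizes the output by default
X_t = pt.fit_transform(X)
```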

Results are very similar, only the performance of SVM improved. Let’s try removing unnecessary features with SelectKBest function.
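SelectKBest slots naturally into a Pipeline, so the feature selection is refit inside each cross-validation fold (`k=5` here is an arbitrary illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)
pipe = Pipeline([
    ('select', SelectKBest(f_classif, k=5)),   # keep the 5 strongest features
    ('clf', LogisticRegression(class_weight='balanced', max_iter=1000)),
])
pipe.fit(X, y)
```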

No real improvement in average score, but we see that the standard deviation for Logistic Regression and Linear Discriminant Analysis decreases. Now I will try several ensemble models.

ExtraTreesClassifier seems to be the best here. I will try to improve its performance by using grid search to tune its parameters.
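A grid search over ExtraTreesClassifier might look like this (the parameter grid is illustrative, not the one used in the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)
grid = GridSearchCV(
    ExtraTreesClassifier(class_weight='balanced', random_state=0),
    param_grid={'n_estimators': [50, 100], 'max_depth': [3, None]},
    cv=3,
)
grid.fit(X, y)   # refits the best parameter combination on the full data
```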

It seems that we were able to achieve a small improvement. Let’s try to tune a couple of other models. For tuning the logistic regression model I will use the LogisticRegressionCV class included in sklearn.
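LogisticRegressionCV folds the regularization search into the estimator itself (a sketch with arbitrary settings, not the notebook's exact call):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegressionCV(Cs=10, cv=5, class_weight='balanced', max_iter=1000)
clf.fit(X, y)   # picks the best regularization strength C by internal CV
```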

No improvement for logistic regression model. Next I’ll try tuning SVC.

No improvement on SVC and very small improvement on KNN. Overall we have only one algorithm that achieved more than 50% accuracy score during cross validation. Let’s try to train it and see how it works on unseen data.

Interesting. We were able to achieve 70% accuracy on both in-sample and out-of-sample tests. Let’s look at top pairs.

What is strange here is that almost all of the pairs contain the TMP stock. This is obviously not good for diversification. Also notice that most of those pairs have a very small hedge ratio, so perhaps we should limit its range even further. Let’s try to select the top 5 predicted pairs, all containing different stocks.

The results are not very impressive, but I believe they are better than what we got in the previous article. Now at least most of the pairs stay within their 2-SD bands during the trading period.


After experimenting with different machine learning models it became clear that selecting pairs for pairs trading is not an easy task. All the tested algorithms failed in making good predictions, with the best models having accuracy a little better than 50% (which is still an improvement compared to baseline of 30%). I believe that from this we can conclude that none of the well-known simple heuristics (such as selecting pairs with the smallest Euclidean distance) could provide better results.

Possible improvements:

  • Using anomaly detection algorithms for classification
  • Trying some other feature transformations or feature engineering
  • Trying even longer formation period
  • Using other dependent variable (e.g., Euclidean distance or Pearson correlation coefficient)
  • Trying to classify pairs that satisfy all the criteria for potential pairs (CADF p-value < 0.01, Hurst exponent < 0.5, etc.) during the trading period. (There are only 33 out of 3471 such pairs in our training data, so maybe anomaly detection could work here.)

Jupyter notebook with source code is available here.

Note: if you want to run this code on your laptop, you might want to use a smaller stock universe; otherwise it might take a very long time (especially the part where we select potential pairs). You can download the output of the select_pairs function here: pairs_train, pairs_test.

If you have any questions, suggestions or corrections please post them in the comments. Thanks for reading.
