financialnoob.me

Blog about quantitative finance

Pairs trading with kagi indicator

In this article I will describe how to implement and backtest a pairs trading strategy using the kagi indicator. It is based on the paper ‘Pairs trading based on statistical variability of the spread process’ (Bogomolov, 2013), which proposes an approach to pairs trading that does not rely on a long-term mean of the spread process. The kagi indicator is quite old: according to Wikipedia it was developed in Japan in the 1870s and was originally used for tracking rice prices. Renko is a similar indicator that is also described in the paper, but no strategy is based on it, so I am going to omit it here.


First let’s try to understand how a kagi chart works. Assume we have a time series P (the spread). The kagi indicator consists of two sequences: tau_a and tau_b. Tau_a contains the time moments (indexes) at which the process P has a local minimum or maximum. Tau_b contains the time moments at which these local extrema are recognized (when the value of P is more than H away from the last min/max). Basically it tracks local minimums and maximums of process P given a specific threshold H.

Algorithm for kagi construction works as follows:

  • We start with finding tau_b[0] which is defined in the paper as:

Definition of tau_b[0]

  • We process time series P to find the first point such that the difference between minimum and maximum values of P (up to this point) is bigger than H.
  • To find tau_a[0] we scan time series P up to the point tau_b[0] and select the first point such that the distance between the value of P at this point and the value of P at tau_b[0] is bigger than H. Formal definition from the paper is shown below.

Definition of tau_a[0]

  • Then we need to determine whether the point tau_a[0] is local minimum or local maximum. To do this we need to check if the value of P at time (index) tau_b[0] is bigger than the value of P at time (index) tau_a[0]. If it is, then process P is moving up after time tau_a[0] and therefore point tau_a[0] is a local minimum. If the value of P at time tau_b[0] is less than value of P at time tau_a[0], then tau_a[0] is a local maximum. Formula for this is shown below. It is equal to 1 if the point is local maximum and to -1 if the point is local minimum.

Type of critical point at time tau_a[0]

  • After we determine the type of tau_a[0] we know the types of all critical points after it because they are alternating between local minimum and local maximum. For instance if tau_a[0] is a local minimum, then we know that all odd points after it (tau_a[1],tau_a[3],…) are local maximums and all even points (tau_a[2],tau_a[4],…) are local minimums.
  • For local minimums we use the following definitions:

Definition of tau_a and tau_b for local minimums

  • For local maximums:

Definition of tau_a and tau_b for local maximums

  • We just alternate the procedures described above until the end of the series.

The formulas and explanations above might seem a little bit difficult, but I hope they’ll become clearer when you see them implemented in Python.
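The steps above can be sketched in plain Python as follows. This is a minimal sketch under my own assumptions, not necessarily the exact code from the notebook: the function name `kagi`, the use of strict "greater than H" comparisons, and returning two empty lists when the series never moves by more than H are all illustrative choices.

```python
from typing import List, Tuple


def kagi(p: List[float], h: float) -> Tuple[List[int], List[int]]:
    """Return index lists (tau_a, tau_b) for series p and threshold h."""
    n = len(p)

    # tau_b[0]: first index where the running range (max - min) exceeds h.
    lo = hi = p[0]
    b0 = None
    for i in range(n):
        lo, hi = min(lo, p[i]), max(hi, p[i])
        if hi - lo > h:
            b0 = i
            break
    if b0 is None:
        return [], []  # assumption: no extrema if p never moves by more than h

    # tau_a[0]: first index before tau_b[0] that is more than h away from p[tau_b[0]].
    a0 = next(i for i in range(b0) if abs(p[i] - p[b0]) > h)

    tau_a, tau_b = [a0], [b0]
    # If p rose from tau_a[0] to tau_b[0], tau_a[0] was a minimum,
    # so the next extremum to look for is a maximum.
    looking_for_max = p[b0] > p[a0]
    ext = b0  # index of the current candidate extremum

    for i in range(b0 + 1, n):
        if looking_for_max:
            if p[i] > p[ext]:
                ext = i                      # new running maximum
            elif p[ext] - p[i] > h:
                tau_a.append(ext)            # maximum occurred at ext
                tau_b.append(i)              # and is recognized at i
                looking_for_max, ext = False, i
        else:
            if p[i] < p[ext]:
                ext = i                      # new running minimum
            elif p[i] - p[ext] > h:
                tau_a.append(ext)            # minimum occurred at ext
                tau_b.append(i)              # and is recognized at i
                looking_for_max, ext = True, i
    return tau_a, tau_b


# Toy series chosen by hand for illustration (not the article's data).
toy = [0.0, 0.6, 0.4, 1.2, 1.0, 0.9, 0.5, 0.8, 1.3]
tau_a, tau_b = kagi(toy, 0.5)
print(tau_a, tau_b)
```

On the toy series the first minimum sits at index 0 and is recognized at index 1, the maximum at index 3 is recognized at index 6, and the minimum at index 6 is recognized at index 8.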


First let’s experiment on some synthetic data. Code for generating and plotting it is shown below.
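A minimal sketch of how such a series could be generated with the standard library only. The recipe here (a slow linear drift plus a sine oscillation plus Gaussian noise) and all of its parameters are my assumptions for illustration; the specific index values quoted in the walkthrough later in the article come from the original data, not from this sketch.

```python
import math
import random
from typing import List


def make_synthetic_spread(n: int = 200, seed: int = 0) -> List[float]:
    """Oscillating series that drifts, so it rarely crosses its long-term mean."""
    rng = random.Random(seed)  # local generator keeps the result reproducible
    spread = []
    for t in range(n):
        drift = -3.0 + 6.0 * t / (n - 1)          # slow move from -3 to +3
        cycle = math.sin(2.0 * math.pi * t / 20)  # local ups and downs
        noise = rng.gauss(0.0, 0.1)
        spread.append(drift + cycle + noise)
    return spread


spread = make_synthetic_spread()
print(len(spread), min(spread), max(spread))
```

The resulting list can be passed to any plotting library to get a picture similar in spirit to the one below.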

Plot of synthetic spread

On the plot above we can see many trading opportunities: the spread oscillates up and down. But if we use traditional pairs trading strategies based on the assumption that the spread reverts to some long-term mean, we won’t be able to exploit most of those trading opportunities. The spread crosses its long-term mean only a few times in the middle of the plot.

Kagi construction doesn’t rely on reversion to the long-term mean. Instead it helps us detect local minimums and maximums of a given time series. Code for creating kagi construction is provided below. It takes time series P and parameter H as an input and returns two lists of indexes: tau_a and tau_b.

In the code above we just follow the procedure described earlier. First we find indexes tau_b[0] and tau_a[0] (lines 14–31). Then we determine the type of extremum P has at a point tau_a[0] (lines 33–39). After that we alternate between procedures for finding indexes tau_b and tau_a depending on local extremum type (lines 41–78).

Now let’s apply this function to the synthetic data generated earlier and plot the results.

Kagi indicator for synthetic spread

Let’s try to follow how kagi is constructed on the plot above. I’ve set parameter H=0.5. First step is finding tau_b[0]. It is the smallest time series index such that the distance between the minimum and maximum of time series up to that point is greater than H. You can see on the screenshot below that it happens at time index 1.

tau_b and tau_a for synthetic spread

Now we need to find tau_a[0] which is the smallest index such that the difference of the spread values between it and tau_b[0] is greater than H. Above we see that tau_a[0]=0. Let’s see if the difference is indeed greater than H.

Difference of the spread between points tau_b[0] and tau_a[0]

To determine whether the point tau_a[0] is local min or local max we just check whether the value of the spread at point tau_b[0] is greater than the value of the spread at point tau_a[0]. In our case it is local minimum, therefore the next critical point will be local maximum and we can detect it using the formulas for tau_a and tau_b in local maximum case.

We scan the spread starting at point tau_a[0] and look for a point such that the difference between the maximum of the spread up to that point and the value of the spread at that point is greater than H. We see on the screenshot above that tau_b[1]=6. We can check it and confirm that given condition is only satisfied at index i=6.

And so we proceed in the same way from one local extremum to the next until we process the whole time series. Trading is done at times tau_b when local minimum or maximum is detected.


Now we need to introduce a couple of properties of kagi construction called H-inversion and H-volatility.

  • H-inversion measures the number of times the spread changes its direction for a given value of H. The formal definition from the paper is shown below, but basically it’s just the number of points tau_b.

Definition of H-inversion

  • H-volatility measures the variability of the process. Its formal definition is shown below.

Definition of H-volatility of order p

H-volatility and H-inversion play an important role in the proposed trading strategy, so we need to write a function that computes these metrics. Only H-volatility of order 1 (p=1) is used in trading strategies, so that is the only one I will calculate.

In the function above we just implement the formulas provided in the paper for H-inversion and H-volatility.
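For readers without the notebook at hand, here is a minimal sketch of both metrics. It rests on my reading of the paper's definitions: H-volatility of order 1 is the average absolute move of P between consecutive extrema tau_a, and H-inversion is the number of such moves. That count can differ by one from simply counting the points tau_b, so treat the exact convention (and the name `h_metrics`) as an assumption.

```python
from typing import List, Tuple


def h_metrics(p: List[float], tau_a: List[int]) -> Tuple[int, float]:
    """Return (H-inversion, H-volatility of order 1) for extrema indexes tau_a."""
    # Absolute move of the series between each pair of consecutive extrema.
    swings = [abs(p[j] - p[i]) for i, j in zip(tau_a, tau_a[1:])]
    h_inversion = len(swings)
    if h_inversion == 0:
        return 0, float("nan")  # assumption: undefined without at least one swing
    return h_inversion, sum(swings) / h_inversion


# Hand-made series with extrema at indexes 0, 3 and 6 (illustrative only).
toy = [0.0, 0.6, 0.4, 1.2, 1.0, 0.9, 0.5, 0.8, 1.3]
print(h_metrics(toy, [0, 3, 6]))
```

Here the two swings are 1.2 and 0.7, so H-inversion is 2 and H-volatility is about 0.95.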


One interesting feature of kagi constructions is that we can use two types of strategies — trend-following (momentum) strategy and contrarian (mean-reversion) strategy. The choice of strategy is based on value of H and H-volatility.

Rules for strategy selection
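In code the selection rule is a one-line comparison. The sketch below assumes the rule as I understand it from the paper: H-volatility above 2H suggests a trend-following strategy, H-volatility below 2H suggests a contrarian one, and exactly 2H gives no edge. The function name and the string labels are hypothetical.

```python
def choose_strategy(h: float, h_volatility: float) -> str:
    """Pick a strategy type by comparing H-volatility with 2H."""
    if h_volatility > 2 * h:
        return "trend-following"  # swings overshoot 2H on average
    if h_volatility < 2 * h:
        return "contrarian"       # swings fall short of 2H on average
    return "none"                 # assumption: no trade when exactly equal


print(choose_strategy(0.5, 0.95))
print(choose_strategy(0.5, 1.2))
```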

In the paper only contrarian strategy is tested. Here’s a quote about processes with H-volatility > 2H:

Likewise, H-volatility > 2H could be viewed as a property of a sub-martingale or a super-martingale or a process regularly switching over time from a sub-martingale to a super-martingale and back. It is unlikely that these sorts of processes exist in financial markets. Pastukhov (2005) does not provide any examples of processes for which H-volatility > 2H.


Now let’s use some real-world data and calculate the parameters described above for each pair. I will use constituents of Vanguard Small-Cap Value ETF (VBR) as my stock universe. Data from 2013-01-01 to 2013-12-31 will be used as a training period for pair selection. Data from 2014-01-01 to 2014-06-30 will be used as a test period for trading. Code for loading and preprocessing data and calculating pair parameters is shown below.

For each pair of stocks we construct the spread as a difference of log prices. Parameter H is set equal to the standard deviation of the spread. We process each pair and save the following parameters in a dataframe: H, H-volatility, H-inversion, Last extremum.
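The spread and threshold described above can be sketched like this. The prices are made-up numbers, the helper names are hypothetical, and I am assuming the sample standard deviation (the article does not say which estimator is used).

```python
import math
import statistics
from typing import List, Sequence


def log_spread(prices_a: Sequence[float], prices_b: Sequence[float]) -> List[float]:
    """Spread as the difference of log prices, one value per day."""
    return [math.log(a) - math.log(b) for a, b in zip(prices_a, prices_b)]


def threshold_h(spread: Sequence[float]) -> float:
    """Parameter H as the standard deviation of the spread (sample std assumed)."""
    return statistics.stdev(spread)


# Made-up prices for two hypothetical stocks.
spread = log_spread([10.0, 11.0, 12.0, 11.5], [5.0, 5.0, 6.0, 5.5])
print(spread, threshold_h(spread))
```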

Pairs with the highest value of H-inversion are selected for trading. Let’s select the top 20 pairs and check if there are any pairs with H-volatility > 2H among them. Results are shown in the screenshot below.

Pairs with H-volatility > 2H

Six out of top twenty pairs satisfy that condition and it seems that such processes exist and they are not that rare. So I will use both types of strategies in my backtests.


Code for pair selection is provided below. We select top 5 pairs consisting of different stocks (we should have 10 unique stocks in 5 pairs).
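One simple way to do this is a greedy pass over the pairs sorted by H-inversion, skipping any pair that reuses a stock. This is a sketch of that idea, not necessarily what the notebook does; the tickers and scores are invented.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def select_pairs(h_inversion: Dict[Pair, int], n_pairs: int = 5) -> List[Pair]:
    """Greedily pick the highest H-inversion pairs that share no stocks."""
    used = set()
    selected: List[Pair] = []
    ranked = sorted(h_inversion.items(), key=lambda kv: kv[1], reverse=True)
    for pair, _score in ranked:
        if pair[0] in used or pair[1] in used:
            continue  # one of the stocks is already in a selected pair
        selected.append(pair)
        used.update(pair)
        if len(selected) == n_pairs:
            break
    return selected


# Invented tickers and H-inversion values.
scores = {("A", "B"): 30, ("A", "C"): 28, ("C", "D"): 25, ("B", "D"): 20, ("E", "F"): 10}
print(select_pairs(scores, n_pairs=3))
```

A greedy pass is not guaranteed to maximize the total H-inversion of the selected set, but it is easy to follow and keeps the strongest pairs.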

Selected pairs

Let’s plot the spread and kagi indicators of one of the selected pairs.

Spread and kagi indicators for pair BBBY-STFC

Trading algorithm works as follows:

  • At the first trading day we open positions based on the last extremum of the training period (whether it is min or max) and the type of strategy based on the values of H and H-volatility.
  • After that at each trading day we calculate the spread using last 252 trading days. Then we construct kagi indicators of the spread and check if new local extremum is detected. If it is detected, we adjust our positions accordingly. If it is not, the positions remain the same.
  • Parameter H for kagi construction is equal to the standard deviation of the spread during training period. It remains constant for the whole trading period.
  • Two factors that influence our positions are type of the extremum (local minimum or local maximum) and type of strategy (contrarian or trend-following). Specific rules will be clear from the source code.
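To make the last rule concrete, here is how I would sketch the mapping from extremum type and strategy type to a position in the spread. The sign convention is my assumption, based on the paper's description: a trend follower buys once a local minimum is recognized (expecting the up-move to continue) and sells once a local maximum is recognized, while a contrarian does the opposite. The labels "min", "max" and the function name are hypothetical.

```python
def spread_position(last_extremum: str, strategy: str) -> int:
    """Return +1 (long the spread), -1 (short the spread) or 0 (flat)."""
    if last_extremum not in ("min", "max"):
        raise ValueError("last_extremum must be 'min' or 'max'")
    if strategy == "trend-following":
        # After a recognized minimum the spread is rising: follow it.
        return 1 if last_extremum == "min" else -1
    if strategy == "contrarian":
        # After a recognized minimum the spread has already risen by H: fade it.
        return -1 if last_extremum == "min" else 1
    return 0  # assumption: stay flat for any other strategy label


print(spread_position("min", "trend-following"))
print(spread_position("min", "contrarian"))
```

Being long the spread of log prices here means long the first stock and short the second; that convention is also an assumption.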

Code for calculating positions dataframe is provided below.

From positions dataframe we calculate returns. Code is shown below. In the first line we divide return by 5 because our capital is equally allocated between 5 pairs and we multiply it by 2 because we can use double the amount of capital we have.
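The scaling described above, together with the compounding needed for a cumulative returns plot, can be sketched as follows. The daily numbers are invented, and the parameter name `leverage` is my label for the "double the amount of capital" factor.

```python
from typing import List, Sequence


def portfolio_return(pair_returns: Sequence[float], n_pairs: int = 5,
                     leverage: float = 2.0) -> float:
    """One day's portfolio return: equal weight per pair, then scaled by leverage."""
    return leverage * sum(pair_returns) / n_pairs


def cumulative_returns(daily: Sequence[float]) -> List[float]:
    """Compound daily returns into a cumulative return series."""
    out, growth = [], 1.0
    for r in daily:
        growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


# Invented daily returns of five pairs on a single day.
print(portfolio_return([0.01, -0.02, 0.0, 0.03, 0.005]))
print(cumulative_returns([0.1, -0.1]))
```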

Plot of cumulative returns (Strategy 1)

We can already see from the plot that this strategy does not perform very well. Performance metrics are shown below.

Performance metrics


Now we will try to improve the strategy. Instead of using static parameter H, we will recalculate it at each time step using sliding window of 252 trading days. Also I will not open positions based on the last extremum of training period. First positions will be opened when first extremum is detected in the trading period.

Changes in the code are minor. Lines 10–12 are changed to recalculate parameter H at each time step. Line 14 calculates H-volatility using updated parameter. Line 29 sets initial positions to be 0 (instead of opening positions based on the last extremum of the training period).
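Independently of the notebook (the line numbers above refer to the original code, not to this sketch), recalculating H on a sliding window could look like this. I am assuming the window ends at the current day and that H is undefined until a full window is available; the function name is hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def rolling_h(spread: Sequence[float], window: int = 252) -> List[Optional[float]]:
    """H at each step = std of the last `window` spread values (None until full)."""
    out: List[Optional[float]] = []
    for t in range(len(spread)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(statistics.stdev(spread[t + 1 - window:t + 1]))
    return out


# Tiny example with a 3-day window instead of 252.
print(rolling_h([1.0, 2.0, 3.0, 4.0], window=3))
```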

Plot of cumulative returns (Strategy 2)

We can see on the plot that these minor changes significantly improve the performance of the strategy.


I’ve also tested the same two strategies using the top 20 pairs instead of the top 5. I won’t describe the whole process again; you can find the source code in the Jupyter notebook provided at the end of this article. Let’s just look at the performance metrics of all 4 backtests.

Performance metrics

Strategy 1 (using static parameter H) with 5 pairs has the worst return of all. Increasing the number of pairs to 20 improves its performance, but only a little bit.

Strategy 2 with 5 pairs provides the best results. It has an annual return of 87% and Sharpe ratio equal to 2.8. Increasing the number of pairs to 20 significantly decreases the annual return (from 87% to 32%), but the overall performance is still not bad.


Ideas for further improvements:

  • Try different methods of pair selection.
  • Use other methods to determine parameter H (GARCH model is suggested in the paper).
  • Try different methods of spread construction.
  • Decrease the length of the trading period.
  • Exclude pairs with H-volatility too close to 2H (because if H-volatility = 2H then the spread is a Wiener process and it’s impossible to make a profit from trading it).

Jupyter notebook with source code is available here.

If you have any questions, suggestions or corrections please post them in the comments. Thanks for reading.


References

[1] Pairs trading based on statistical variability of the spread process (Bogomolov, 2013)

[2] On Some Probabilistic-Statistical Methods in Technical Analysis (Pastukhov, 2005)

[3] https://en.wikipedia.org/wiki/Kagi_chart

[4] https://hudsonthames.org/pairs-trading-based-on-renko-and-kagi-models/
