financialnoob.me

Blog about quantitative finance

Introduction to copulas (Part 2)

In this article I’m going to describe several copulas that are more advanced than the ones covered in the previous part. I will consider four members of the Archimedean family: the Clayton, Gumbel, Frank and Joe copulas. I will implement them from scratch in Python and show how to sample from them and how to fit them to data.


The Archimedean family is probably the best-known class of copulas. Let’s start with a general definition. I will use the bivariate version, but it can easily be generalized to higher dimensions. A copula is Archimedean if it can be written in the following form:

Definition of Archimedean copula

So to define an Archimedean copula it is sufficient to provide a generator function phi. The definition above might not be very clear now, but I think it’ll be easy enough to understand after I show a couple of examples.
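Written out, the standard bivariate definition (following Nelsen's book; the notation in the figure may differ slightly) is shown below. The generator φ: [0,1] → [0,∞] must be continuous, strictly decreasing and convex, with φ(1) = 0:

$$
C(u,v) = \varphi^{[-1]}\big(\varphi(u) + \varphi(v)\big),
\qquad
\varphi^{[-1]}(s) =
\begin{cases}
\varphi^{-1}(s), & 0 \le s \le \varphi(0),\\
0, & \varphi(0) \le s \le \infty.
\end{cases}
$$

When φ(0) = ∞, the pseudo-inverse is simply the ordinary inverse φ⁻¹.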

There are many different Archimedean copulas. For example, Roger B. Nelsen’s book ‘An Introduction to Copulas’ lists 22 of them. Let’s start with the Clayton copula. Its generator function is:

Clayton copula generator function

To construct the copula function we need the pseudo-inverse of the generator function. In other words, we need to express t as a function of phi:

Pseudo-inverse of Clayton copula generator function

Now let’s construct the Clayton copula function:

Clayton copula
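The formulas below assume Nelsen's parameterization of the Clayton generator with alpha > 0. The figure may use a rescaled generator such as t^(-alpha) - 1, but any constant rescaling of φ gives exactly the same copula. The generator, its inverse and the resulting copula are:

$$
\varphi(t) = \frac{t^{-\alpha}-1}{\alpha},
\qquad
\varphi^{-1}(s) = \left(1+\alpha s\right)^{-1/\alpha},
\qquad
C(u,v) = \varphi^{-1}\big(\varphi(u)+\varphi(v)\big) = \left(u^{-\alpha}+v^{-\alpha}-1\right)^{-1/\alpha}.
$$

Since φ(0) = ∞ for alpha > 0, the pseudo-inverse is just the ordinary inverse here.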

The plot of the Clayton copula with parameter alpha=6 is presented below.

Clayton copula (alpha=6)

Now we need to solve two more problems: how to sample from a given copula and how to fit it to data. I will start with sampling.

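The first sampling technique is based on the conditional CDF. The algorithm is quite simple:

1. Generate two independent standard uniform variables u and t.
2. Set v = C_u^(-1)(t), where C_u(v) = ∂C(u, v)/∂u is the conditional CDF of V given U = u.
3. The pair (u, v) is a sample from the copula C.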

Let’s try to apply it to the Clayton copula. First we need to calculate the conditional CDF:

Conditional CDF (Clayton)

Calculating derivatives by hand is time-consuming and error-prone. Let me show you how to calculate a derivative in Python using the sympy library. In the example below we set the variable expr equal to the expression for the Clayton copula. We then use the diff function to calculate the partial derivative of the copula function with respect to u. You can easily verify that the expression we get from sympy is identical to the one we calculated above.

Calculating derivatives in Python
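Here is a minimal sketch of that sympy step. It is not the notebook's exact code. It differentiates the Clayton copula with respect to u and compares the result numerically with the hand-derived conditional CDF at an arbitrary point, which is more robust than relying on `simplify()` to produce an identical expression:

```python
import sympy as sp

# u, v: copula arguments; alpha: Clayton parameter (alpha > 0)
u, v, alpha = sp.symbols("u v alpha", positive=True)

# Clayton copula C(u, v) = (u^-alpha + v^-alpha - 1)^(-1/alpha)
expr = (u**(-alpha) + v**(-alpha) - 1) ** (-1 / alpha)

# Conditional CDF C_u(v) = dC/du
cond_cdf = sp.diff(expr, u)
print(cond_cdf)

# Hand-derived closed form, for comparison
by_hand = u**(-alpha - 1) * (u**(-alpha) + v**(-alpha) - 1) ** (-1 / alpha - 1)

# Compare both expressions numerically at an arbitrary point
point = {u: 0.3, v: 0.7, alpha: 6}
print(float(cond_cdf.subs(point)), float(by_hand.subs(point)))
```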

Now we need to calculate the inverse of the conditional CDF. I don’t know how to do it with sympy, so I’m doing it manually:

Inverse of conditional CDF (Clayton)
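For reference, here is one way to write the derivation. Set t equal to the Clayton conditional CDF and solve for v. The final form may be arranged differently from the figure, but it is algebraically equivalent:

$$
\begin{aligned}
t &= u^{-\alpha-1}\left(u^{-\alpha}+v^{-\alpha}-1\right)^{-\frac{1+\alpha}{\alpha}}\\
\Longrightarrow\quad u^{-\alpha}+v^{-\alpha}-1 &= \left(t\,u^{1+\alpha}\right)^{-\frac{\alpha}{1+\alpha}} = u^{-\alpha}\,t^{-\frac{\alpha}{1+\alpha}}\\
\Longrightarrow\quad v = C_u^{-1}(t) &= \left(u^{-\alpha}\left(t^{-\frac{\alpha}{1+\alpha}}-1\right)+1\right)^{-1/\alpha}
\end{aligned}
$$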

That’s it, we are ready to generate samples from the Clayton copula. The code is provided below.

Clayton copula (alpha=6)

On the plot above we can verify that both marginal distributions look standard uniform.
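Below is a minimal sketch of a conditional-distribution sampler for the Clayton copula. It is my own version, not the notebook's code, and it uses the closed-form inverse of the conditional CDF. The function names are illustrative. Besides checking the marginals, it compares the sample Kendall's tau with the theoretical Clayton value alpha / (alpha + 2):

```python
import numpy as np
from scipy.stats import kendalltau

def clayton_cond_inv(u, t, alpha):
    """Inverse of the Clayton conditional CDF: v = C_u^{-1}(t)."""
    return (u ** (-alpha) * (t ** (-alpha / (1 + alpha)) - 1) + 1) ** (-1 / alpha)

def sample_clayton_conditional(n, alpha, rng=None):
    """Draw n pairs (u, v) from a Clayton copula with the conditional-distribution method."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=n)            # step 1: u ~ U(0, 1)
    t = rng.uniform(size=n)            # step 1: independent t ~ U(0, 1)
    v = clayton_cond_inv(u, t, alpha)  # step 2: v = C_u^{-1}(t)
    return u, v

u, v = sample_clayton_conditional(5000, alpha=6, rng=42)
print(u.mean(), v.mean())                # both should be close to 0.5
print(kendalltau(u, v)[0], 6 / (6 + 2))  # sample tau vs theoretical alpha / (alpha + 2)
```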

The sampling method described above doesn’t always work in practice, because sometimes the conditional CDF cannot be inverted analytically. Luckily there is another way to sample from an Archimedean copula. It uses the Kendall distribution function, which can be easily constructed from the copula generator function. The general algorithm is presented below:

Sampling from Archimedean copula
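In symbols, the standard form of this algorithm (as described in Nelsen's book) is given below. It relies on a known result: for an Archimedean copula, S = φ(U)/(φ(U)+φ(V)) is uniform and independent of W = C(U,V), and W has CDF K. The figure's exact wording may differ:

$$
\begin{aligned}
&\text{1. draw independent } s, q \sim U(0,1),\\
&\text{2. set } w = K^{-1}(q), \quad \text{where } K(t) = t - \frac{\varphi(t)}{\varphi'(t)},\\
&\text{3. return } u = \varphi^{[-1]}\big(s\,\varphi(w)\big), \quad v = \varphi^{[-1]}\big((1-s)\,\varphi(w)\big).
\end{aligned}
$$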

Let’s try to generate samples from the Clayton copula using this method. We can derive the formula for the Kendall distribution function using sympy:

Kendall distribution function
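A minimal sympy sketch of that derivation is shown below. It assumes the Clayton generator (t^(-alpha) - 1)/alpha. The same K follows from any rescaling of φ, because the scale constant cancels in φ/φ'. The result should be equivalent to K(t) = t + t(1 - t^alpha)/alpha:

```python
import sympy as sp

t, alpha = sp.symbols("t alpha", positive=True)

# Clayton generator in the assumed form phi(t) = (t^-alpha - 1) / alpha
phi = (t**(-alpha) - 1) / alpha

# Kendall distribution function K(t) = t - phi(t) / phi'(t)
K = sp.simplify(t - phi / sp.diff(phi, t))
print(K)
```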

Now we are ready to implement the second sampling method. The source code and the resulting plots are provided below. We implement the generator function, its inverse and the Kendall distribution function. The rest is done according to the algorithm described above.
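Here is a minimal sketch of the Kendall-function sampler for the Clayton copula. It is not the notebook's code, and the helper names are hypothetical. K generally has no closed-form inverse, so this sketch solves K(w) = q numerically with `scipy.optimize.brentq`. Brent's method is my choice here; any bracketing root finder would do:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import kendalltau

ALPHA = 6.0  # Clayton parameter

def phi(t):
    """Clayton generator (assumed form): (t^-alpha - 1) / alpha."""
    return (t ** (-ALPHA) - 1) / ALPHA

def phi_inv(s):
    """Inverse Clayton generator: (1 + alpha * s)^(-1/alpha)."""
    return (1 + ALPHA * s) ** (-1 / ALPHA)

def kendall_K(t):
    """Kendall distribution function for Clayton: K(t) = t + t (1 - t^alpha) / alpha."""
    return t + t * (1 - t ** ALPHA) / ALPHA

def sample_archimedean(n, phi, phi_inv, K, rng=None, lo=1e-12, hi=1.0):
    """Sample n pairs from a bivariate Archimedean copula via the Kendall function."""
    rng = np.random.default_rng(rng)
    s = rng.uniform(size=n)
    q = rng.uniform(size=n)
    # Solve K(w) = q numerically; K is a CDF, so the root in [lo, hi] is unique
    w = np.array([brentq(lambda x: K(x) - qi, lo, hi) for qi in q])
    u = phi_inv(s * phi(w))
    v = phi_inv((1 - s) * phi(w))
    return u, v

u, v = sample_archimedean(3000, phi, phi_inv, kendall_K, rng=0)
print(kendalltau(u, v)[0], ALPHA / (ALPHA + 2))  # sample vs theoretical Kendall's tau
```

The sampler only needs φ, φ⁻¹ and K, so the same function can be reused for other Archimedean copulas by swapping those three callables.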

Clayton copula (alpha=6)

As we can see on the plots above, the results are similar to what we got with the first method. I think this method is more universal and easier to use, because the derivative of the generator function is usually easier to calculate than the derivative of the copula function. So I will use this technique for the other copulas. But before we explore them, let’s see how to fit a copula to data.

There are several methods available for copula parameter estimation. I will consider two of them: estimation based on dependence measures (such as Spearman’s rho or Kendall’s tau) and Canonical Maximum Likelihood (CML).

For some copulas there is an analytical formula connecting the copula parameter to a dependence measure. So we can just calculate this dependence measure for our data and plug it into the formula to get the parameter value. A couple of examples:

Copula parameters

In the formulas above, tau is the Kendall rank correlation coefficient.
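For reference, these are the standard relations for the two copulas in this article that use this approach, Clayton and Gumbel:

$$
\text{Clayton: } \tau = \frac{\alpha}{\alpha+2} \;\Longrightarrow\; \alpha = \frac{2\tau}{1-\tau},
\qquad
\text{Gumbel: } \tau = 1-\frac{1}{\alpha} \;\Longrightarrow\; \alpha = \frac{1}{1-\tau}.
$$

For example, a Clayton copula with alpha = 6 has tau = 6/8 = 0.75. Plugging tau = 0.75 back in gives 2·0.75/0.25 = 6.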

Let’s see how this works in practice. I will use the Clayton copula sample generated above to create two variables, one with a normal distribution and another with an exponential distribution. Then I’ll calculate Kendall’s tau for these variables, plug it into the formula above and compare the result with the true value of alpha used to generate the sample.
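Here is a minimal sketch of this experiment. It is not the notebook's code. It uses a fresh Clayton sample drawn with the conditional method, so the estimate will differ somewhat from the value reported in the article. The marginals are created by applying the normal and exponential inverse CDFs to the copula sample:

```python
import numpy as np
from scipy.stats import expon, kendalltau, norm

ALPHA_TRUE = 6.0
rng = np.random.default_rng(1)

# Clayton(6) sample via the conditional-distribution method
n = 5000
u = rng.uniform(size=n)
t = rng.uniform(size=n)
v = (u ** (-ALPHA_TRUE) * (t ** (-ALPHA_TRUE / (1 + ALPHA_TRUE)) - 1) + 1) ** (-1 / ALPHA_TRUE)

# Give the variables non-uniform marginals: x ~ N(0, 1), y ~ Exp(1)
x = norm.ppf(u)
y = expon.ppf(v)

# Kendall's tau depends only on ranks, so these monotone transforms do not change it
tau = kendalltau(x, y)[0]
alpha_hat = 2 * tau / (1 - tau)  # Clayton: alpha = 2 tau / (1 - tau)
print(f"tau = {tau:.3f}, alpha_hat = {alpha_hat:.2f}, true alpha = {ALPHA_TRUE}")
```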

We get alpha=6.18, which is very close to the true value of 6. The method works as expected, but there is a problem: for most copulas no such analytical formula is available, so this technique works only for a limited number of copulas.

The other method (CML) is a little more difficult, but also more versatile. The idea is simple: transform the variables to uniform using the empirical probability integral transform (i.e. the ranks of the observations scaled into the interval (0, 1)), and then select the value of parameter alpha that maximizes the copula log-likelihood function. The most difficult part is calculating the copula probability density function (PDF), which is needed to define the log-likelihood. But this task can be simplified using sympy. The PDF of a bivariate copula can be calculated from its CDF as follows:

Copula PDF from CDF
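In symbols, the density and the CML estimate are given below. The rank/(n+1) scaling of the pseudo-observations is a common convention and an assumption on my part:

$$
c(u,v;\alpha) = \frac{\partial^2 C(u,v;\alpha)}{\partial u\,\partial v},
\qquad
\hat\alpha = \arg\max_{\alpha} \sum_{i=1}^{n} \ln c(\hat u_i, \hat v_i;\alpha),
\qquad
\hat u_i = \frac{\operatorname{rank}(x_i)}{n+1},\;\; \hat v_i = \frac{\operatorname{rank}(y_i)}{n+1}.
$$

For the Clayton copula the mixed derivative works out to

$$
c(u,v;\alpha) = (1+\alpha)\,(uv)^{-\alpha-1}\left(u^{-\alpha}+v^{-\alpha}-1\right)^{-2-1/\alpha}.
$$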

This is easy to do in Python with sympy:

Taking derivatives with sympy
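A minimal sketch of the same idea for the Clayton copula is shown below. It is not the notebook's code. `sp.diff(C, u, v)` takes the mixed second partial derivative, and the result is checked numerically against the known closed-form Clayton density:

```python
import sympy as sp

u, v, alpha = sp.symbols("u v alpha", positive=True)

# Clayton copula CDF
C = (u**(-alpha) + v**(-alpha) - 1) ** (-1 / alpha)

# Copula PDF: mixed second partial derivative d^2 C / (du dv)
pdf = sp.diff(C, u, v)

# Closed form of the Clayton density, for comparison
closed_form = (1 + alpha) * (u * v) ** (-alpha - 1) * (u**(-alpha) + v**(-alpha) - 1) ** (-2 - 1 / alpha)

point = {u: 0.25, v: 0.6, alpha: 6}
print(float(pdf.subs(point)), float(closed_form.subs(point)))
```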

Now we are ready to implement CML. Look at the code below. I’m using the same variables (x and y) that we generated before.

Note that in the code above I’m defining the negative log-likelihood function. The reason is that we are using scipy’s minimize_scalar function, which minimizes its objective. Minimizing the negative log-likelihood is equivalent to maximizing the log-likelihood. You can see the results below.
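Below is a self-contained sketch of CML for the Clayton copula. It is my own version, not the notebook's code. The data are regenerated so the block runs on its own, which means the estimate will not exactly match the article's result. It uses rank-based pseudo-observations and `minimize_scalar` with `method="bounded"`. The bounds (0.01, 20) are an arbitrary choice that keeps alpha positive:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon, norm, rankdata

def clayton_pdf(u, v, alpha):
    """Clayton copula density for alpha > 0."""
    s = u ** (-alpha) + v ** (-alpha) - 1
    return (1 + alpha) * (u * v) ** (-alpha - 1) * s ** (-2 - 1 / alpha)

def clayton_neg_likelihood(alpha, u1, u2):
    """Negative log-likelihood; minimizing it maximizes the log-likelihood."""
    return -np.sum(np.log(clayton_pdf(u1, u2, alpha)))

def pseudo_obs(x):
    """Empirical probability integral transform: ranks scaled into (0, 1)."""
    return rankdata(x) / (len(x) + 1)

# Illustrative data: Clayton(6) with N(0, 1) and Exp(1) marginals, as in the text
rng = np.random.default_rng(1)
n, alpha_true = 2000, 6.0
u0 = rng.uniform(size=n)
t0 = rng.uniform(size=n)
v0 = (u0 ** (-alpha_true) * (t0 ** (-alpha_true / (1 + alpha_true)) - 1) + 1) ** (-1 / alpha_true)
x, y = norm.ppf(u0), expon.ppf(v0)

# CML: pseudo-observations, then a bounded one-dimensional minimization over alpha
u1, u2 = pseudo_obs(x), pseudo_obs(y)
res = minimize_scalar(clayton_neg_likelihood, bounds=(0.01, 20), args=(u1, u2), method="bounded")
print(res.x)
```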

CML results

The estimated value of the parameter is close to the true value used to generate the data, so everything works as expected.

Now we have all the necessary building blocks for creating other Archimedean copulas, sampling from them and fitting them to data. Let me demonstrate it on several examples.


Gumbel copula

For Gumbel copula we have:

Gumbel copula
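For reference, the standard Gumbel formulas are listed below (alpha ≥ 1). The notation in the figure may differ. They include the Kendall function that follows from K(t) = t - φ(t)/φ'(t) and the relation to Kendall's tau:

$$
\varphi(t) = (-\ln t)^{\alpha},
\qquad
\varphi^{-1}(s) = \exp\!\left(-s^{1/\alpha}\right),
\qquad
C(u,v) = \exp\!\left(-\left[(-\ln u)^{\alpha}+(-\ln v)^{\alpha}\right]^{1/\alpha}\right),
$$

$$
K(t) = t - \frac{t\ln t}{\alpha},
\qquad
\tau = 1-\frac{1}{\alpha}.
$$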

I’m going to generate a sample from the Gumbel copula using the Kendall distribution function. The code along with the resulting plots is provided below.

Gumbel copula (alpha=5.6)
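Here is a minimal sketch of the Gumbel sampler based on the Kendall function. It is not the notebook's code, and the function names are illustrative. Because K(1) = 1 for Gumbel, the root of K(w) = q can be bracketed on [1e-12, 1]:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import kendalltau

ALPHA = 5.6  # Gumbel parameter (alpha >= 1)

def phi(t):
    """Gumbel generator: (-ln t)^alpha."""
    return (-np.log(t)) ** ALPHA

def phi_inv(s):
    """Inverse Gumbel generator: exp(-s^(1/alpha))."""
    return np.exp(-s ** (1 / ALPHA))

def kendall_K(t):
    """Gumbel Kendall function: K(t) = t - t ln(t) / alpha."""
    return t - t * np.log(t) / ALPHA

def sample_gumbel(n, rng=None, eps=1e-12):
    """Sample n pairs from a Gumbel copula via the Kendall-function method."""
    rng = np.random.default_rng(rng)
    s = rng.uniform(size=n)
    q = rng.uniform(size=n)
    # Invert K numerically on [eps, 1]
    w = np.array([brentq(lambda x: kendall_K(x) - qi, eps, 1.0) for qi in q])
    return phi_inv(s * phi(w)), phi_inv((1 - s) * phi(w))

u, v = sample_gumbel(3000, rng=0)
print(kendalltau(u, v)[0], 1 - 1 / ALPHA)  # sample vs theoretical Kendall's tau
```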

The Gumbel copula is one of the few copula families for which there is an analytical formula connecting parameter alpha to Kendall’s tau. To generate some data from the Gumbel copula, I’m using the same technique as before.

Gumbel copula parameter estimation

As you can see in the picture above, we get 5.58, which is very close to the true value of 5.6.

Implementing the CML method involves differentiating the copula CDF, which is not easy to do by hand. Luckily we have sympy. Below I define the Gumbel copula function and calculate its derivative using sympy.

Gumbel copula PDF

Using the formula above, we can now define the gumbel_pdf and gumbel_neg_likelihood functions.
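Below is a minimal sketch of these two functions. It is not the notebook's code. Instead of typing the density by hand, this version converts the sympy expression into a NumPy function with `sympy.lambdify`. That is a substitution on my part; typing the simplified expression works just as well. The Gumbel data are regenerated with the Kendall-function method so the block runs on its own:

```python
import numpy as np
import sympy as sp
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import rankdata

# Symbolic Gumbel copula; its density is the mixed partial d^2 C / (du dv)
u, v, a = sp.symbols("u v alpha", positive=True)
C = sp.exp(-((-sp.log(u)) ** a + (-sp.log(v)) ** a) ** (1 / a))
# lambdify turns the symbolic density into a vectorized NumPy function
gumbel_pdf = sp.lambdify((u, v, a), sp.diff(C, u, v), modules="numpy")

def gumbel_neg_likelihood(alpha, u1, u2):
    """Negative log-likelihood of the Gumbel copula at pseudo-observations (u1, u2)."""
    return -np.sum(np.log(gumbel_pdf(u1, u2, alpha)))

# Illustrative data: Gumbel(5.6) sample via the Kendall-function method
rng = np.random.default_rng(2)
n, alpha_true = 1500, 5.6
s, q = rng.uniform(size=n), rng.uniform(size=n)
K = lambda z: z - z * np.log(z) / alpha_true  # Gumbel Kendall function
w = np.array([brentq(lambda r: K(r) - qi, 1e-12, 1.0) for qi in q])
phi_w = (-np.log(w)) ** alpha_true
x = np.exp(-(s * phi_w) ** (1 / alpha_true))
y = np.exp(-((1 - s) * phi_w) ** (1 / alpha_true))

# CML: rank-based pseudo-observations, then a bounded 1-D minimization
u1, u2 = rankdata(x) / (n + 1), rankdata(y) / (n + 1)
res = minimize_scalar(gumbel_neg_likelihood, bounds=(1.01, 20), args=(u1, u2), method="bounded")
print(res.x)
```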

Gumbel copula fit (CML)

As you can see above, CML gives a value close to the true value of 5.6.


Frank copula

To construct the Frank copula we use the following formulas:

Frank copula
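For reference, the standard Frank formulas are given below (alpha ≠ 0). The notation in the figure may differ:

$$
\varphi(t) = -\ln\frac{e^{-\alpha t}-1}{e^{-\alpha}-1},
\qquad
\varphi^{-1}(s) = -\frac{1}{\alpha}\ln\!\left(1+e^{-s}\left(e^{-\alpha}-1\right)\right),
$$

$$
C(u,v) = -\frac{1}{\alpha}\ln\!\left(1+\frac{\left(e^{-\alpha u}-1\right)\left(e^{-\alpha v}-1\right)}{e^{-\alpha}-1}\right).
$$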

The source code for sampling from the Frank copula and the resulting plots are shown below.

Frank copula (alpha=8)
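For the Frank copula, the conditional CDF happens to be invertible in closed form. The sketch below therefore uses the conditional-distribution method rather than the Kendall-function method used in the article, and I label that plainly as a substitution. Solving t = C_u(v) gives e^(-alpha v) = 1 + t(e^(-alpha) - 1)/(t + (1 - t)e^(-alpha u)):

```python
import numpy as np
from scipy.stats import kendalltau

def sample_frank(n, alpha, rng=None):
    """Frank copula sample (alpha > 0) via the closed-form inverse of C_u(v)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=n)
    t = rng.uniform(size=n)
    a = np.exp(-alpha * u)
    g = np.exp(-alpha)
    # Solve t = C_u(v) for v: e^{-alpha v} = 1 + t (g - 1) / (t + (1 - t) a)
    v = -np.log1p(t * (g - 1) / (t + (1 - t) * a)) / alpha
    return u, v

u, v = sample_frank(5000, alpha=8.0, rng=3)
print(u.mean(), v.mean(), kendalltau(u, v)[0])
```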

To fit the Frank copula to data we need to calculate its PDF, which I do using sympy.

Frank copula PDF

Note that here I use the simplify function from sympy to get a simpler expression, which is easier to type when defining the PDF function.

Everything else is exactly the same as before: define the PDF and log-likelihood functions and run the optimizer to search for the parameter value that maximizes the log-likelihood.
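Below is a minimal, self-contained sketch of the Frank CML fit. It is not the notebook's code. It uses the standard closed form of the Frank density, which is what the mixed derivative of C reduces to, instead of the notebook's sympy output. The data are regenerated with the conditional inverse. The bounds keep alpha positive, which is an assumption that fits the positively dependent example here:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata

def frank_pdf(u1, u2, alpha):
    """Frank copula density (alpha > 0):
    alpha (1 - e^-alpha) e^{-alpha (u1 + u2)} / [(1 - e^-alpha) - (1 - e^{-alpha u1})(1 - e^{-alpha u2})]^2
    """
    g = 1 - np.exp(-alpha)
    num = alpha * g * np.exp(-alpha * (u1 + u2))
    den = (g - (1 - np.exp(-alpha * u1)) * (1 - np.exp(-alpha * u2))) ** 2
    return num / den

def frank_neg_likelihood(alpha, u1, u2):
    """Negative log-likelihood of the Frank copula."""
    return -np.sum(np.log(frank_pdf(u1, u2, alpha)))

# Illustrative data: Frank(8) sample via the closed-form conditional inverse
rng = np.random.default_rng(3)
n, alpha_true = 2000, 8.0
u = rng.uniform(size=n)
t = rng.uniform(size=n)
v = -np.log1p(t * (np.exp(-alpha_true) - 1) / (t + (1 - t) * np.exp(-alpha_true * u))) / alpha_true

# CML on rank-based pseudo-observations; bounds keep alpha positive and away from 0
u1, u2 = rankdata(u) / (n + 1), rankdata(v) / (n + 1)
res = minimize_scalar(frank_neg_likelihood, bounds=(0.1, 40), args=(u1, u2), method="bounded")
print(res.x)
```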

Frank copula fit (CML)

And again we get correct results. Let’s look at one last example.


Joe copula

Formulas:

Joe copula
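For reference, the standard Joe formulas are given below (alpha ≥ 1). The notation in the figure may differ:

$$
\varphi(t) = -\ln\!\left(1-(1-t)^{\alpha}\right),
\qquad
\varphi^{-1}(s) = 1-\left(1-e^{-s}\right)^{1/\alpha},
$$

$$
C(u,v) = 1-\left[(1-u)^{\alpha}+(1-v)^{\alpha}-(1-u)^{\alpha}(1-v)^{\alpha}\right]^{1/\alpha}.
$$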

Sampling:

Joe copula (alpha=5)
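Here is a minimal sketch of the Joe sampler based on the Kendall function. It is not the notebook's code, and the function names are illustrative. One practical difference from Gumbel: the Joe K(t) evaluates to 0/0 at t = 1 in floating point, so this sketch brackets the root on [1e-12, 1 - 1e-12]. It also uses `log1p`/`expm1` for accuracy near the ends of the interval:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import kendalltau

ALPHA = 5.0  # Joe parameter (alpha >= 1)

def phi(t):
    """Joe generator: -ln(1 - (1 - t)^alpha)."""
    return -np.log1p(-(1 - t) ** ALPHA)

def phi_inv(s):
    """Inverse Joe generator: 1 - (1 - e^{-s})^(1/alpha)."""
    return 1 - (-np.expm1(-s)) ** (1 / ALPHA)

def kendall_K(t):
    """K(t) = t - phi(t)/phi'(t) for the Joe generator."""
    a = (1 - t) ** ALPHA
    return t - (1 - a) * np.log1p(-a) / (ALPHA * (1 - t) ** (ALPHA - 1))

def sample_joe(n, rng=None, eps=1e-12):
    """Sample n pairs from a Joe copula via the Kendall-function method."""
    rng = np.random.default_rng(rng)
    s = rng.uniform(size=n)
    q = rng.uniform(size=n)
    # K(1) is 0/0 numerically, so bracket the root on [eps, 1 - eps]
    w = np.array([brentq(lambda x: kendall_K(x) - qi, eps, 1 - eps) for qi in q])
    return phi_inv(s * phi(w)), phi_inv((1 - s) * phi(w))

u, v = sample_joe(3000, rng=5)
print(u.mean(), v.mean(), kendalltau(u, v)[0])
```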

Fitting:

Joe copula PDF

Note that the simplified expression above doesn’t fit on my screen, so it is not shown in full. Check the Jupyter notebook (link provided at the end of the article) to see the whole expression.

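Here I had to make a small change in the PDF function because I encountered some numerical issues. Basically, we need to make sure that none of the elements in the arrays u1, u2 is exactly equal to 0 or 1. For example, at u = 1 the factor (1 - u)^(alpha - 1) in the Joe density is zero, so the log-likelihood would be -inf.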
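Below is a minimal sketch of a clipped Joe PDF together with a CML fit. It is not the notebook's code. It uses the closed form of the Joe density, c = s^(1/alpha - 2) ((1-u)(1-v))^(alpha-1) (alpha - 1 + s), where s = (1-u)^alpha + (1-v)^alpha - (1-u)^alpha (1-v)^alpha. It clips the inputs with `np.clip`. The threshold `eps=1e-10` is an arbitrary choice. The data are regenerated with the Kendall-function method so the block runs on its own:

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import rankdata

def joe_pdf(u1, u2, alpha, eps=1e-10):
    """Joe copula density with inputs clipped away from 0 and 1.

    Without clipping, u = 1 makes the factor (1 - u)^(alpha - 1) zero, so log(pdf) = -inf.
    """
    u1 = np.clip(u1, eps, 1 - eps)
    u2 = np.clip(u2, eps, 1 - eps)
    a, b = (1 - u1) ** alpha, (1 - u2) ** alpha
    s = a + b - a * b
    return s ** (1 / alpha - 2) * ((1 - u1) * (1 - u2)) ** (alpha - 1) * (alpha - 1 + s)

def joe_neg_likelihood(alpha, u1, u2):
    """Negative log-likelihood of the Joe copula."""
    return -np.sum(np.log(joe_pdf(u1, u2, alpha)))

# Illustrative data: Joe(5) sample via the Kendall-function method
rng = np.random.default_rng(4)
n, alpha_true = 1500, 5.0

def K(z):
    """Joe Kendall function K(z) = z - phi(z)/phi'(z)."""
    a = (1 - z) ** alpha_true
    return z - (1 - a) * np.log1p(-a) / (alpha_true * (1 - z) ** (alpha_true - 1))

s, q = rng.uniform(size=n), rng.uniform(size=n)
w = np.array([brentq(lambda r: K(r) - qi, 1e-12, 1 - 1e-12) for qi in q])
phi_w = -np.log1p(-(1 - w) ** alpha_true)
x = 1 - (-np.expm1(-s * phi_w)) ** (1 / alpha_true)
y = 1 - (-np.expm1(-(1 - s) * phi_w)) ** (1 / alpha_true)

# CML on rank-based pseudo-observations
u1, u2 = rankdata(x) / (n + 1), rankdata(y) / (n + 1)
res = minimize_scalar(joe_neg_likelihood, bounds=(1.01, 20), args=(u1, u2), method="bounded")
print(res.x)
```

With rank/(n+1) pseudo-observations the clipping never activates. It only guards against inputs that hit 0 or 1 exactly.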

Joe copula fit (CML)

As you can see above, the result we get is correct.


I hope everything was clear and that you can now implement any other Archimedean copula on your own. In the next article I will try to apply this knowledge in practice and implement a pairs trading strategy using copulas.


Jupyter notebook with source code is available here.

If you have any questions, suggestions or corrections please post them in the comments. Thanks for reading.


UPDATE 26.03.22: I have noticed that the second method of generating a sample from a copula (using the Kendall distribution function) did not always work as expected. I have fixed that error and updated the article and the corresponding Jupyter notebook.
