To access the CryptoCompare public API in Python, we can use cryCompare, a Python wrapper available on GitHub.
With the coinList() function, we can fetch the list of all available cryptocurrencies (about 1450).
With the histoDay() function, we can fetch the daily historical data (OHLC prices and volumes) for a given currency pair. We keep only the coins that have a non-trivial history (about 1350).
We store all this information in a dataframe with 2-level columns: the first level contains the coin names, the second the OHLC prices.
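A minimal pandas sketch of how such a 2-level frame can be assembled, assuming each coin's daily history has already been loaded into its own DataFrame indexed by date, with columns `open`, `high`, `low`, `close`. The cryCompare calls are not reproduced here. The helper `build_ohlc_frame` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def build_ohlc_frame(per_coin: dict) -> pd.DataFrame:
    """Concatenate per-coin OHLC frames into one frame with 2-level columns.

    The dict keys (coin symbols) become the first column level, the OHLC
    field names the second. Dates missing for a coin become NaN because
    the row indexes are outer-joined.
    """
    frame = pd.concat(per_coin, axis=1)
    frame.columns.names = ["coin", "field"]
    return frame.sort_index()


# Hypothetical toy data standing in for the downloaded histories:
# 'ETH' starts two days later than 'BTC'.
dates = pd.date_range("2018-01-01", periods=5, freq="D")
btc = pd.DataFrame({"open": 1.0, "high": 2.0, "low": 0.5, "close": 1.5}, index=dates)
eth = pd.DataFrame({"open": 1.0, "high": 2.0, "low": 0.5, "close": 1.5}, index=dates[2:])
ohlc = build_ohlc_frame({"BTC": btc, "ETH": eth})
print(ohlc.shape)  # (5, 8)
```

An outer join on dates keeps every coin's full history, so a recently listed coin simply shows NaN before its first quote.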
Since many coins are quite recent, their time series of historical data are relatively short.
We sort the coins by decreasing length of their time series.
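Sorting by history length and keeping the longest ones could look like the following sketch. It assumes the 2-level frame described above, with NaN wherever a coin has no quote yet, and measures the length of a history as the number of observed 'close' prices. The function name `longest_histories` is hypothetical.

```python
import numpy as np
import pandas as pd


def longest_histories(ohlc: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep the n coins with the most observed 'close' prices, longest first."""
    # One column per coin: count the non-missing close prices.
    lengths = ohlc.xs("close", axis=1, level=1).notna().sum()
    top = lengths.sort_values(ascending=False, kind="stable").index[:n]
    # Rebuild from a dict so the column order follows the sorted coins.
    out = pd.concat({coin: ohlc[coin] for coin in top}, axis=1)
    out.columns.names = ["coin", "field"]
    return out


# Toy frame: 'NEW' has 3 observed days, 'OLD' has 6 (NaN = not yet listed).
dates = pd.date_range("2018-01-01", periods=6, freq="D")
fields = ["open", "high", "low", "close"]
cols = pd.MultiIndex.from_product([["NEW", "OLD"], fields], names=["coin", "field"])
ohlc = pd.DataFrame(np.ones((6, 8)), index=dates, columns=cols)
ohlc.iloc[:3, :4] = np.nan  # the first 4 columns belong to 'NEW'

top = longest_histories(ohlc, n=1)
print(top.columns.get_level_values(0).unique().tolist())  # ['OLD']
```

With the real data, `longest_histories(ohlc, n=300)` would give the 300 longest series used below.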
Historical data for Bitcoin
BTC (Bitcoin) has the longest one, as expected.
From now on, we only consider the 300 longest time series, i.e. a dataframe with 1200 columns (300 coins × 4 OHLC prices).
All these 300 time series have at least 1000 days of observed prices.
We restrict the correlation study to these 1000 days, which are common to all of them.
Below, we compute their daily log-returns.
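The daily log-return of a price series is $r_t = \log P_t - \log P_{t-1}$. Below is a sketch that applies it to each (coin, field) column over the last `n_days` rows. It assumes the frame is sorted by date and that the last 1000 rows are the window shared by all 300 coins; the helper name `log_returns` is hypothetical.

```python
import numpy as np
import pandas as pd


def log_returns(ohlc: pd.DataFrame, n_days: int) -> pd.DataFrame:
    """Column-wise daily log-returns over the last n_days rows.

    The first row of the window has no previous day, so it is dropped
    and the result has n_days - 1 rows.
    """
    window = ohlc.iloc[-n_days:]
    return np.log(window).diff().iloc[1:]


# Toy check: prices that double every day give log-returns of log(2).
dates = pd.date_range("2018-01-01", periods=6, freq="D")
cols = pd.MultiIndex.from_product([["BTC"], ["open", "high", "low", "close"]])
prices = pd.DataFrame(
    np.repeat(2.0 ** np.arange(6)[:, None], 4, axis=1), index=dates, columns=cols
)
rets = log_returns(prices, n_days=4)
print(rets.shape)  # (3, 4)
```

Taking logs before differencing makes the returns additive over time, so large moves in either direction are on a comparable scale.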
Notice below that the scale is huge compared to that of other financial assets: their daily log-returns usually lie within a (-0.15, 0.15) range, whereas here some tails reach values of ~2 or 3.
Now, we compute a correlation/distance matrix between all these coins.
Notice that we use the OHLC representation here: each coin is described by a random vector rather than by a single random variable (which is what one usually gets by considering only the 'close' price, for example). We therefore need a measure of dependence between random vectors.
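Distance correlation is a relevant measure of statistical dependence for that purpose: it is defined between random vectors of arbitrary dimensions, and it vanishes only when they are independent.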
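A minimal NumPy sketch of the sample distance correlation (Székely et al.), assuming each coin's log-returns are given as an `(n_days, 4)` array of OHLC returns. It computes all pairwise Euclidean distances between days, double-centers them, and normalizes the resulting distance covariance. This is an illustrative implementation, not necessarily the one used for the results below. Memory grows as O(n²), which is fine for 1000 days.

```python
import numpy as np


def _double_centered_distances(z):
    """Pairwise Euclidean distances between the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # a 1-D series is n observations of a 1-D vector
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between paired observations x (n, p) and y (n, q)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()                      # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()     # product of squared distance variances
    if dvar2 <= 0.0:  # a constant sample has zero distance variance
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 4))   # e.g. 500 days of OHLC log-returns for one coin
y = rng.normal(size=(500, 4))   # an independent coin
print(distance_correlation(x, x))           # 1.0 (up to rounding)
print(distance_correlation(x, 3 * x + 5))   # also 1.0: invariant to scaling and shifts
print(distance_correlation(x, y))           # small, but biased above 0 for finite n
```

Unlike Pearson correlation, the value is always in [0, 1], so it measures the strength of dependence but not its sign.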
We compute it for all 300 × 299 / 2 = 44850 pairs of coins, in parallel, using the joblib library.
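A sketch of the parallel computation with `joblib.Parallel` and `delayed`, assuming the log-returns have been stacked into a NumPy array of shape `(n_coins, n_days, 4)` (for instance with `np.stack([rets[c].to_numpy() for c in coins])`). Only the upper triangle is computed, since distance correlation is symmetric, and the diagonal is set to 1. The helper `dcor_matrix` is hypothetical, and `distance_correlation` is redefined compactly so the block runs on its own.

```python
from itertools import combinations

import numpy as np
from joblib import Parallel, delayed


def _centered(z):
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    a, b = _centered(np.asarray(x, float)), _centered(np.asarray(y, float))
    dvar2 = (a * a).mean() * (b * b).mean()
    return 0.0 if dvar2 <= 0 else float(np.sqrt(max((a * b).mean(), 0.0) / np.sqrt(dvar2)))


def dcor_matrix(returns, n_jobs=-1):
    """Symmetric matrix of distance correlations between all pairs of coins."""
    n = len(returns)
    pairs = list(combinations(range(n), 2))  # n * (n - 1) / 2 unordered pairs
    values = Parallel(n_jobs=n_jobs)(
        delayed(distance_correlation)(returns[i], returns[j]) for i, j in pairs
    )
    mat = np.eye(n)  # each coin is perfectly dependent on itself
    for (i, j), v in zip(pairs, values):
        mat[i, j] = mat[j, i] = v
    return mat


print(len(list(combinations(range(300), 2))))  # 44850 pairs for 300 coins

rng = np.random.default_rng(1)
toy_returns = rng.normal(size=(5, 60, 4))  # 5 hypothetical coins, 60 days
mat = dcor_matrix(toy_returns, n_jobs=2)
```

Each task is a single O(n_days²) computation, so the overhead of dispatching 44850 small tasks to worker processes stays small relative to the work itself.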
Then, using the dendrogram obtained from Ward's hierarchical clustering method, we reorder the coins so that their correlation/distance matrix is more readable.
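One way to obtain such an ordering with SciPy is sketched below. Two assumptions: the dissimilarity fed to the clustering is `1 - dCor`, and SciPy's Ward linkage is applied directly to that condensed matrix. SciPy documents Ward as strictly valid only for Euclidean distances, so this is a heuristic. The helper name `ward_order` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def ward_order(dcor_mat):
    """Leaf order of the Ward dendrogram built on 1 - distance correlation."""
    dist = 1.0 - np.asarray(dcor_mat, dtype=float)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # upper triangle as a 1-D vector
    z = linkage(condensed, method="ward")
    return leaves_list(z)


# Toy matrix: coins {0, 2, 4} and {1, 3, 5} are strongly dependent within groups.
groups = np.array([0, 1, 0, 1, 0, 1])
dcor_mat = np.where(groups[:, None] == groups[None, :], 0.9, 0.1)
np.fill_diagonal(dcor_mat, 1.0)

order = ward_order(dcor_mat)
reordered = dcor_mat[np.ix_(order, order)]  # rows and columns permuted together
```

After reordering, each group occupies a contiguous block along the diagonal, which is what makes the clusters visible in the heatmap.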
We can observe that some coins cluster together: they are correlated with one another and share a similar pattern of correlation with the rest of the coins.
For example, asking for 30 clusters, we obtain the following groups:
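Asking for a fixed number of groups corresponds to cutting the dendrogram, which SciPy's `fcluster` does with `criterion="maxclust"`. The sketch below reuses the `1 - dCor` dissimilarity and Ward linkage assumed above. The helper `ward_clusters` is hypothetical, and the toy call uses 2 clusters instead of 30.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def ward_clusters(dcor_mat, n_clusters):
    """Cluster label (1..n_clusters) for each coin, from a cut of the Ward tree."""
    dist = 1.0 - np.asarray(dcor_mat, dtype=float)
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="ward")
    # 'maxclust' picks the cut that yields at most n_clusters flat clusters.
    return fcluster(z, t=n_clusters, criterion="maxclust")


groups = np.array([0, 1, 0, 1, 0, 1])
dcor_mat = np.where(groups[:, None] == groups[None, :], 0.9, 0.1)
np.fill_diagonal(dcor_mat, 1.0)

labels = ward_clusters(dcor_mat, n_clusters=2)  # the post uses n_clusters=30
```

The coins of a given cluster can then be listed with, e.g., `[coins[i] for i in np.flatnonzero(labels == k)]`.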
We can use these clusters to average the distance correlations within and between clusters.
We obtain the following filtered matrix:
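A sketch of this block-averaging filter follows. Every entry is replaced by the mean of its (cluster, cluster) block. As a labeled design choice, within-cluster means exclude the diagonal, so that self-dependence (always 1) does not inflate them, and the diagonal is reset to 1. The helper name `cluster_filtered` is hypothetical.

```python
import numpy as np


def cluster_filtered(mat, labels):
    """Replace each entry by the average over its (cluster, cluster) block."""
    mat = np.asarray(mat, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(mat)
    clusters = np.unique(labels)
    for a in clusters:
        ia = np.flatnonzero(labels == a)
        for b in clusters:
            ib = np.flatnonzero(labels == b)
            block = mat[np.ix_(ia, ib)]
            if a == b and len(ia) > 1:
                # Within a cluster: average only the off-diagonal entries.
                value = (block.sum() - np.trace(block)) / (len(ia) * (len(ia) - 1))
            else:
                value = block.mean()
            out[np.ix_(ia, ib)] = value
    np.fill_diagonal(out, 1.0)  # each coin is perfectly dependent on itself
    return out


mat = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.4, 0.6, 1.0],
])
filtered = cluster_filtered(mat, labels=[0, 0, 1, 1])
```

Here the between-cluster block averages to 0.25, while the within-cluster values 0.8 and 0.6 are kept.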
For example, below are the ‘close’ log-prices of one rather strong cluster: