Data Analysis For Cryptocurrency

You may have seen my previous post that tried to predict bitcoin and ethereum prices with deep learning.

To summarise, there was a lot of hype but it wasn’t very useful in practice (I’m referring to the model, of course). To improve the model, we have two options: carefully design a more sophisticated model (i.e. throw shit tons more layers in there) or identify more informative data sources that can be fed into the model. While it’s tempting to focus on the former, the garbage-in-garbage-out principle remains.

With that in mind, I created a new Python package called cryptory.

Not to be confused with the obscure Go repo (damn you, mtamer) or that bitcoin scam (you try to come up with a crypto package name that isn’t associated with some scam), it integrates various packages and protocols so that you can get historical crypto (just daily… for now) and wider economic/social data in one place.

Rather than making more crypto based jokes, I should probably just explain the package.

As always, the full code for this post can be found on my GitHub account.

Installation

cryptory is available on PyPI and GitHub, so installing it is as easy as running pip install cryptory in your command line/shell.

It relies on pandas, numpy, BeautifulSoup and pytrends but, if necessary, these packages should be automatically installed alongside cryptory.

The next step is to load the package into the working environment.

Specifically, we’ll import the cryptory class.

Assuming that returned no errors, you’re now ready to start pulling some data. But before we do that, it’s worth mentioning that you can retrieve information about each method by running Python’s built-in help function.

We’ll now create our own cryptory object.

You need to define the start date of the data you want to retrieve; there are also some optional arguments. For example, you can set the end date, which otherwise defaults to the current date (see the class documentation for more information).

Cryptocurrency Prices

We’ll start by getting some historical bitcoin prices (starting from 1st Jan 2017).

cryptory has a few options for this type of data, which I will now demonstrate.

     date        open     high     low      close    volume      market cap
0    2018-02-10  8720.08  9122.55  8295.47  8621.90  7780960000  146981000000
1    2018-02-09  8271.84  8736.98  7884.71  8736.98  6784820000  139412000000
2    2018-02-08  7637.86  8558.77  7637.86  8265.59  9346750000  128714000000
...  ...         ...      ...      ...      ...      ...         ...
403  2017-01-03  1021.60  1044.08  1021.60  1043.84  185168000   16426600000
404  2017-01-02  998.62   1031.39  996.70   1021.75  222185000   16055100000
405  2017-01-01  963.66   1003.08  958.70   998.33   147775000   15491200000

     date        btc_price
0    2018-02-10  8691.000
1    2018-02-09  8300.000
2    2018-02-08  8256.000
...  ...         ...
403  2017-01-03  1017.000
404  2017-01-02  1010.000
405  2017-01-01  970.988

Those cells illustrate how to pull bitcoin prices from coinmarketcap and bitinfocharts.

The discrepancy in prices returned by each can be explained by their different approaches to calculating daily prices (e.g. bitinfocharts represents the average price across that day). For that reason, I wouldn’t recommend combining different price sources.

You can also pull non-price-specific data, e.g. transaction fees. See the method’s documentation for more information.

     date        eth_transactionfees
0    2018-02-10  0.78300
1    2018-02-09  0.74000
2    2018-02-08  0.78300
...  ...         ...
403  2017-01-03  0.00773
404  2017-01-02  0.00580
405  2017-01-01  0.00537

You may have noticed that each method returns a pandas dataframe.

In fact, all methods return a pandas dataframe.

This is convenient, as it allows you to slice and dice the output using common pandas techniques. For example, we can easily merge two calls to combine daily bitcoin and ethereum prices.

     date        btc_price  eth_price
0    2018-02-10  8691.000   871.238
1    2018-02-09  8300.000   832.564
2    2018-02-08  8256.000   814.922
...  ...         ...        ...
403  2017-01-03  1017.000   8.811
404  2017-01-02  1010.000   8.182
405  2017-01-01  970.988    8.233
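As a rough sketch of that merge, the two frames below are typed in by hand from the output shown above (in practice, each would come back from a cryptory call). The variable names are my own and I’m assuming an inner join on the date column.

```python
import pandas as pd

# Hand-typed stand-ins for the two price frames shown above
# (in practice each would be returned by a cryptory method).
btc = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-10", "2018-02-09", "2018-02-08"]),
    "btc_price": [8691.0, 8300.0, 8256.0],
})
eth = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-10", "2018-02-09", "2018-02-08"]),
    "eth_price": [871.238, 832.564, 814.922],
})

# An inner join on the shared date column keeps only days present in both frames.
prices = btc.merge(eth, on="date", how="inner")
print(prices)
```

Because the join is on date, it doesn’t matter if the two sources cover slightly different date ranges; unmatched days are simply dropped.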

One further source of crypto prices is the public poloniex API, which cryptory can also pull data from.

For example, we can retrieve the BTC/ETH exchange rate.

     date        close     open      high      low       weightedAverage  quoteVolume   volume
0    2018-02-10  0.099700  0.100961  0.101308  0.098791  0.100194         2.160824e+04  2165.006520
1    2018-02-09  0.101173  0.098898  0.101603  0.098682  0.100488         2.393343e+04  2405.019824
2    2018-02-08  0.098896  0.099224  0.101196  0.096295  0.098194         2.250954e+04  2210.293015
...  ...         ...       ...       ...       ...       ...              ...           ...
403  2017-01-03  0.009280  0.008218  0.009750  0.008033  0.009084         1.376059e+06  12499.794908
404  2017-01-02  0.008220  0.008199  0.008434  0.007823  0.008101         6.372636e+05  5162.784640
405  2017-01-01  0.008200  0.008335  0.008931  0.008001  0.008471         7.046517e+05  5968.975870

We’re now in a position to perform some basic analysis of cryptocurrency prices.

Of course, that graph is meaningless.

You can’t just compare the price for single units of each coin. You need to consider the total supply and the market cap. It’s like saying the dollar is undervalued compared to the Japanese Yen.

But I probably shouldn’t worry. It’s not as if people are buying cryptos based on them being superficially cheap. More relevant here is the relative change in price since the start of 2017, which we can plot quite easily with a little pandas magic (pct_change).

Those coins are provided on bitinfocharts and they tend to represent older legacy coins.

For example, the coin from this list that performed best over 2017 was Reddcoin. It started 2017 with a market cap of less than $1m, but finished it with a value of around $250m, reaching a peak of over $750m in early Jan 2018. You’ll notice that each coin shows the same general behaviour- a sustained rise between March and June, followed by another spike in December and a noticeable sell-off in Jan 2018.
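Here’s a minimal sketch of that relative-change calculation, assuming a date-indexed frame with one column per coin. The figures are the first three daily bitcoin and ethereum prices of 2017 from the tables above; compounding pct_change is one way to do it, not necessarily what the notebook does.

```python
import pandas as pd

prices = pd.DataFrame(
    {"btc_price": [970.988, 1010.0, 1017.0], "eth_price": [8.233, 8.182, 8.811]},
    index=pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
)

# Daily fractional change; the first row has no previous day, so it is NaN.
daily_change = prices.pct_change()

# Compound the daily changes to get growth relative to the first day
# (1.0 means unchanged since the start).
relative = (1 + daily_change.fillna(0)).cumprod()
print(relative)
```

Every column starts at 1.0, so coins with wildly different unit prices can share one axis.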

With a little help from pandas, we can produce a crypto price correlation plot (use the dropdown menu to switch between Pearson and Spearman correlation).
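To make the two correlation measures concrete, here’s a hedged sketch on made-up prices for three hypothetical coins (not real market data). I’m correlating daily returns rather than raw prices, which is my own choice for the example.

```python
import pandas as pd

# Made-up prices for three hypothetical coins, purely for illustration.
prices = pd.DataFrame({
    "coin_a": [10.0, 11.0, 12.5, 12.0, 15.0, 14.0],
    "coin_b": [1.00, 1.08, 1.30, 1.22, 1.70, 1.50],
    "coin_c": [5.0, 4.9, 5.2, 5.1, 5.0, 5.3],
})

# Correlate daily returns rather than raw prices, so a shared upward trend
# does not inflate the result (a modelling choice, not the only option).
returns = prices.pct_change().dropna()

pearson = returns.corr(method="pearson")    # strength of the linear relationship
spearman = returns.corr(method="spearman")  # rank-based (monotonic) relationship
print(pearson.round(2))
print(spearman.round(2))
```

Both calls return a square matrix with ones on the diagonal, which is exactly the shape a heatmap needs.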

There’s nothing too surprising (or novel) here.

It’s well known that cryptos are heavily correlated- they tend to spike and crash collectively. There’s a few reasons for this: Most importantly, the vast majority of coins can only be exchanged with the few big coins (e.g. btc and eth).

As they are priced relative to these big coins, a change in btc or eth will also change the value of those smaller coins. Secondly, it’s not like the stock market. Ethereum and Bitcoin are not as different as, say, Facebook and General Motors.

While stock prices are linked to hitting financial targets (i.e. quarterly earnings reports) and wider macroeconomic factors, most cryptos (maybe all) are currently powered by hope and aspirations (well, hype and speculation) around blockchain technology.

That’s not to say coins can’t occasionally buck the market e.g. ripple (xrp) in early December. However, overperformance is often followed by market underperformance (e.g. ripple in January 2018).

I’ll admit nothing I’ve presented so far is particularly groundbreaking. You could get similar data from the Quandl API (aside: I intend to integrate Quandl API calls into cryptory). The real benefit of cryptory comes when you want to combine crypto prices with other data sources.

Reddit Metrics

If you’re familiar with cryptos, you’re very likely to be aware of their associated reddit pages.

It’s where crypto investors come to discuss the merits of different blockchain implementations, dissect the day’s main talking points and post amusing gifs- okay, it’s mostly just GIFs. With cryptory, you can combine reddit metrics (total number of subscribers, new subscribers, rank; literally scraped from the redditmetrics website) with other crypto data.

Let’s take a look at iota and eos; two coins that emerged in June 2017 and experienced strong growth towards the end of 2017.

Their corresponding subreddits are r/iota and r/eos, respectively.

     date        subscriber_growth
0    2018-02-10  150
1    2018-02-09  161
2    2018-02-08  127
...  ...         ...
404  2017-01-03  0
405  2017-01-02  0
406  2017-01-01  0

Now we can investigate the relationship between price and subreddit growth.

Visually speaking, there’s clearly some correlation between price and subreddit member growth (the y-axis was normalised using the conventional min-max scaling).
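For anyone unfamiliar with min-max scaling, here’s a small sketch of it. The helper name and the figures are made up for illustration; the point is that both series end up on a 0 to 1 scale and can share one y-axis.

```python
import pandas as pd

def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a series linearly so its minimum maps to 0 and its maximum to 1."""
    return (series - series.min()) / (series.max() - series.min())

# Two series on very different scales (made-up figures for illustration).
price = pd.Series([0.5, 1.0, 4.0, 3.0])
subscribers = pd.Series([1000, 1500, 9000, 7000])

scaled_price = min_max_scale(price)
scaled_subscribers = min_max_scale(subscribers)
print(pd.DataFrame({"price": scaled_price, "subscribers": scaled_subscribers}))
```

Note that a constant series would make the denominator zero and produce NaNs, so you’d want to guard against that in real use.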

While the Spearman rank correlation is similarly high for both coins, the Pearson correlation coefficient is significantly stronger for iota, highlighting the importance of not relying on one single measure. At the time of writing, iota and eos both had a marketcap of about $5bn (11th and 9th overall), though the number of subscribers to the iota subreddit was over 3 times more than the eos subreddit (105k and 30k, respectively).

While this doesn’t establish whether the relationship between price and reddit is predictive or reactive, it does suggest that reddit metrics could be useful model features for some coins.

Google Trends

You’ll notice an almost simultaneous spike in subscribers to the iota and eos subreddits in late November and early December. This was part of a wider crypto trend, where most coins experienced unprecedented gains.

Leading the charge was Bitcoin, which tripled in price between November 15th and December 15th. As the most well known crypto to nocoiners, Bitcoin (and the wider blockchain industry) received considerable mainstream attention during this bull run.

Presumably, this attracted quite a lot of new crypto investors (i.e. gamblers), which propelled the price even higher. Well, what’s the first thing you’re gonna do after reading an article about this fancy futuristic blockchain that’s making people rich? You’d google bitcoin, ethereum and obviously bitconnect.

With cryptory, you can easily combine conventional crypto metrics with Google Trends data.

You just need to decide the terms you want to search. It’s basically a small wrapper on top of the pytrends package. If you’ve used Google Trends before, you’ll be aware that you can only retrieve daily scores for max 90 day periods. The method stitches together overlapping searches, so that you can pull daily scores going back years.
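To give a feel for what stitching overlapping searches involves, here’s a simplified sketch. It is not the cryptory or pytrends implementation: the function name, the rescale-by-overlap approach and the toy scores are all my own assumptions. The idea is that each window is scaled to its own 0-100 range, so later windows must be rescaled to match the earlier ones on the dates they share.

```python
from datetime import date, timedelta

def stitch_windows(windows):
    """Join overlapping {date: score} windows into one consistently scaled series.

    Each window is assumed to be scaled independently, so every later window
    is rescaled by the ratio of the stitched series to the window over their
    shared dates before its new dates are appended.
    """
    stitched = dict(windows[0])
    for window in windows[1:]:
        overlap = [d for d in window if d in stitched and window[d] > 0]
        if not overlap:
            raise ValueError("consecutive windows must share non-zero dates")
        factor = sum(stitched[d] for d in overlap) / sum(window[d] for d in overlap)
        for d, score in window.items():
            # Keep values already stitched; only add dates we have not seen.
            stitched.setdefault(d, score * factor)
    # Rescale so the overall peak is 100, mirroring the 0-100 convention.
    peak = max(stitched.values())
    return {d: 100 * v / peak for d, v in sorted(stitched.items())}

# Two toy four-day windows that overlap on the 3rd and 4th of January.
start = date(2017, 1, 1)
days = [start + timedelta(days=i) for i in range(6)]
first = dict(zip(days[:4], [25, 50, 100, 50]))
second = dict(zip(days[2:], [50, 25, 100, 50]))

stitched = stitch_windows([first, second])
print(list(stitched.values()))
```

In the toy data, the second window’s scores are half the first window’s on the shared days, so the second window is doubled before the whole series is rescaled to peak at 100.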

It’s probably best to illustrate it with a few examples.

     date        bitcoin
0    2018-02-09  22.000000
1    2018-02-08  25.000000
2    2018-02-07  30.000000
...  ...         ...
402  2017-01-03  3.974689
403  2017-01-02  4.377918
404  2017-01-01  2.707397

Now we can investigate the relationship between crypto price and google search popularity.

As before, it’s visually obvious and statistically clear that there’s a strong correlation between google searches and coin prices.

Again, this is a well known observation. What’s not so apparent is whether google search drives or follows the price. That chicken and egg question will be addressed in my next deep learning post.

A few words on Verge (xvg): eccentric (i.e. crazy) crypto visionary John McAfee recommended (i.e. shilled) the unheralded Verge to his twitter followers (i.e. fools), which triggered a huge surge in its price. As is usually the case with pump and dumps, the pump (from which McAfee himself potentially profited) was followed by the dump.

The sorry story is retold in both the price and google search popularity. Unlike bitcoin and ethereum though, you’d need to consider in your analysis that verge is also a common search term for popular online technology news site The Verge (tron would be a similar case).

Anyway, back to cryptory: you can supply more than one keyword at a time, allowing you to visualise the relative popularity of different terms.

Let’s go back to the early days and compare the historical popularity of Kim Kardashian and Bitcoin since 2013.

According to Google Trends, bitcoin became a more popular search term in June 2017 (a sure sign of a bubble if ever there was one- just realised this isn’t a unique insight either).

That said, Bitcoin has never reached the heights of Kim Kardashian on the 13th November 2014 (obviously, the day Kim Kardashian broke the internet). The graph shows daily values, but you’ll notice that it quite closely matches what you’d get for the same weekly search on the Google Trends website.

While social metrics like reddit and google popularity can be powerful tools to study cryptocurrency prices, you may also want to incorporate data related to finance and the wider global economy.

Stock Market Prices

With their market caps and closing prices, cryptocurrencies somewhat resemble traditional company stocks.

Of course, the major difference is that you couldn’t possibly pay for a lambo by investing in the stock market. Still, looking at the stock market may provide clues as to how the general economy is performing, or even how specific industries are responding to the blockchain revolution.

cryptory includes a method that scrapes Yahoo Finance and returns historical daily data.

Just note that you’ll need to find the relevant company/index code on the yahoo finance website.

     date        adjclose      close         high          low           open          volume
0    2018-02-10  24190.900391  24190.900391  24382.140625  23360.289062  23992.669922  735030000.0
1    2018-02-09  24190.900391  24190.900391  24382.140625  23360.289062  23992.669922  735030000.0
2    2018-02-08  23860.460938  23860.460938  24903.679688  23849.230469  24902.300781  657500000.0
...  ...         ...           ...           ...           ...           ...           ...
403  2017-01-03  19881.759766  19881.759766  19938.529297  19775.929688  19872.859375  339180000.0
404  2017-01-02  NaN           NaN           NaN           NaN           NaN           NaN
405  2017-01-01  NaN           NaN           NaN           NaN           NaN           NaN

You may notice that the previous closing prices are carried over on days when the stock market is closed (e.g. weekends). You can choose to turn off this feature when you initialise your cryptory class (see the class documentation).
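If you’re wondering what that carry-over amounts to, it behaves like a forward fill. The sketch below reproduces the effect with plain pandas on two closing prices (rounded) from the table above; it’s an illustration of the idea, not cryptory’s internal code.

```python
import pandas as pd

# Closes for the Thursday and Friday only; the weekend has no rows.
closes = pd.Series(
    [23860.46, 24190.90],
    index=pd.to_datetime(["2018-02-08", "2018-02-09"]),
    name="close",
)

# Expand to one row per calendar day; days without trading become NaN.
all_days = pd.date_range("2018-02-08", "2018-02-11", freq="D")
daily = closes.reindex(all_days)

# Carry the last known close forward over the non-trading days.
filled = daily.ffill()
print(filled)
```

A forward fill can only carry values that exist earlier in the series, which is presumably why the 1st and 2nd of January 2017 remain NaN in the table: nothing precedes them in the requested date range.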

With a little help from pandas, we can visualise the performance of bitcoin relative to some specific stocks and indices.

This graph shows the return you would have received if you had invested on January 3rd.

As Bitcoin went up the most (>10x returns), it was objectively the best investment. While the inclusion of some names is hopefully intuitive enough, AMD and NVIDIA (and Intel to some extent) are special cases, as these companies produce the graphics cards that underpin the hugely energy intensive (i.e. wasteful) process of crypto mining.

Kodak (not to be confused with the pre 2012 bankruptcy Kodak) made the list, as they announced their intention in early Jan 2018 to create their own “photo-centric cryptocurrency” (yes, that’s what caused that blip).

As before, with a little bit of pandas work, you can create a bitcoin stock market correlation plot.

The highest correlation recorded (0.75) is between Google and Nasdaq, which is not surprising, as the former is a large component of the latter.

As for Bitcoin, it was most correlated with Google (0.12), but its relationship with the stock market was generally quite weak.

Commodity Prices

While Bitcoin was originally envisioned as an alternative system of payments, high transaction fees and rising value have discouraged its use as a legitimate currency.

This has meant that Bitcoin and its successors have morphed into an alternative store of value- a sort of easily lost internet gold. So, it may be interesting to investigate the relationship between Bitcoin and the more traditional stores of value.

cryptory includes a method that retrieves historical daily prices of various precious metals.

     date        gold_am  gold_pm  silver  platinum_am  platinum_pm  palladium_am  palladium_pm
0    2018-02-10  1316.05  1314.10  16.345  972.0        969.0        970.0         969.0
1    2018-02-09  1316.05  1314.10  16.345  972.0        969.0        970.0         969.0
2    2018-02-08  1311.05  1315.45  16.345  974.0        975.0        990.0         985.0
...  ...         ...      ...      ...     ...          ...          ...           ...
403  2017-01-03  1148.65  1151.00  15.950  906.0        929.0        684.0         706.0
404  2017-01-02  NaN      NaN      NaN     NaN          NaN          NaN           NaN
405  2017-01-01  NaN      NaN      NaN     NaN          NaN          NaN           NaN

Again, we can easily plot the change in commodity prices over 2017 and 2018.

Look at silly old gold appreciating slowly over 2017 and 2018, thus representing a stable store of wealth.

As before, we can plot a price correlation matrix.

Unsurprisingly, the various precious metals exhibit significant correlation, while bitcoin value appears completely unconnected. I suppose negative correlation could have provided evidence that people are moving away from traditional stores of value, but there’s little evidence to support this theory.

Foreign Exchange Rates

One of the motivations behind Bitcoin was to create a currency that wasn’t controlled by any central authority.

There could be no quantitative easing, the policy under which the US central bank (the Federal Reserve) devalued the dollar by essentially printing trillions of new dollars to prop up the faltering economy after the 2007 financial crisis.

As such, there may be a relationship between the USD exchange rate (which would be devalued by such policies) and money moving into cryptocurrencies.

cryptory includes a method that retrieves historical daily exchange rates between particular currency pairs.

     date        exch_rate
0    2018-02-10  1.2273
1    2018-02-09  1.2273
2    2018-02-08  1.2252
...  ...         ...
403  2017-01-03  1.0385
404  2017-01-02  NaN
405  2017-01-01  NaN

As you can see, the USD has lost ground to the Euro over the last year.

We can easily add a few more USD exchange rates (spoiler alert: the USD has depreciated relative to most major currencies). As the results are similar to the precious metals, that code can be found in the Jupyter notebook.

Oil Prices

Oil prices are strongly affected by the strength of the global economy (e.g. demand in China) and geopolitical instability (e.g. Middle East, Venezuela).

Of course, there’s other factors at play (shale, moves towards renewables, etc.), but you might want to have oil prices in your crypto price model in order to include these forces.

cryptory includes a method that retrieves historical daily oil (London Brent Crude) prices.

     date        oil_price
0    2018-02-10  64.18
1    2018-02-09  64.18
2    2018-02-08  64.18
...  ...         ...
403  2017-01-03  52.36
404  2017-01-02  NaN
405  2017-01-01  NaN

As you can see, oil is up a little over 20% since the start of 2017 (from 52.36 to 64.18).

Of course, you can plot the price over a longer time period.

Future

So what’s the future of cryptos? Moon, obviously! As for the future of cryptory, it already includes numerous tools that could improve price models (particularly, reddit and google trend metrics).

But it’s certainly lacking features that would take it to the moon:

  • twitter statistics (specifically John McAfee’s!!!)
  • media analysis (number of mainstream articles, sentiment, etc.)
  • more Asian-centric data sources (Japan and South Korea are said to account for 40% and 20% of global bitcoin volume, respectively)
  • more financial/crypto data (integrate Quandl api)

In my next post, I’ll use cryptory to (hopefully) improve the previous LSTM crypto price prediction model.

While you wait for that, you can perform your own cryptocurrency analysis with the accompanying Jupyter notebook. Thanks for reading!