Understanding the Impact of Twitter Feeds on Bitcoin Price and Trading Patterns


“Cryptocurrency trading was one of the most exciting jobs of 2017.” The high return on investment has attracted many people to the crypto market. Let's see whether we can improve Bitcoin price prediction using Twitter sentiment.


In 2017, the market value of cryptocurrencies rose from 19 billion USD to 741 billion USD. Bitcoin held 59.87% of the total market capitalization, and its price increased by 2000%. This exponential rise in market price attracted many people to invest in the hope of making money.


Bitcoin Price and Volume from Jan 01, 2017 to Feb 28, 2018

Cryptocurrencies are bought and traded on platforms similar to the stock market. These markets are not yet regulated by any country, so anyone anywhere in the world can buy and sell cryptocurrencies at any time. Government announcements, regulations, alliances, and promotions by famous personalities play a key role in deciding their prices. If a statement is made in the USA, the news propagates to other countries such as India and China via the internet, social media websites, and so on.

Problem Statement

In this article, we would like to answer the following questions:

  • Does a state-of-the-art context-based model classify Twitter feeds better than statistical models?
  • Can we build a model to predict the Bitcoin price?
  • Is there any relation between the Bitcoin price and the Twitter feed?
  • Can market sentiment from Twitter improve Bitcoin price prediction?

Proposed Model

We approach the problem as follows:

  • Build a sentiment classifier
    • Compare a bag-of-words model with BERT
  • Build a tweet extraction pipeline and classify the tweets with the best model from the previous experiment
  • Build a predictive model for predicting the price of bitcoin

Sentiment Model

Sentiment Analysis Pipeline for Twitter Tweets


We aggregate the Twitter feed into a single Bitcoin opinion per day. Because tweets are posted throughout the day, they need to be collected and processed daily. Using the Twitter API, we first extract Bitcoin-related tweets with the “#BTC” and “#BITCOIN” hashtags, filtering for English-language tweets.

Once the tweets are collected, they are pre-processed. They then pass through a classification algorithm that labels each tweet as positive (1) or negative (0), as shown in the figure.

Once all the tweets are classified, we aggregate them into the day's overall sentiment. This tells us whether it was a positive day for Bitcoin, meaning people were supporting it, or a negative day, meaning they were not.
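The article does not spell out the aggregation rule. The minimal sketch below assumes each tweet carries a date and a 0/1 label. It takes the fraction of positive tweets per day and calls the day positive when that fraction reaches a threshold. Both the `daily_sentiment` helper and the 0.5 threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import date


def daily_sentiment(labelled_tweets, threshold=0.5):
    """Aggregate per-tweet labels (1 = positive, 0 = negative) into one value per day.

    labelled_tweets: iterable of (day, label) pairs.
    Returns {day: (positive_fraction, is_positive_day)}.
    The 0.5 threshold is a hypothetical choice, not taken from the article.
    """
    counts = defaultdict(lambda: [0, 0])  # day -> [positive count, total count]
    for day, label in labelled_tweets:
        counts[day][0] += label
        counts[day][1] += 1
    result = {}
    for day, (positive, total) in sorted(counts.items()):
        fraction = positive / total
        result[day] = (fraction, fraction >= threshold)
    return result


# Toy labels for two days.
tweets = [
    (date(2017, 12, 1), 1),
    (date(2017, 12, 1), 1),
    (date(2017, 12, 1), 0),
    (date(2017, 12, 2), 0),
    (date(2017, 12, 2), 0),
]
summary = daily_sentiment(tweets)
```

Keeping the fraction alongside the yes/no flag lets a downstream model use either a binary or a continuous daily sentiment feature.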


Predictive Model


Prophet Model


We use the Prophet model because it captures daily and weekly trends and performs well for daily predictions, so we choose it as the base model to predict the Bitcoin price. We use the sliding window technique, as shown in the figure.

We take a sliding window of n days. The model trains on the BTC close price from day 1 to day n and predicts the value on day n+1. We then retrain the model from day 2 to day n+1 and predict day n+2, and so on. For example, with a window of n = 60 days, we train on Days 1 to 60 and predict the Bitcoin price on Day 61. We then retrain the model on Days 2 to 61, keeping the window size at n = 60, to predict the Bitcoin price on Day 62.
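The sliding-window procedure can be sketched as a walk-forward loop. In the article, each step would fit a Prophet model on the window. Here `fit_predict` is a hypothetical callable so the sketch runs without Prophet installed, and `naive_last_value` is a stand-in model, not the article's.

```python
from typing import Callable, List, Sequence


def sliding_window_forecast(
    prices: Sequence[float],
    n: int,
    fit_predict: Callable[[Sequence[float]], float],
) -> List[float]:
    """Walk-forward forecasting with a fixed-size training window.

    For each target index t >= n, the model is retrained on the n previous
    observations prices[t - n : t] and predicts prices[t].
    """
    predictions = []
    for t in range(n, len(prices)):
        window = prices[t - n:t]  # exactly the n most recent observations
        predictions.append(fit_predict(window))
    return predictions


def naive_last_value(window: Sequence[float]) -> float:
    # Hypothetical stand-in model: tomorrow's close equals today's close.
    return window[-1]


closes = [100.0, 102.0, 101.0, 105.0, 107.0]  # toy prices
preds = sliding_window_forecast(closes, n=3, fit_predict=naive_last_value)
```

Indexes here are 0-based, so `t = n` corresponds to Day n+1 in the 1-based numbering used in the text.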

We keep retraining the model so that its predictions adapt to changing trends. We use the R2 score on the test set as our evaluation metric. Once the base model is running, the aim is to see whether adding daily sentiment and additional features improves the R2 score on the test set.
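For reference, here is a minimal implementation of the R2 score (coefficient of determination). It follows the standard definition, the same one used by libraries such as scikit-learn's `r2_score`, and shows why the scores reported later can be negative.

```python
from typing import Sequence


def r2_score(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 is a perfect fit. 0.0 matches always predicting the mean of `actual`.
    Negative values mean the predictions are worse than that mean baseline.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual error
    ss_tot = sum((a - mean) ** 2 for a in actual)  # variance around the mean
    return 1.0 - ss_res / ss_tot
```

For example, a test R2 of -19.93 means the model's squared error was about 20.93 times the squared error of always predicting the test-set mean.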


Final Merged Model


Final Proposed pipeline


The final proposed model is shown in the figure. The left side is the predictive-model workflow, which gives the predicted price for the next day. The center is the sentiment analysis workflow, which gives the current day's sentiment.

The SVM model has three inputs: the next-day price prediction from the predictive model, the current day's aggregated Twitter sentiment, and additional features. Passing these inputs through the SVM model gives an adjusted Bitcoin price, which is expected to be more accurate than the base predictive model.
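The article does not say how the three inputs are aligned by day. The minimal sketch below assumes each source is a dict keyed by ISO date string, and that the prediction stored under day d is the forecast for day d+1 made on day d. The function name and feature names are hypothetical. The resulting rows would then be fed to an SVM regressor (for example scikit-learn's `SVR`, not shown), presumably with the next day's actual close as the target.

```python
from typing import Dict, List, Sequence, Tuple


def build_svm_inputs(
    predicted_price: Dict[str, float],
    sentiment: Dict[str, float],
    extra: Dict[str, Dict[str, float]],
    extra_names: Sequence[str],
) -> Tuple[List[str], List[List[float]]]:
    """Assemble one feature row per day for the SVM regressor.

    Row layout: [next-day price prediction, current-day sentiment, *extra features].
    Only days present in all three sources are kept, in date order
    (keys are assumed to be "YYYY-MM-DD" strings, so sorting is chronological).
    """
    days = sorted(set(predicted_price) & set(sentiment) & set(extra))
    rows = [
        [predicted_price[d], sentiment[d]] + [extra[d][name] for name in extra_names]
        for d in days
    ]
    return days, rows


# Illustrative toy values, not data from the article.
pred = {"2018-01-01": 10.0, "2018-01-02": 11.0, "2018-01-03": 12.0}
sent = {"2018-01-01": 0.4, "2018-01-02": 0.6}  # no sentiment for 2018-01-03
extra = {
    "2018-01-01": {"avg_fee": 1.5, "tweet_volume": 3.0},
    "2018-01-02": {"avg_fee": 2.0, "tweet_volume": 4.0},
    "2018-01-03": {"avg_fee": 2.5, "tweet_volume": 5.0},
}
days, rows = build_svm_inputs(pred, sent, extra, ["avg_fee", "tweet_volume"])
# days == ["2018-01-01", "2018-01-02"]
```

Dropping days that are missing any input keeps every row complete. The trade-off is that gaps in tweet collection silently shrink the training set.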


Dataset Description

We use a different dataset for each task in the project:

  • Dataset 1 - Stanford tweets for training classifiers, labelled [Kaggle]
  • Dataset 2 - Daily tweets, unlabelled [Prepared by us]
  • Dataset 3 - Additional parameters [Prepared by us]


Dataset 1 Stanford Tweets, labelled

We download the dataset, prepared by Stanford University, from Kaggle. It consists of 1.6 million tweets split equally into positive and negative tweets. Each tweet has six fields: sentiment, tweet ID, tweet date, query, user ID, and text. Of these six fields, we use only sentiment and text in our experiments.

During data preparation, we drop all the other fields and keep only text and sentiment. The sentiment field is binary (0 for negative, 1 for positive), and the text field holds the tweet written by the user.
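A minimal loading sketch, assuming the Kaggle CSV has no header row and uses the six-column order listed above. In the copies of this dataset we are aware of, positive tweets are encoded as 4 rather than 1. The sketch therefore maps any non-zero polarity to 1. The function name and column constants are hypothetical. When reading the real file, open it with `newline=""` and the file's actual encoding.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Assumed column layout of the Kaggle CSV (no header row):
# sentiment, tweet ID, date, query, user ID, text
SENTIMENT_COL, TEXT_COL = 0, 5


def load_sentiment_and_text(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Drop every field except sentiment and text; return (label, text) pairs.

    Any non-zero polarity is mapped to 1 (positive), so this works whether
    positives are stored as 4 in the raw file or have already been remapped to 1.
    """
    rows = []
    for record in csv.reader(lines):
        label = 1 if int(record[SENTIMENT_COL]) > 0 else 0
        rows.append((label, record[TEXT_COL]))
    return rows


# Two made-up rows in the assumed format.
sample = io.StringIO(
    '"0","1","Mon Apr 06 2009","NO_QUERY","user_a","bad day"\n'
    '"4","2","Mon Apr 06 2009","NO_QUERY","user_b","great day"\n'
)
rows = load_sentiment_and_text(sample)
# rows == [(0, "bad day"), (1, "great day")]
```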


Dataset 2 Daily Tweets, unlabelled

Normally, Bitcoin tweets can be extracted with the Twitter API. However, our analysis period runs from 10-01-2017 to 03-01-2018, and Twitter restricts mining of historical tweets. Hence, we use “Get Old Tweets Programmatically”, a tool for retrieving old tweets.

We use 3 AWS EC2 instances to extract tweets for different months in parallel over the period from 10-01-2017 to 03-01-2018. Automated scripts call the API every 5 minutes and download the tweets, collecting 5000 tweets per day. We first store the tweets on each instance's local file system, then download them to our local desktop with scp and merge all the files.
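The article does not say how months were split across the 3 instances. One simple scheme, sketched here as an assumption, is to cut the date range into calendar months and hand them out round-robin. The helper names are hypothetical. The actual downloading (calling the old-tweets tool every 5 minutes) is not shown.

```python
from datetime import date
from typing import Dict, List, Tuple

Chunk = Tuple[date, date]  # half-open interval [start, end)


def month_chunks(start: date, end: date) -> List[Chunk]:
    """Split the half-open range [start, end) at calendar-month boundaries."""
    chunks = []
    current = start
    while current < end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        chunk_end = min(next_month, end)
        chunks.append((current, chunk_end))
        current = chunk_end
    return chunks


def assign_round_robin(chunks: List[Chunk], n_workers: int = 3) -> Dict[int, List[Chunk]]:
    """Give worker i (e.g. EC2 instance i) every n_workers-th month, starting at month i."""
    plan: Dict[int, List[Chunk]] = {w: [] for w in range(n_workers)}
    for i, chunk in enumerate(chunks):
        plan[i % n_workers].append(chunk)
    return plan


# Illustrative range only.
chunks = month_chunks(date(2017, 1, 15), date(2017, 5, 10))
plan = assign_round_robin(chunks, n_workers=3)
```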

A total of 467,361 tweets are collected. Each tweet consists of a Twitter handle, a timestamp, and the tweet text. We preprocess the tweets by removing tags and URLs, and we use the handle to identify and remove duplicates. We keep the date and the cleaned tweet for our experiments.
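Below is a minimal cleaning and de-duplication sketch. It rests on two assumptions the article leaves open:
- “Tags” means @mentions, which are removed, plus hashtag symbols, where the `#` is dropped but the word is kept.
- A duplicate is a tweet whose handle and cleaned text have already been seen.

The regexes and helper names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")  # assumption: "tags" includes @mentions
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    text = URL_RE.sub(" ", text)  # remove URLs first so "@" inside a URL is not treated as a mention
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return SPACE_RE.sub(" ", text).strip()


def clean_and_dedupe(tweets: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """tweets: (handle, date, text) triples. Returns (date, cleaned text) pairs.

    A tweet is dropped if the same handle already posted the same cleaned text
    (a hypothetical rule; the article only says the handle is used).
    """
    seen = set()
    result = []
    for handle, day, text in tweets:
        cleaned = clean_text(text)
        key = (handle, cleaned)
        if key in seen:
            continue
        seen.add(key)
        result.append((day, cleaned))
    return result


raw = [
    ("alice", "2017-12-01", "Buying #BTC now! https://example.com/x @bob"),
    ("alice", "2017-12-01", "Buying #BTC now!  https://example.com/x"),  # duplicate after cleaning
    ("carol", "2017-12-01", "Buying #BTC now!"),  # same text, different handle: kept
]
result = clean_and_dedupe(raw)
```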

Dataset 3 Additional Parameters

We collect this data from Jan 01, 2017 to Feb 28, 2018. The dataset consists of 8 fields: Timestamp, Tweet Volume, Average Transaction Fee, Average Transaction Value, BTC close price, Volume, Bullish, and Bearish.


Experiments and Inferences


Although it is a statistical machine learning model, logistic regression with TF-IDF features gave the best test accuracy, 85.5 percent, with a runtime of 37 minutes for classifying the tweets.

In comparison, BERT, a state-of-the-art context-based language model, takes a much longer training time (51 hours) and reaches an accuracy of only 71 percent. The machine used to train BERT was also substantially more powerful, in terms of CPU and memory, than the one used for the Figure 31 experiments.

A likely reason for this poor performance is that BERT is pre-trained on English Wikipedia and BooksCorpus, which consist of well-written text that follows grammar rules. Tweets, in contrast, are usually written in informal slang such as “wanna”, “gonna”, and “YOLO”. This slang generally does not follow strict grammar rules, which makes the classification problem harder.
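The TF-IDF features behind the best-performing classifier can be sketched in a few lines. This is a minimal illustration, not the article's code. It assumes the smoothed IDF formula idf(t) = ln((1 + N) / (1 + df(t))) + 1 with per-document L2 normalisation, which matches scikit-learn's `TfidfVectorizer` defaults. Which variant the article used is not stated. The resulting vectors would be the input to a logistic regression classifier (for example scikit-learn's `LogisticRegression`, not shown).

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights for tokenised documents, L2-normalised per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # raw term frequency
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Toy tokenised tweets.
docs = [["btc", "to", "the", "moon"], ["btc", "crash"], ["the", "crash"]]
vectors = tfidf(docs)
```

Rare terms such as "moon" get a higher weight than terms shared across many tweets, such as "btc". This is what lets a linear classifier pick up on distinctive words.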


Next, we take Dataset 2 (the daily tweets) and extract sentiment features from it using VADER and spaCy. Each pre-processed tweet passes through the VADER and spaCy functions, which give its positive and negative sentiment scores.

We aggregate the scores of all tweets within each day and plot them to see how they change over the period. As the plot shows, negative sentiment dominates positive sentiment in this dataset. We then take the Bullish and Bearish features from Dataset 3 and see how they change over the same period. On some days Bearish sentiment is high, and on others Bullish sentiment is high.

Comparing the two charts, Figure 1 shows no significant spikes or changes between consecutive days, whereas Figure 2 shows many. Hence, Figure 2 better reflects the sentiment of Twitter users during this period.


When tweet volume increased and the sentiment trend was bullish, the Bitcoin price increased. When the sentiment trend was bearish and tweet volume was low, the Bitcoin price decreased. Hence, there seems to be some relation, which our proposed model tries to capture.
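The article judges this relation from plots. One way to quantify it, which the article does not report, is the Pearson correlation between a daily sentiment series and the price series. The series can optionally be shifted so that sentiment on day t is compared with price on day t + lag. On Python 3.10+ the standard library's `statistics.correlation` computes the same Pearson coefficient; the hand-written version below avoids that version requirement. The helper names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_pearson(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Correlate x on day t with y on day t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return pearson(x[: len(x) - lag], y[lag:])


# Toy series: y repeats x one day later, so a lag of 1 lines them up exactly.
r = lagged_pearson([1, 2, 3, 4, 0], [9, 1, 2, 3, 4], lag=1)
# r == 1.0
```

A positive lag is the more interesting case for prediction, since it asks whether today's sentiment moves with tomorrow's price.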


Experiments - Building a Predictive Model

We first set up a base model using the closing price from Dataset 3, using past closing prices to predict the price of the next day. We test window sizes N ranging from almost one year down to one week to understand their impact on the R2 score.

With a window of 394 days, the training R2 score was -19.7. Decreasing the window size to 274 days improved the R2 score to 0.22, and reducing it to 7 days gave the best training R2 score, 0.79.


The figures above show the training and test results of our baseline model. The red line is the actual price, and the blue line is our model's predicted price. As the training figure shows, our model predicts well for the first 150 predictions, since there are no exponential spikes. Later, during exponential rises, the model caught the changing trend, but it failed during an exponential fall.

As the figure shows, our model performs poorly on the test set, achieving an R2 score of -19.93. This becomes the base model score that we try to improve by adding the sentiment and the additional features from Dataset 3.


Proposed Model Experiments 


We follow a recursive additive approach: we first choose a combination of features, then train the SVM model on these features and predict on the test data. Using GridSearchCV, we tune the SVM hyperparameters to get the best R2 score. We try RBF and sigmoid kernels and different values of C, gamma, and epsilon to find the optimal values.
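The sketch below shows what the grid search does: try every combination in a parameter grid and keep the one with the best score. The grid values and `toy_score` are hypothetical. In the article, the score would come from fitting an SVM regressor with those settings and measuring R2. scikit-learn's `GridSearchCV` also cross-validates each candidate, which this sketch omits. For price series, a time-ordered split such as scikit-learn's `TimeSeriesSplit` avoids training on future data.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    score: Callable[..., float],
) -> Tuple[Dict[str, Any], float]:
    """Try every combination in param_grid; return (best params, best score).

    Higher scores are better, as with R2. Ties keep the first combination found.
    """
    names = list(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical grid over the hyperparameters named in the article (2 * 3 * 2 * 2 = 24 candidates).
grid = {
    "kernel": ["rbf", "sigmoid"],
    "C": [0.1, 1.0, 10.0],
    "gamma": [0.01, 0.1],
    "epsilon": [0.01, 0.1],
}


def toy_score(kernel: str, C: float, gamma: float, epsilon: float) -> float:
    # Stand-in for "fit an SVM regressor with these settings and return its R2".
    return (1.0 if kernel == "rbf" else 0.0) - abs(C - 1.0) - gamma - epsilon


best, best_r2 = grid_search(grid, toy_score)
# best == {"kernel": "rbf", "C": 1.0, "gamma": 0.01, "epsilon": 0.01}
```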

Sentiment and predicted price are fixed features in all models, and we keep adding additional features to them. As the figure shows:

  1. With one extra feature, Average Transaction Fee, the best R2 score obtained is -7.67.
  2. With two extra features, Average Transaction Fee and Tweet Volume, the best R2 score obtained is -2.9.
  3. With three extra features, Average Transaction Fee, Tweet Volume, and Average Transaction Value, the best R2 score obtained is -1.938.
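The three experiments add extra features in a fixed order on top of the two fixed features. That loop can be sketched as follows. The column names and the `score` callable are hypothetical stand-ins for “tune and train the SVM on these columns and return its test R2”.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical column names for the features described in the article.
FIXED = ["predicted_price", "sentiment"]  # included in every model
EXTRAS = ["avg_transaction_fee", "tweet_volume", "avg_transaction_value"]


def nested_feature_sets(fixed: Sequence[str], extras: Sequence[str]) -> List[List[str]]:
    """Return fixed + the first k extras, for k = 1 .. len(extras)."""
    return [list(fixed) + list(extras[:k]) for k in range(1, len(extras) + 1)]


def evaluate(
    fixed: Sequence[str],
    extras: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[List[str], float]]:
    """Score each nested feature set; `score` would tune/train the SVM and return test R2."""
    return [(features, score(features)) for features in nested_feature_sets(fixed, extras)]


# Toy score so the sketch runs: just the number of features.
results = evaluate(FIXED, EXTRAS, score=lambda features: float(len(features)))
```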

As the figure shows, adding the sentiment and additional features improved our R2 score from -19.8 to -1.938. This reduces the magnitude of the base model's score by approximately 90 percent.
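The “approximately 90 percent” refers to how much the magnitude of the negative R2 score shrank. Taking the -19.8 baseline:

$$
\frac{|-19.8| - |-1.938|}{|-19.8|} = \frac{19.8 - 1.938}{19.8} = \frac{17.862}{19.8} \approx 0.902
$$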


In summary, we set out to do the following:

  • Test whether a state-of-the-art context-based model classifies Twitter feeds better than statistical models
    • The statistical model performed well, whereas BERT took much longer to train and was less accurate.
  • Build a model to predict the Bitcoin price
    • Using Facebook's Prophet model, the base model gave an R2 score of -19.8 for a sliding window of N = 90.
  • Check whether there is any relation between the Bitcoin price and the Twitter feed
    • Our exploratory plots suggest a correlation between them.
  • Test whether market sentiment from Twitter can improve Bitcoin price prediction
    • We propose a novel approach that predicts Bitcoin prices using information from both predictive analysis and sentiment analysis.
    • This approach improved the R2 score to -1.938.