
Sentiment analysis of social media text and its relationship with the price of cryptocurrencies over time


Philip Byrne

07/10/2021

Supervised by Padraig Corcoran; Moderated by Martin J Chorley

I have identified that current predictive models use traditional media and historical performance as their primary data sources for predicting asset values such as stock market valuations. Existing predictive asset valuation models commonly overlook raw text data from social media platforms such as Twitter, Reddit, and LinkedIn. It can be argued that cryptocurrency is largely misunderstood by traditional media sources. Based on this argument, I would like to explore whether text analytics of social media sentiment can be an effective lead indicator of future cryptocurrency prices. I hope to use a distributed cloud system architecture based on Apache Spark and Hadoop, process the social media text in an NLP pipeline using GATE Cloud and Apache OpenNLP, and send the results to a centralized PostgreSQL database. Changes in sentiment over time (both positive and negative) will be measured for various topics, and the sentiment movements within each topic will be compared with movements in the price of cryptocurrencies. I will classify and model each social media source and test the sentiment of each topic classified. Automated Python scripts will analyse new data using NLP techniques such as Bag of Words, Markov chains, tf-idf, and N-gram keyword tracking. I believe there is a need for this project because new cryptocurrencies may need to be analysed differently from the existing stock market, which is more mature and uses methods that cannot predict valuations of cryptocurrencies. I believe that there is scope to investigate whether there is a bias-neutral method of understanding cryptocurrencies as they become a larger part of investment portfolios. Given that the demographics of cryptocurrency investors differ from those of traditional stock market investors, I would like to examine whether text analytics is more effective for cryptocurrencies.
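
As an illustration only, the lead-indicator question above could be framed as a lagged correlation: does sentiment on day t move with the price return on day t + lag? The sketch below is not the project's implementation. The function names `pearson` and `lagged_correlation` and the sample numbers are hypothetical, and it assumes sentiment scores and price returns have already been aggregated into equally spaced, aligned series.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment at time t with returns at time t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, returns)
    # Drop the last `lag` sentiment values and the first `lag` returns
    # so that sentiment[t] is paired with returns[t + lag].
    return pearson(sentiment[:-lag], returns[lag:])


# Hypothetical toy data: each return is twice the previous day's sentiment.
sentiment = [0.1, 0.5, -0.2, 0.3, -0.4, 0.6, 0.0]
returns = [0.0, 0.2, 1.0, -0.4, 0.6, -0.8, 1.2]

for lag in range(3):
    print(lag, round(lagged_correlation(sentiment, returns, lag), 3))
```

A correlation that peaks at a positive lag would be consistent with sentiment leading price, although correlation alone does not establish that sentiment predicts or causes price movements.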


Final Report (07/10/2021) [Zip Archive]
