
Sentiment Analysis of Financial News Headlines with Market Comparison


Harvey Allen

11/05/2022

Supervised by Irena Spasic; Moderated by Martin J Chorley

This project is based on the following Kaggle dataset:

https://www.kaggle.com/notlucasp/financial-news-headlines

Scraped from the official CNBC, Guardian, and Reuters websites, the headlines in these datasets reflect an overview of the U.S. economy and stock market every day over the past one to two years.

Data scraped from CNBC contains the headlines, last updated date, and preview text of articles from the end of December 2017 to July 19th, 2020. Data scraped from the Guardian Business contains only the headlines and last updated date of articles from the end of December 2017 to July 19th, 2020, since the Guardian Business does not offer preview text. Data scraped from Reuters contains the headlines, last updated date, and preview text of articles from the end of March 2018 to July 19th, 2020.

This project will investigate the hypothesis that the sentiment of financial news headlines both reflects and directs the performance of the US stock market. The main aim is to implement sentiment analysis that performs well on this dataset and then investigate the correlation between that sentiment and the stock market's gains/losses. Named entity recognition will be used to remove noise from the dataset, i.e. to reduce overfitting towards specific companies. In addition, the project will try to detect specific aspects of the prevailing sentiment to improve the understanding of the drivers behind the gains/losses.
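
The comparison step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method used in the project: the tiny word lexicon, the function names, and the (date, headline) input shape are all hypothetical, and a real system would replace headline_score with a trained sentiment model. The sketch averages headline scores per day and computes the Pearson correlation with daily market returns.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would use a trained sentiment classifier instead.
POSITIVE = {"gain", "gains", "rally", "surge", "beat"}
NEGATIVE = {"loss", "losses", "slump", "fall", "fear"}


def headline_score(headline):
    """Count positive words minus negative words in one headline."""
    words = headline.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_sentiment(rows):
    """Average the headline scores per date; rows are (date, headline) pairs."""
    by_date = defaultdict(list)
    for date, headline in rows:
        by_date[date].append(headline_score(headline))
    return {date: sum(scores) / len(scores) for date, scores in by_date.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def sentiment_return_correlation(rows, returns):
    """Correlate daily mean sentiment with daily returns on shared dates.

    returns maps a date string to that day's market return (hypothetical shape).
    """
    sentiment = daily_sentiment(rows)
    dates = sorted(set(sentiment) & set(returns))
    return pearson([sentiment[d] for d in dates], [returns[d] for d in dates])
```

Testing whether sentiment "directs" the market, rather than only reflecting it, would need the returns shifted by one or more days before correlating; that lag is left out of this sketch.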


Initial Plan (07/02/2022) [Zip Archive]

Final Report (11/05/2022) [Zip Archive]

Publication Form