Relevant Features and Models in the Detection of Malicious COVID-19 Tweets

Izabele Bauzyte


Supervised by Amir Javed; Moderated by Bailin Deng

With the surge in technology usage spurred by the COVID-19 pandemic, malicious actors worldwide have taken the opportunity to create and spread new types of coronavirus-related malware and scams, relying on social media networks such as Twitter to accelerate their spread. This paper investigated a set of processed tweet features, including named entity labels, parts of speech, emotion and sentiment analysis, textual attributes, and tweet account features, to determine which are most helpful in discovering tweets containing malicious URLs. The most telling features were the text-attribute and account features, while parts of speech, entity labels, and sentiment analysis proved less helpful. This paper also tested a small number of models to determine which classified malicious and benign tweets most accurately. The random forest classifier and a stacked meta-classifier encompassing about half a dozen other models performed best, while the SVM and multi-layer perceptron models performed worst.
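
To illustrate what text-attribute and account features might look like in practice, the following is a minimal Python sketch using only the standard library. The specific features (character count, URL count, follower ratio, and so on), the `Account` fields, and all function names are assumptions chosen for illustration; the abstract does not list the exact features used in the report.

```python
import re
from dataclasses import dataclass

# Simple patterns for illustration; real tweet parsing is more involved.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


@dataclass
class Account:
    # Hypothetical account fields, not taken from the report.
    followers: int
    following: int
    account_age_days: int
    verified: bool


def text_features(text: str) -> dict:
    """Compute simple surface-level attributes of the tweet text."""
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "char_count": len(text),
        "word_count": len(text.split()),
        "url_count": len(URL_RE.findall(text)),
        "hashtag_count": len(HASHTAG_RE.findall(text)),
        "mention_count": len(MENTION_RE.findall(text)),
        "uppercase_ratio": upper / len(letters) if letters else 0.0,
    }


def account_features(acc: Account) -> dict:
    """Compute simple attributes of the posting account."""
    return {
        "followers": acc.followers,
        "following": acc.following,
        # +1 avoids division by zero for accounts following nobody.
        "follower_ratio": acc.followers / (acc.following + 1),
        "account_age_days": acc.account_age_days,
        "verified": int(acc.verified),
    }


def tweet_features(text: str, acc: Account) -> dict:
    """Merge text and account features into one flat feature row."""
    return {**text_features(text), **account_features(acc)}


if __name__ == "__main__":
    row = tweet_features(
        "FREE COVID test kits! http://example.com/x #covid19 @who",
        Account(followers=10, following=4, account_age_days=3, verified=False),
    )
    print(row)
```

A flat dictionary per tweet like this can be turned into a feature matrix and passed to a classifier such as a random forest; that step is omitted here.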

Initial Plan (04/02/2021) [Zip Archive]

Final Report (14/05/2021) [Zip Archive]

Publication Form