Developing an ML model to detect drive-by downloads on Twitter and uncovering cybercriminals' tactics

Emma Jenkins


Supervised by Amir Javed; Moderated by Dave Marshall

Stage 1: Data collection & experimental setup. You will need to identify 2-3 popular events or trending topics, such as Covid, and collect data on them from Twitter or another social platform such as Instagram. I'll let you decide on the platform. You will need to create a setup to store the data, then preprocess the data for the next stage and identify disinformation. Create an account with VirusTotal (multiple accounts, so you can check many URLs), or use the Cuckoo honeypot to analyse the URLs [this is relatively harder; if you go this way we may need to change the analysis a bit].

Stage 2: Annotation. Connect to the VirusTotal API and check for malicious URLs. Segregate malicious and benign URLs.

Stage 3: Modelling. Create a supervised machine learning model by analysing malicious tweets to see whether they share any similarities (content-based features). Identify parameters that are indicators of malicious tweets (account-based or URL-based).

Stage 4: Dashboard. Create a dashboard/web app that will:
1. Take tweets as input and, based on the model developed, classify whether each tweet is malicious or not. This should be shown as a live bar chart.
2. Show evidence of disinformation being used to spread malware.
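As a rough illustration of the Stage 2 annotation step, the sketch below looks up a URL report through the VirusTotal v3 REST API and turns the engine verdict counts into a malicious/benign label. It is only one possible approach, not the project's actual implementation: the function names, the `min_detections` threshold, and the rule of counting only "malicious" verdicts are assumptions made for the example, and a real API key would have to be supplied.

```python
import base64
import json
import urllib.request

# VirusTotal v3 endpoint for an existing URL report.
VT_URL_ENDPOINT = "https://www.virustotal.com/api/v3/urls/{}"


def vt_url_id(url: str) -> str:
    """Return the URL identifier: URL-safe base64 of the URL, without '=' padding."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def fetch_analysis_stats(url: str, api_key: str) -> dict:
    """Fetch the per-verdict engine counts for a URL (performs a network call)."""
    request = urllib.request.Request(
        VT_URL_ENDPOINT.format(vt_url_id(url)),
        headers={"x-apikey": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        report = json.load(response)
    return report["data"]["attributes"]["last_analysis_stats"]


def label_from_stats(stats: dict, min_detections: int = 1) -> str:
    """Assumed labelling rule: malicious if enough engines flag the URL."""
    detections = stats.get("malicious", 0)
    return "malicious" if detections >= min_detections else "benign"
```

A call such as `label_from_stats(fetch_analysis_stats(url, api_key))` would then give one label per URL extracted from a tweet. The threshold is a design choice: raising `min_detections` trades fewer false positives for more missed malicious URLs, so it would need to be justified against the collected data.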

Initial Plan (06/02/2023) [Zip Archive]

Final Report (12/05/2023) [Zip Archive]

Publication Form