Review of current publicly available datasets for training AI

Abid Gafoor


Supervised by Eirini S Anthi; Moderated by Yulia Cherdantseva

This project will review the current cybersecurity literature and identify all the publicly available datasets that can be used for training AIs. The main goal of the project is to categorise these datasets and determine their fitness for use. To achieve this:

i) The project will require the design and creation of a method for assessing dataset fitness and a categorisation of AI datasets.

ii) The project will also investigate whether these datasets are universally useful for both blue-team AIs and red-team AIs. If not, what would a good blue-team AI dataset need, and what would a good red-team AI dataset need?

Given the research-oriented nature of the project, a further desirable output, following its completion, is an academic paper.

Final Report (06/01/2023) [Zip Archive]
