Machine learning for spam classification on Stack Exchange (Industry project with Charcoal)

Maha Mahfouz A Alghamdi


Supervised by Padraig Corcoran; Moderated by Carl Jones

Charcoal detects spam on the Stack Exchange network, including Stack Overflow. We store records of every post we detect for later analysis, and have tried several approaches to machine-learning-based classification, none of them successful.

A successful approach will likely involve time spent learning the problem domain and analysing existing detection methods and their results, followed by the design of a highly configurable and accurate machine learning solution that combines multiple methods.

The preferred language is Python, although the choice of libraries and requirements is unrestricted.

This project may require a Collaborative Agreement to be signed to protect sensitive data and intellectual property. If the project does require such an agreement, it must be signed before the project begins; otherwise, you will be assigned a different project. If you have questions about this, please contact the supervisor.

Final Report (24/10/2021) [Zip Archive]
