Wales Safeguarding Repository: Document Discovery via Natural Language Processing

Sebastian Thomas


Supervised by Alun D Preece; Moderated by Jose Camacho Collados

The Wales Safeguarding Repository (WSR) is an online collection of safeguarding review reports containing valuable information about learning experiences in tackling serious incidents involving children and vulnerable adults. Research in the School of Social Sciences at Cardiff University has highlighted the need for better learning to emerge from these reviews, in order to enhance the future safeguarding practices of professionals such as police officers, social workers and those working in health and social care.

The Cardiff University Crime & Security Research Institute (CSRI) is working with the School of Social Sciences to develop natural language processing (NLP) technology to support the indexing and search functions of the WSR platform. The safeguarding review reports are lengthy and complex, and the collection is constantly growing, which makes it challenging to identify common themes and issues across the entire collection by hand. A team of social scientists has developed a thematic 'coding framework' for highlighting themes within the reports, designed to help identify common problems and issues in multi-agency work across different reports.

This project aims to use the thematic framework to facilitate the discovery of similar documents across the collection. The automatic identification of similar reports is intended to enable faster and more accurate decision-making by practitioners from health and social care agencies. The expected outcome of the project is an application that, for a given document, identifies the most similar reports in the collection by calculating a similarity measure that incorporates both document content and the themes appearing within the documents. The project is suitable for students who have enjoyed studying machine learning and have a particular interest in NLP.
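As an illustration only, one way such a combined measure might be sketched is a weighted sum of a content score (cosine similarity over TF-IDF vectors) and a theme score (Jaccard overlap between the sets of framework themes assigned to each report). This is not necessarily the method used in the project: the function names, the weighting parameter `alpha`, the whitespace tokenisation and the choice of TF-IDF and Jaccard are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenised = [doc.lower().split() for doc in docs]  # naive tokenisation (assumption)
    n_docs = len(tokenised)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency, so no term gets zero weight.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Jaccard overlap between two sets of theme labels."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query_idx, docs, themes, alpha=0.5, top_k=3):
    """Rank the other reports by alpha * content score + (1 - alpha) * theme score.

    `alpha` is a hypothetical weighting chosen for this sketch.
    """
    vectors = tfidf_vectors(docs)
    scores = []
    for i in range(len(docs)):
        if i == query_idx:
            continue
        content = cosine(vectors[query_idx], vectors[i])
        theme = jaccard(themes[query_idx], themes[i])
        scores.append((i, alpha * content + (1 - alpha) * theme))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

Here `docs` would be a list of report texts and `themes` a parallel list of sets of theme labels. Calling `most_similar(0, docs, themes)` returns `(index, score)` pairs for the reports ranked closest to the first one. Setting `alpha` nearer to 1 favours textual content, and nearer to 0 favours shared themes.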

LINKS:
Wales Safeguarding Repository: http://upsi.org.uk/projects-2/wsr
CSRI work applying NLP to WSR reports: https://arxiv.org/abs/2010.14584

Final Report (21/10/2022) [Zip Archive]
