Knowledge Management via Natural Language Processing: Document Discovery in the Wales Safeguarding Repository

Darcie Lowe


Supervised by Alun D Preece; Moderated by Jianhua Shao

The Wales Safeguarding Repository (WSR) is an online collection of safeguarding review reports containing valuable information about learning experiences in tackling serious incidents involving children and vulnerable adults. Research in the School of Social Sciences at Cardiff University has highlighted the need for better organisational learning to emerge from these reviews, in order to enhance the future safeguarding practices of professionals such as police officers, social workers and those working in health and social care.

The Cardiff University Security Crime and Intelligence Innovation Institute (SCIII) is working with the School of Social Sciences to develop natural language processing (NLP) technology to support the indexing and search functions of the WSR platform. The safeguarding review reports are lengthy and complex, and the collection is constantly growing, which makes manually identifying common themes and issues across the entire collection a challenging task. To help identify common problems and issues in multi-agency work across different reports, a team of social scientists has developed a thematic 'coding framework' for highlighting themes within the reports.

This project aims to use the thematic framework to facilitate the discovery of similar documents across the collection. The automatic identification of similar reports will enable faster and more accurate decision-making by practitioners from health and social care agencies. The expected outcome of the project is an application that, for a given document, identifies the most similar reports in the collection by calculating a similarity measure that incorporates both document content and the themes appearing within the documents. The project is suitable for students with an interest in machine learning, specifically NLP.
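As an illustration only, one way such a combined measure could be sketched is a weighted sum of a content score and a theme score. The sketch below assumes TF-IDF cosine similarity for content and Jaccard overlap for themes; these choices, the weight `alpha`, the function names, and the toy texts and theme labels are all hypothetical and are not taken from the project or from the WSR coding framework.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms found in every document keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Overlap between two sets of theme labels (0.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def most_similar(query_idx, texts, themes, alpha=0.5, top_k=3):
    """Rank the other documents by alpha * content + (1 - alpha) * theme similarity."""
    vecs = tfidf_vectors([t.lower().split() for t in texts])
    scores = []
    for i in range(len(texts)):
        if i == query_idx:
            continue
        content = cosine(vecs[query_idx], vecs[i])
        theme = jaccard(themes[query_idx], themes[i])
        scores.append((i, alpha * content + (1 - alpha) * theme))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy data, invented for illustration; not real WSR reports or framework themes.
texts = [
    "agencies failed to share information about the child",
    "information sharing between agencies was poor",
    "the housing application was delayed",
]
themes = [
    {"information_sharing", "multi_agency"},
    {"information_sharing"},
    {"housing"},
]

ranking = most_similar(0, texts, themes)
print(ranking)  # list of (document index, combined score), best match first
```

With this toy data, document 1 ranks above document 2 for query document 0, because it shares both vocabulary and a theme label with the query. The weight `alpha` controls how much the raw text matters relative to the coded themes; a real application would need to tune it and would likely use a more robust text representation.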

LINKS:
Wales Safeguarding Repository: http://upsi.org.uk/projects-2/wsr
SCIII work applying NLP to WSR reports: https://arxiv.org/abs/2010.14584

Initial Plan (06/02/2023) [Zip Archive]

Final Report (12/05/2023) [Zip Archive]

Publication Form