This project will consider users' posts on medical forums, using an OCD forum as an example. Users of these forums share stories and experiences and ask questions. A tool that automatically identifies relevant tags for users' posts can improve search and retrieval.
This project will develop a tool that annotates a user's post with tags or tag phrases relevant to OCD. A basic solution identifies tags from concepts that appear in the text of the post, using NLP techniques (named entity recognition), without considering how those concepts relate to OCD. An extension considers the similarity between the user's post and concepts related to OCD. Here you will use a dictionary of OCD terms and definitions, represented as an ontology/knowledge graph. You will measure the similarity between each definition and the post text using word embeddings, then produce a ranked set of tags/concepts for the post and one or more labels to classify it. You can then run experiments with LLMs to compare the effectiveness of the methods.
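As a rough illustration of the embedding-based extension, the sketch below ranks concepts by the cosine similarity between the averaged word vectors of a post and of each concept definition. Everything in it is an assumption made for illustration: the three-dimensional vectors in `TOY_EMBEDDINGS` are hand-written stand-ins for real pretrained word embeddings, the entries in `CONCEPT_DEFINITIONS` are invented and do not come from any actual OCD ontology, and averaging word vectors is only one of several possible ways to represent a text.

```python
import math
import re

# Hand-written toy vectors (assumption): a real system would load
# pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "washing": [0.9, 0.1, 0.0],
    "hands": [0.8, 0.2, 0.0],
    "germs": [0.7, 0.3, 0.1],
    "contamination": [0.8, 0.2, 0.1],
    "checking": [0.1, 0.9, 0.0],
    "locks": [0.0, 0.9, 0.1],
    "door": [0.1, 0.8, 0.1],
    "thoughts": [0.1, 0.1, 0.9],
    "intrusive": [0.0, 0.2, 0.9],
    "unwanted": [0.1, 0.1, 0.8],
}

# Invented concept definitions (assumption): in the project these would
# come from the OCD ontology/knowledge graph.
CONCEPT_DEFINITIONS = {
    "contamination obsession": "fear of germs and contamination",
    "checking compulsion": "repeatedly checking locks or the door",
    "intrusive thoughts": "unwanted intrusive thoughts",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Average the vectors of known tokens; return None if none are known."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokenize(text) if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tags(post, definitions):
    """Return (concept, score) pairs sorted from most to least similar."""
    post_vec = embed(post)
    if post_vec is None:
        return []
    scored = []
    for concept, definition in definitions.items():
        def_vec = embed(definition)
        if def_vec is not None:
            scored.append((concept, cosine(post_vec, def_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


post = "I keep washing my hands because I am scared of germs."
ranked = rank_tags(post, CONCEPT_DEFINITIONS)
for concept, score in ranked:
    print(f"{concept}: {score:.3f}")
```

The top-ranked concept could serve as the classification label for the post, and the full ranking as its tag list; a score threshold for cutting off the tag list would need to be chosen experimentally.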
The project will benefit from existing libraries, datasets, and previous projects on this subject.