This project develops a rule-based mutation explainer for Welsh, aimed at supporting language learners and advancing Welsh NLP. Initial consonant mutation is a distinctive feature of Welsh grammar, in which the first consonant of a word changes depending on its grammatical environment. While existing tools such as CyTag can detect mutated forms, they cannot explain why a mutation occurs, which limits their usefulness for learners and educators. To address this gap, the system uses Constraint Grammar (CG) rules, informed by linguistic insights, to identify mutations automatically and to classify their triggers into subfamilies spanning lexical, morphological, and syntactic contexts. It then generates explanations automatically using Python.
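The identify, classify, and explain steps described above can be illustrated with a minimal Python sketch. It is not the project's implementation: it replaces the Constraint Grammar rules with a plain lookup on the preceding word, and the tables `MUTATIONS`, `TRIGGERS`, and `LEXICON`, along with the function names, are hypothetical and deliberately tiny (for example, the soft mutation that deletes initial "g" is omitted).

```python
# Toy illustration only: hypothetical tables, not the project's CG rule set.
# Maps each mutation type to {radical initial: mutated initial}.
MUTATIONS = {
    "soft": {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
             "m": "f", "ll": "l", "rh": "r"},
    "nasal": {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"},
    "aspirate": {"p": "ph", "t": "th", "c": "ch"},
}

# Lexical triggers: preceding word -> (mutation type, human-readable trigger).
TRIGGERS = {
    "i": ("soft", "preposition 'i' (to)"),
    "o": ("soft", "preposition 'o' (from)"),
    "fy": ("nasal", "possessive 'fy' (my)"),
    "gyda": ("aspirate", "preposition 'gyda' (with)"),
}

# Radical (unmutated) forms the toy system knows about.
LEXICON = {"caerdydd", "cath", "plant", "bangor"}


def candidate_radicals(word, mutation):
    """Yield possible radical forms of `word` by undoing `mutation`."""
    for radical, mutated in MUTATIONS[mutation].items():
        if word.startswith(mutated):
            yield radical + word[len(mutated):]


def explain(prev_word, word):
    """Return an explanation string, or None if no known mutation is found."""
    trigger = TRIGGERS.get(prev_word.lower())
    if trigger is None:
        return None
    mutation, description = trigger
    for radical in candidate_radicals(word.lower(), mutation):
        if radical in LEXICON:
            return (f"'{word}' is the {mutation} mutation of '{radical}', "
                    f"triggered by the {description}.")
    return None


if __name__ == "__main__":
    for prev, word in [("i", "Gaerdydd"), ("fy", "nghath"),
                       ("gyda", "phlant"), ("i", "Bangor")]:
        print(prev, word, "->", explain(prev, word))
```

A lookup on the preceding word only covers simple lexical triggers. Morphological and syntactic triggers depend on wider context, which is the kind of condition that contextual rules such as those in Constraint Grammar are designed to express.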
A key contribution of this work is an annotated dataset of mutation cases, designed to test the system’s accuracy in detecting a wide variety of triggers. The explainer is also evaluated against state-of-the-art large language model (LLM) baselines, such as GPT-4o-Mini and Claude-4-Sonnet. The evaluation demonstrates that, in low-resource settings for languages like Welsh, linguistically informed, rule-based systems often outperform modern NLP techniques.
Finally, a pilot interactive website has been developed to visualise the system’s outputs directly within Welsh texts, allowing learners to see why a mutation has occurred in context. By combining linguistic knowledge, computational methods, and learner-focused design, this project contributes to Welsh language technologies, offering new tools that enhance language pedagogy while also laying the groundwork for future advances in interpretable NLP for low-resource languages.