
LLM-Driven Argument Mining and Feedback Generation for Persuasive Writing Assessment


Sydney Pollock

08/09/2025

Supervised by Fernando Alva Manchego; Moderated by Dr Soumya Barathi

This research develops a multi-stage pipeline combining argument mining and feedback generation for persuasive writing assessment using smaller, open-source LLMs (Llama-3.2-3B-Instruct and Gemma-2-2B-IT). The study evaluates twelve prompt-engineering configurations, varying zero-shot versus few-shot prompting, chain-of-thought reasoning, and the quality of the few-shot examples. Results show that Gemma consistently outperformed Llama in argument component identification (weighted F1: 0.39 vs 0.28), and that few-shot prompting with varied-quality examples proved more effective than elite-only examples. Contrary to expectations, chain-of-thought prompting reduced accuracy in these smaller models. The feedback generation pipeline benefited substantially from argument-mining grounding: baseline approaches experienced output failure rates of 62%, compared with a maximum of 24% for grounded approaches. Human teacher evaluation found few-shot prompted feedback the most effective at addressing argument structure, though most outputs required some modification before classroom use. Notably, human and LLM evaluations diverged: teachers, drawing on pedagogical considerations, assessed the generated feedback more critically than an LLM-as-a-judge did. The findings underscore the value of varied-quality few-shot examples and highlight the continued necessity of human oversight in educational AI applications.
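As a concrete illustration of the reported metric, the minimal sketch below shows how a weighted F1 score over argument component labels can be computed with scikit-learn. The label set (MajorClaim/Claim/Premise) and the toy gold/predicted labels are assumptions for illustration only, not the study's actual annotation scheme or results.

```python
# Minimal sketch: weighted F1 over argument component labels,
# assuming a MajorClaim/Claim/Premise label set (hypothetical data).
from sklearn.metrics import f1_score

# Hypothetical gold and predicted component labels for a few text spans.
y_true = ["MajorClaim", "Claim", "Premise", "Premise", "Claim"]
y_pred = ["Claim",      "Claim", "Premise", "Claim",   "Claim"]

# "weighted" averages per-class F1 scores, weighting each class by its
# support, so frequent components (typically Premises) dominate the score.
score = f1_score(y_true, y_pred, average="weighted")
print(f"weighted F1 = {score:.2f}")
```

Weighted averaging matters here because argument component classes are typically imbalanced; a macro average would instead treat rare classes such as MajorClaim on equal footing with Premises.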


Final Report (08/09/2025) [Zip Archive]

Publication Form