
Cross-Linguistic Evaluation of Large Language Models


Haoran Cheng

03/09/2024

Supervised by Mohammad Taher Pilehvar; Moderated by Christopher Wallbridge

This study investigates the multilingual capabilities of large language models (LLMs) such as GPT-3.5 and GPT-4, focusing on English, Chinese, Spanish, and French. Using the LMentry tool, the study designed an experimental framework comprising 170 questions across four categories: word attribute recognition, word position in a sentence, basic mathematics, and letter recognition in words. The models' performance was evaluated in terms of accuracy and response time, revealing significant differences in how the models handle different languages. Chain-of-Thought (CoT) analysis and few-shot training were employed to identify weaknesses and enhance the models' ability to address them. While CoT improved accuracy on certain tasks, challenges remained: analyses using this approach revealed problems with punctuation handling and letter recognition. As for few-shot training, the findings underscore the need for more comprehensive examples and for further optimization of LLMs for multilingual tasks. This research contributes to the ongoing development of LLMs, emphasizing the importance of inclusivity and precision in AI applications across diverse languages.
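
To illustrate the kind of evaluation loop the abstract describes, the sketch below shows how one might query GPT-3.5 and GPT-4 on LMentry-style questions and record accuracy and response time, with an optional zero-shot Chain-of-Thought prompt. This is a minimal, hypothetical sketch, not the project's actual code: it assumes the official "openai" Python client (v1.x), an API key in the environment, and two made-up example questions standing in for the study's 170-item set.

import time
from openai import OpenAI  # assumes the official openai Python client (v1.x)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical LMentry-style items: (language, question, expected answer).
# The actual study used 170 questions across four task categories.
QUESTIONS = [
    ("English", "What is the first letter of the word 'apple'?", "a"),
    ("Spanish", "¿Cuál es la primera letra de la palabra 'manzana'?", "m"),
]

def evaluate(model: str, cot: bool = False) -> None:
    """Score one model on accuracy and per-question response time."""
    correct = 0
    for lang, question, expected in QUESTIONS:
        prompt = question
        if cot:
            # Zero-shot Chain-of-Thought trigger phrase (Kojima et al., 2022).
            prompt += " Let's think step by step."
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic output for scoring
        )
        elapsed = time.perf_counter() - start
        answer = response.choices[0].message.content.strip().lower()
        ok = expected in answer  # lenient match: expected token appears in reply
        correct += ok
        print(f"{model} [{lang}] {elapsed:.2f}s correct={ok}")
    print(f"{model}: accuracy {correct}/{len(QUESTIONS)}")

for model in ("gpt-3.5-turbo", "gpt-4"):
    evaluate(model, cot=True)

A few-shot variant would prepend worked question-answer pairs to the messages list before the test question; the report's actual harness, prompts, and scoring rules are documented in the archived final report below.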


Final Report (03/09/2024) [Zip Archive]
