Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

Abstract

Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-source LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM evaluation. To mitigate these issues, we have finetuned LLMs specifically for Vietnamese and developed a comprehensive evaluation framework encompassing 10 tasks and 31 metrics. We observe that finetuning can help LLMs transfer knowledge across languages, serving as an efficient way to bolster their capabilities in non-English languages. Moreover, our analysis indicates that larger models can introduce more biases and uncalibrated outputs, and that the key factor influencing LLM performance is the quality of the training or finetuning datasets. These insights underscore the significance of meticulous finetuning with high-quality datasets in enhancing LLM performance.
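
For readers who want a concrete starting point, below is a minimal, illustrative sketch of supervised finetuning a base LLM on Vietnamese instruction data with LoRA adapters via Hugging Face Transformers and PEFT. The abstract does not specify the paper's exact recipe, so the base checkpoint, dataset file (`vietnamese_sft.jsonl`), and hyperparameters here are assumptions for illustration only, not the authors' configuration.

```python
# Hedged sketch: LoRA finetuning of a causal LM on a Vietnamese instruction dataset.
# Model name, dataset file, and hyperparameters are hypothetical placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token

# Wrap the base model with low-rank adapters so only a small set of weights is trained.
model = AutoModelForCausalLM.from_pretrained(base_model)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Hypothetical JSONL file where each record holds a Vietnamese instruction-response "text" field.
data = load_dataset("json", data_files="vietnamese_sft.jsonl")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="vi-llm-lora",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```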

Type: Publication
In Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Duc Q. Nguyen
CS Master Student

My research interests include Generative Models, Graph Representation Learning, and Probabilistic Machine Learning. My application interests include Natural Language Processing, Healthcare, and Education.