Revitalizing Bahnaric Language through Neural Machine Translation:Challenges, Strategies, and Promising Outcomes

Abstract

The Bahnar, a minority ethnic group in Vietnam with ancient roots, hold a language of deep cultural and historical significance. The government is prioritizing the preservation and dissemination of Bahnar language through online availability and cross-generational communication. Recent AI advances, including Neural Machine Translation (NMT), have transformed translation with improved accuracy and fluency, fostering language revitalization through learning, communication, and documentation. In particular, NMT enhances accessibility for Bahnar language speakers, making information and content more available. However, translating Vietnamese to Bahnar language faces practical hurdles due to resource limitations, particularly in the case of Bahnar language as an extremely low-resource language. These challenges encompass data scarcity, vocabulary constraints, and a lack of fine-tuning data. To address these, we propose transfer learning from selected pre-trained models to optimize translation quality and computational efficiency, capitalizing on linguistic similarities between Vietnamese and Bahnar language. Concurrently, we apply tailored augmentation strategies to adapt machine translation for the Vietnamese-Bahnar language context. Our approach is validated through superior results on bilingual Vietnamese-Bahnar language datasets when compared to baseline models. By tackling translation challenges, we help revitalize Bahnar language, ensuring information flows freely and the language thrives.

Type
Publication
In Proceedings of the AAAI Conference on Artificial Intelligence
Duc Q. Nguyen
Duc Q. Nguyen
CS Master Student

My research interests include Generative Models, Graph Representation Learning, and Probabilistic Machine Learning. My application interests include Natural Language Processing, Healthcare, and Education.