Revitalizing Bahnaric Language through Neural Machine Translation: Challenges, Strategies, and Promising Outcomes

Hoang Nhat Khang Vo, Duc Dong Le, Tran Minh Dat Phan, Tan Sang Nguyen, Quoc Nguyen Pham, Ngoc Oanh Tran, Duc Q. Nguyen, Tran Minh Hieu Vo, Tho Quan

March, 2024 Deep Learning

Abstract

The Bahnar, a minority ethnic group in Vietnam with ancient roots, speak a language of deep cultural and historical significance. The government is prioritizing the preservation and dissemination of the Bahnar language through online availability and cross-generational communication. Recent AI advances, including Neural Machine Translation (NMT), have improved the accuracy and fluency of machine translation, supporting language revitalization through learning, communication, and documentation. In particular, NMT makes information and content more accessible to Bahnar speakers. However, translating Vietnamese to Bahnar faces practical hurdles because Bahnar is an extremely low-resource language. These challenges include data scarcity, vocabulary constraints, and a lack of fine-tuning data. To address them, we propose transfer learning from selected pre-trained models to optimize translation quality and computational efficiency, capitalizing on linguistic similarities between Vietnamese and Bahnar. We also apply tailored augmentation strategies to adapt machine translation to the Vietnamese-Bahnar context. Our approach outperforms baseline models on bilingual Vietnamese-Bahnar datasets. By tackling these translation challenges, we help revitalize the Bahnar language, ensuring that information flows freely and the language thrives.

Publication

In Proceedings of the AAAI Conference on Artificial Intelligence

Computer Science, Natural Language Processing, Neural Machine Translation, Low-Resource Language, Transfer Learning, Augmentation, Bahnar Language

Duc Q. Nguyen

CS PhD Student