BRRD-Net: A Transformer-Based Framework for Region-Specific Romanized Bangla Dialect Detection Using Pretrained Embeddings and Prototype Learning
Students & Supervisors
Student Authors
Supervisors
Abstract
This study introduces BRRD-Net (Bangla Region-based Romanized Dialect-Net), a novel Transformer-based framework for automatically identifying regional dialects in Romanized Bangla (Banglish) text. The framework combines multilingual contextual embeddings from XLM-RoBERTa, parameter-efficient fine-tuning via Low-Rank Adaptation (LoRA), and a prototypical classification layer to robustly capture subtle phonological, lexical, and orthographic differences among five of the most prominent Bangla dialects. To make the model robust to noisy user-generated text, we develop a systematic preprocessing pipeline with noise-aware data augmentation, including phonetic normalization, character-level perturbation, and code-mixing with English words. Evaluations on the BODD dataset show that BRRD-Net achieves 81.60% accuracy and a macro-F1 of 0.816, outperforming strong Transformer baselines such as mBERT, RoBERTa, and DistilBERT. Examination of the confusion matrix further shows distinct class separation overall, while highlighting lexically overlapping dialect pairs as a remaining challenge. BRRD-Net provides an interpretable and scalable framework for dialect identification in low-resource Romanized languages.
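To illustrate the idea behind a prototypical classification layer, the following is a minimal sketch, not the authors' implementation. It assumes the common prototypical-network formulation, in which each class prototype is the mean of that class's embeddings and a sample is assigned to the nearest prototype by squared Euclidean distance; the abstract does not state which distance or prototype definition BRRD-Net uses. The function names, the toy 2-D vectors (standing in for XLM-RoBERTa sentence embeddings), and the labels `dialect_a` and `dialect_b` are hypothetical.

```python
from math import fsum


def class_prototypes(embeddings, labels):
    """Average the embeddings of each class to get one prototype per class."""
    grouped = {}
    for vector, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [fsum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def predict(embedding, prototypes):
    """Return the label whose prototype is closest to the embedding."""
    return min(
        prototypes,
        key=lambda label: squared_distance(embedding, prototypes[label]),
    )


# Toy 2-D vectors standing in for sentence embeddings (hypothetical data).
support_embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["dialect_a", "dialect_a", "dialect_b", "dialect_b"]

prototypes = class_prototypes(support_embeddings, support_labels)
prediction = predict([4.5, 3.0], prototypes)
```

Because every prediction reduces to a distance to a small set of class prototypes, this kind of layer lends itself to inspection: the nearest and second-nearest prototypes indicate which dialects a given sample is being confused between.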
Keywords
Publication Details
- Type of Publication:
- Conference Name: International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE 2026)
- Date of Conference: 29/01/2026 - 29/01/2026
- Venue: RUET, Rajshahi
- Organizer: Faculty of ECE, RUET