Topic Modeling Violence Against Women in Bangladesh News: A Comparative Approach with LDA, Top2Vec, and BERTopic

Students & Supervisors

Student Authors
Ashiqur Rahman Saron
Bachelor of Science in Computer Science & Engineering, FST
Hasin Almas Sifat
Bachelor of Science in Computer Science & Engineering, FST
Soumodip Madhu
Bachelor of Science in Computer Science & Engineering, FST
Koushik Biswas Arko
Bachelor of Science in Computer Science & Engineering, FST
Faiza Mahmood Aroni
Bachelor of Science in Computer Science & Engineering, FST
Supervisors
Md. Mortuza Ahmmed
Associate Professor, Faculty, FST

Abstract

"In Bangladesh, violence against women is one of the most significant social issues, and it remains underreported and under-represented by institutions. Surveys and policy frameworks have not adequately captured the discourse surrounding gendered violence. This study therefore extends the methodological literature by offering a more comprehensive view of violence against women through topic modeling, specifically Latent Dirichlet Allocation (LDA), Top2Vec, and BERTopic. The data source was a set of 2,000 news articles covering violence against women. The three models were evaluated on their ability to produce coherent, interpretable, meaningful, and context-sensitive topics. LDA yielded high-level topic categories, but it often merged political news items with gender-focused ones. Top2Vec formed context-specific clusters, but its dependence on proper names limited its generalizability. In contrast, BERTopic, which uses transformer embeddings, produced semantically rich and thematically distinct topics, including Campus Harassment, Dowry-related Violence, and Workplace Discrimination. The comparison indicates that BERTopic captures subtle semantic relationships in the text better than LDA and Top2Vec, and thus provides more usable results. The findings suggest that NLP-based topic modeling can be an effective resource for raising awareness of systemic and gendered violence among researchers and policymakers, and for strengthening data-driven approaches to implementing solutions."
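
The abstract says the models were compared on topic coherence but does not name a metric. As a minimal sketch only, the Python below computes UMass coherence, one commonly used measure, for a topic's top words. The function name `umass_coherence`, the toy corpus, and the two example topics are all invented for illustration and are not taken from the study or its data.

```python
import math


def umass_coherence(topic_words, documents):
    """UMass coherence for one topic; higher values mean the topic's
    words co-occur in the same documents more often.

    topic_words: top words of a topic, most probable first.
    documents:   list of tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            d_l = doc_freq(topic_words[l])
            if d_l == 0:
                continue  # word never appears in the corpus; skip the pair
            co = doc_freq(topic_words[m], topic_words[l])
            score += math.log((co + 1) / d_l)
    return score


# Toy corpus (hypothetical, NOT the study's 2,000 articles).
docs = [
    ["dowry", "husband", "torture", "police"],
    ["dowry", "husband", "family", "case"],
    ["campus", "student", "harassment", "protest"],
    ["campus", "student", "teacher", "harassment"],
    ["election", "party", "police", "protest"],
]

focused_topic = ["dowry", "husband", "family"]  # words from one theme
mixed_topic = ["dowry", "election", "party"]    # gender and politics mixed

focused_score = umass_coherence(focused_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print(focused_score > mixed_score)
```

On this toy corpus the thematically focused topic scores higher than the one that mixes political and gender-related words, which is the kind of contrast the abstract describes between the models. A real comparison would apply the same metric to each model's topics over the full corpus.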

Keywords

Violence against women, Topic modeling, LDA, Top2Vec, BERTopic, Natural language processing, Bangladesh, Transformer models

Publication Details

  • Type of Publication:
  • Conference Name: 11th IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering 2025 (IEEE WIECON-ECE 2025)
  • Date of Conference: 21/12/2025
  • Venue: Long Beach Hotel, Cox’s Bazar, Bangladesh
  • Organizer: IEEE Bangladesh Section and IEEE WIE