Discovering AI Research Trends: A Comparative Study with TF-IDF, BERT and PCA-Based Clustering

Students & Supervisors

Student Authors
Naved Akhter
Bachelor of Science in Computer Science & Engineering, FST
P.m. Tasriful Islam
Bachelor of Science in Computer Science & Engineering, FST
Md Abdullah Al Noman
Bachelor of Science in Computer Science & Engineering, FST
Md. Saif Ahmed Sourav
Bachelor of Science in Computer Science & Engineering, FST
Supervisors
Tohedul Islam
Assistant Professor, Faculty, FST

Abstract

Reading scientific abstracts provides a quick overview of research content, but manually reviewing thousands of them is time-consuming and inefficient. This study applies three clustering algorithms (K-Means, Hierarchical, and DBSCAN) to a dataset of 2,000 abstracts represented with two NLP techniques: TF-IDF and BERT embeddings. Each cluster is summarized by the keywords that occur most often in its abstracts, which conveys the overall context of the cluster. The abstracts in the dataset relate to Large Language Models (LLMs), and Principal Component Analysis is applied for data visualization. Results show that BERT embeddings consistently outperform TF-IDF representations, with the BERT + K-Means combination achieving the most coherent clusters. With TF-IDF, K-Means achieved a silhouette score of 0.007 and a Davies-Bouldin (DB) index of 5.246; with BERT, it achieved a silhouette score of 0.138 and a DB index of 2.428, the best result among the combinations tested. This work demonstrates how unsupervised clustering can reveal emerging themes in research on LLMs and Cyber Security, offering a practical tool for researchers to identify trends efficiently without extensive manual reading.
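
The two quality measures reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: for the silhouette score, higher values (up to 1) indicate better-separated clusters, and for the Davies-Bouldin index, lower values are better. The functions and the toy 2-D points below are hypothetical and chosen only for illustration; the abstract does not state which tools or implementation the authors used.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group_by_label(points, labels):
    """Map each cluster label to the list of its points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    return clusters

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b is the mean distance to
    the nearest other cluster. Singleton clusters contribute 0."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # dist(p, p) is 0, so dividing by len(own) - 1 excludes p itself
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        total += (b - a) / max(a, b)
    return total / len(points)

def davies_bouldin_index(points, labels):
    """Average, over clusters, of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    clusters = group_by_label(points, labels)
    cents = {k: [sum(c) / len(v) for c in zip(*v)]
             for k, v in clusters.items()}
    scatter = {k: sum(dist(p, cents[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)

# Toy data (illustrative only): two compact groups of four points each.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # matches the two groups
poor_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in [("good", good_labels), ("poor", poor_labels)]:
    print(name,
          round(silhouette_score(toy_points, labels), 3),
          round(davies_bouldin_index(toy_points, labels), 3))
```

Read this way, the reported move from a silhouette score of 0.007 to 0.138 and from a DB index of 5.246 to 2.428 points in the same direction on both measures: the BERT-based clusters are more compact and better separated than the TF-IDF-based ones.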

Keywords

Clustering, TF-IDF, BERT, K-Means, NLP

Publication Details

  • Type of Publication:
  • Conference Name: 28th International Conference on Computer and Information Technology
  • Date of Conference: 19/12/2025 - 19/12/2025
  • Venue: Long Beach Hotel, Cox's Bazar
  • Organizer: IEEE Bangladesh Section.