Discovering AI Research Trends: A Comparative Study with TF-IDF, BERT and PCA-Based Clustering
Students & Supervisors
Student Authors
Supervisors
Abstract
Reading scientific abstracts provides a quick overview of research content, but manually reviewing thousands of them is time-consuming and inefficient. This study applies three clustering algorithms (K-Means, hierarchical clustering, and DBSCAN) to a dataset of 2,000 abstracts represented with two NLP techniques: TF-IDF and BERT embeddings. For each cluster, we report the keywords used most often in its abstracts to convey the overall context. The abstracts in the dataset relate to Large Language Models (LLMs), and we apply Principal Component Analysis (PCA) for data visualization. Results show that BERT embeddings consistently outperform TF-IDF representations, with the BERT + K-Means combination producing the most coherent clusters. K-Means, the best-performing algorithm, achieved a silhouette score of 0.007 and a Davies-Bouldin (DB) index of 5.246 with TF-IDF, compared with a silhouette score of 0.138 and a DB index of 2.428 with BERT. This work demonstrates how unsupervised clustering can reveal emerging themes in research on LLMs and Cyber Security, offering a practical tool for researchers to identify trends efficiently without extensive manual reading.
Keywords
Publication Details
- Type of Publication:
- Conference Name: 28th International Conference on Computer and Information Technology
- Date of Conference: 19/12/2025 - 19/12/2025
- Venue: Long Beach Hotel, Cox's Bazar
- Organizer: IEEE Bangladesh Section.