Deep Structured Learning for Gene Classification from k-mer Encoded DNA
Students & Supervisors
Student Authors
Supervisors
Abstract
Gene classification is a fundamental task in genomics that supports disease-gene association, functional annotation, and precision medicine. Traditional machine learning approaches based on handcrafted features scale poorly and fail to capture complex sequence dependencies. This paper proposes a deep learning framework that integrates convolutional neural networks (CNNs), bidirectional long short-term memory (BiLSTM) networks, and a self-attention mechanism for large-scale gene classification. DNA sequences are encoded using 4-mer tokenization (k-mers with k = 4). The convolutional layers capture local motifs, the BiLSTM layers capture long-range dependencies, and the attention mechanism highlights informative subsequences, which improves interpretability. To address class imbalance, the model employs weighted loss functions and learning rate scheduling. Experiments on a dataset with more than twenty-two thousand gene types show that the framework achieves approximately 99% accuracy and F1 score, surpassing benchmark models such as DeepBind, DanQ, and DNABERT. The results indicate that the proposed framework is scalable, interpretable, and effective, providing a promising direction for genomic sequence analysis and downstream applications.
Keywords
Publication Details
- Type of Publication:
- Conference Name: 28th International Conference on Computer and Information Technology (ICCIT 2025)
- Date of Conference: 19/12/2025
- Venue: Long Beach Hotel, Cox's Bazar, Bangladesh
- Organizer: IEEE