Leukemia Cancer Gene Identification Through Genetic Optimization Technique and Prediction with LSTM


M. Kalaivani, K. Abirami, K. Dharmarajan

Abstract

Cancer remains one of the leading causes of death worldwide, which makes accurate and early diagnosis critical. Identifying the genes associated with cancer can enable effective early-stage treatment, and DNA microarray data play a vital role in detecting and diagnosing the disease. Microarray analysis measures gene expression levels in specific cell samples, allowing thousands of genes to be analyzed simultaneously. However, microarray gene data are high-dimensional and contain many redundant genes. Gene (feature) selection is therefore essential in microarray gene expression analysis: it selects highly relevant genes with low redundancy, which can improve prediction results. This article presents a model that reduces high-dimensional gene data to a low-dimensional space, eliminates redundant genes, and improves early-stage disease prediction. The proposed model is implemented in three phases. First, multicollinearity between genes in the cancer dataset is checked with the Karl Pearson correlation coefficient, and genes with low intercorrelation are identified. Next, a Genetic Optimization Algorithm integrated with the Signal-to-Noise Ratio (SNR) method is applied to determine an optimal set of genes, reducing dimensionality while filtering out noisy and redundant genes. In the final phase, the selected genes are classified with a deep learning method, the Long Short-Term Memory (LSTM) classifier. The model's performance is evaluated with several optimizers (Adam, Adagrad, RMSProp, and SGD) by comparing accuracy and loss values, and precision, recall, and F1-score are also calculated. Finally, these metrics are compared across the different optimizers.
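The first two phases can be sketched in plain Python. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation. It assumes binary labels (0/1), a hypothetical correlation cutoff of 0.9, the common two-class SNR definition |μ1 − μ0| / (σ1 + σ0), and a GA whose fitness is simply the summed SNR of the selected genes. The abstract does not specify the threshold, the GA operators, or the fitness function, so the helper names (pearson, low_correlation_genes, snr, ga_select) and parameters (k, pop, gens, pmut) are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Karl Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant gene -> treat as uncorrelated


def low_correlation_genes(X, threshold=0.9):
    """Phase 1 (assumed form): greedily keep genes whose |r| with every kept gene < threshold."""
    genes = list(zip(*X))  # transpose samples-by-genes into per-gene columns
    kept = []
    for j, g in enumerate(genes):
        if all(abs(pearson(g, genes[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def snr(X, y, j):
    """Two-class signal-to-noise ratio |mean1 - mean0| / (sd1 + sd0) for gene j."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)
    gap = abs(statistics.fmean(pos) - statistics.fmean(neg))
    return gap / spread if spread else 0.0


def ga_select(X, y, candidates, k=2, pop=20, gens=30, pmut=0.1, seed=0):
    """Phase 2 (assumed form): evolve a k-gene subset of candidates maximising summed SNR."""
    rng = random.Random(seed)
    score = {j: snr(X, y, j) for j in candidates}

    def fitness(individual):
        return sum(score[j] for j in individual)

    population = [rng.sample(candidates, k) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop:
            p1, p2 = rng.sample(parents, 2)
            pool = list(dict.fromkeys(p1 + p2))  # crossover: draw from the parents' union
            child = rng.sample(pool, k)
            if rng.random() < pmut:  # mutation: swap in a gene the child lacks
                unused = [j for j in candidates if j not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return sorted(max(population, key=fitness))


# Hypothetical toy data: 6 samples x 4 genes; gene 1 is exactly 2 x gene 0.
X = [
    [1.0, 2.0, 5.0, 1.0],
    [1.2, 2.4, 4.0, 2.0],
    [0.8, 1.6, 6.0, 0.0],
    [3.0, 6.0, 4.0, 2.0],
    [3.2, 6.4, 6.0, 3.0],
    [2.8, 5.6, 5.0, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    candidates = low_correlation_genes(X)
    print("low-correlation genes:", candidates)
    print("GA-SNR subset:", ga_select(X, y, candidates))
```

On the toy data, gene 1 is an exact multiple of gene 0, so the correlation filter drops it as redundant. Because this fitness is purely additive, the GA ends up rediscovering the top-k genes by SNR. A GA earns its cost when the fitness is non-additive, for example a classifier's accuracy on the candidate subset. The abstract does not say which form the authors used.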
The experimental results demonstrate that the proposed Cor-GA_SNR-LSTM model significantly improves detection by reducing the number of features in the gene data, increasing accuracy, and minimizing loss. The Adam optimizer yielded the best performance, with an accuracy of 88.45% and a loss of 0.1612. The proposed method effectively identifies the genes most relevant to detecting and predicting leukemia.
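For reference, the evaluation quantities named in the abstract (accuracy, loss, precision, recall, F1-score) are usually computed as follows for a binary task. This sketch assumes labels coded 0/1 with 1 as the positive class, and binary cross-entropy as the loss. The abstract does not name the loss function, and classification_metrics and binary_cross_entropy are hypothetical helpers, not code from the article.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean log loss over predicted probabilities of the positive class (assumed loss)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical labels and predictions, not results from the article.
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

Precision and recall depend on which class is treated as positive, so a dataset with more than two leukemia classes would need per-class or averaged versions of these metrics.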
