Abstract
Background: Oral cancer (OC) is a debilitating disease that can adversely affect patients' quality of life. Patients with oral premalignant lesions have a high risk of developing OC. Therefore, identifying robust survival subgroups among them may significantly improve patient therapy and care. This study aimed to identify prognostic biomarkers that predict the time to development of OC and to stratify patients by survival using state-of-the-art machine learning and deep learning. Methods: Gene expression profiles (29,096 probes) of 86 patients from the GSE26549 dataset in the GEO repository were used. An autoencoder deep learning neural network was used to extract features. A univariate Cox regression model was then used to select the significant features among those obtained from the deep learning method (P < 0.05). The autoencoder produced 100 encoded features (the number of units in the encoding layer, i.e., the bottleneck of the network). High-risk and low-risk groups were identified by hierarchical clustering of the encoded features selected by the Cox proportional hazards model. A supervised random forest (RF) classifier was then used to identify gene profiles related to subtypes of OC from the original 29,096 probes. Results: Among the 100 encoded features extracted by the autoencoder, seventy were significantly related to time-to-OC-development according to the univariate Cox model, and these were used as inputs for clustering the patients. Two survival risk groups were identified (log-rank test P = 0.003) and were used as labels for supervised classification. The RF classifier achieved an overall accuracy of 0.916 on the test set and yielded 21 top genes (FUT8, DDR2, ATM, CD247, ETS1, ZEB2, COL5A2, GMAP7, CDH1, COL11A2, COL3A1, AHR, COL2A1, CHORDC1, PTP4A3, COL1A2, CCR2, PDGFRB, COL1A1, FERMT2, PIK3CB) associated with time to developing OC, selected from the original 29,096 probes. Conclusions: Using deep learning, our study identified prominent transcriptional biomarkers in determining
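The separation between the two risk groups is reported through a log-rank test. As an illustration of what that statistic measures, the following is a minimal sketch of a two-group log-rank test in plain Python. It is not the authors' code: the function name `logrank_test` and the toy follow-up times are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative sketch).

    times_*  : follow-up times for each patient in the group
    events_* : 1 if the event (here, OC development) was observed, 0 if censored
    Returns (chi2, p_value).
    """
    # Pool both groups as (time, event, group) tuples; group 0 = A, 1 = B.
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group A
    variance = 0.0       # hypergeometric variance of that sum

    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)          # events at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)

        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0.0:
        # No informative event times: the groups cannot be distinguished.
        return 0.0, 1.0

    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical toy data: group A develops OC earlier than group B.
    high_risk_times, high_risk_events = [1, 2], [1, 1]
    low_risk_times, low_risk_events = [3, 4], [1, 1]
    chi2, p = logrank_test(high_risk_times, high_risk_events,
                           low_risk_times, low_risk_events)
    print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A small p-value indicates that the observed events in one cluster differ from what would be expected if both clusters shared the same hazard. In practice an established survival analysis library would normally be preferred over a hand-written version like this.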