A Big Data Analytics And Statistical Genetics Approach For Gene Expression–Based Biomarker Discovery In Neurodegenerative Disorders Using AI And Machine Learning

Dr. P. Sedhupathy; Suresh Arumugam; Prof. Takhellambam Kiranmala Chanu; Dr. M. Nithya; Dr. R.S. Shanmugasundaram; Dr. A. Selvaraj; Sahil Sharma

doi:10.4238/3r8zn256

Authors

Dr. P. Sedhupathy, Assistant Professor, Department of Computer Science (Artificial Intelligence & Data Science), Dr. SNS Rajalakshmi College of Arts and Science, Coimbatore, Tamilnadu, India.
Suresh Arumugam, Scientist, Central Research Laboratory, Meenakshi Medical College Hospital & Research Institute, Meenakshi Academy of Higher Education and Research, Chennai, Tamilnadu, India.
Prof. Takhellambam Kiranmala Chanu, HOD (OBG Nursing), Parul Institute of Nursing, Parul University, Vadodara, Gujarat, India.
Dr. M. Nithya, Professor, Department of Computer Science and Engineering, Vinayaka Mission's Kirupananda Variyar Engineering College, Salem (Vinayaka Mission's Research Foundation), Tamilnadu, India.
Dr. R.S. Shanmugasundaram, Professor, Department of Computer Science and Engineering, Vinayaka Mission's Kirupananda Variyar Engineering College, Salem (Vinayaka Mission's Research Foundation), Tamilnadu, India.
Dr. A. Selvaraj, Assistant Professor, Department of Mathematics, VelTech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai – 600062, Tamilnadu, India.
Sahil Sharma, Assistant Professor, Faculty of Computing, Guru Kashi University, Bathinda, Punjab, India.

DOI:

https://doi.org/10.4238/3r8zn256

Abstract

Alzheimer disease (AD) and Parkinson disease (PD) are neurodegenerative disorders marked by progressive neuronal dysfunction and substantial molecular heterogeneity, which hinders early diagnosis and targeted intervention. Gene expression profiling is an effective way to discover transcriptomic biomarkers, but high dimensionality, cohort variability, and the multiple-testing burden tend to undermine reproducibility. In this study, we used a combined big data analytics and statistical genetics framework to identify robust gene expression-based biomarkers from publicly available transcriptomic data of brain and peripheral blood samples (n = 412 in total; 238 cases and 174 controls). Differential expression analysis was performed using moderated linear modelling with Benjamini-Hochberg false discovery rate (FDR) control, after quality control and normalisation to minimise batch effects. Genes were considered significant at FDR < 0.05 with a log2 fold change of 1 or more and a confidence interval that does not cross zero. This statistical filtering identified 326 significantly dysregulated genes, which were enriched in pathways associated with neuroinflammation, synaptic signalling, mitochondrial dysfunction, and protein homeostasis. To refine the candidate biomarkers, we used a machine learning pipeline combining Elastic Net regularisation, Random Forest importance ranking, and stability selection, followed by classification with logistic regression, support vector machine, and gradient boosting models. A 14-gene biomarker panel was selected consistently across resampling. In stratified nested cross-validation, the best-performing classifier achieved an area under the receiver operating characteristic curve (AUROC) of 0.91 ± 0.03, a sensitivity of 0.87, and a specificity of 0.85, and its performance remained stable in independent validation cohorts (AUROC = 0.88).
Combining effect size, FDR, and confidence interval reporting improved biomarker reliability more than selection based on p-values alone. These results indicate that integrating stringent statistical genetics with interpretable machine learning algorithms increases the robustness and predictive capacity of gene expression-based biomarkers. The proposed framework offers a consistent and biologically grounded approach to AI-led biomarker discovery in neurodegenerative diseases, intended for future use in translational and precision medicine.
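
The significance rule described in the abstract (FDR < 0.05, log2 fold change of 1 or more, confidence interval not crossing zero) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `GeneResult` record, the function names, and the toy values are all hypothetical, and the use of the absolute log2 fold change (so that down-regulated genes also qualify) is an assumption that the abstract does not state explicitly. The moderated linear modelling step is not reproduced here; the sketch assumes per-gene p-values, effect sizes, and confidence intervals are already available.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneResult:
    """Hypothetical per-gene output of a differential expression test."""
    gene: str       # placeholder gene identifier
    log2fc: float   # estimated log2 fold change (cases vs controls)
    ci_low: float   # lower confidence bound for log2fc
    ci_high: float  # upper confidence bound for log2fc
    p_value: float  # raw (unadjusted) p-value


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(results: List[GeneResult],
                 fdr_cutoff: float = 0.05,
                 lfc_cutoff: float = 1.0) -> List[str]:
    """Keep genes that pass all three criteria from the abstract."""
    q_values = benjamini_hochberg([r.p_value for r in results])
    selected = []
    for r, q in zip(results, q_values):
        ci_excludes_zero = r.ci_low > 0 or r.ci_high < 0
        # Assumption: the fold-change threshold applies to |log2fc|.
        if q < fdr_cutoff and abs(r.log2fc) >= lfc_cutoff and ci_excludes_zero:
            selected.append(r.gene)
    return selected


# Toy values, invented purely for illustration.
results = [
    GeneResult("GENE_A", 1.8, 1.1, 2.5, 0.0002),   # passes all criteria
    GeneResult("GENE_B", -1.4, -2.0, -0.8, 0.004), # passes (down-regulated)
    GeneResult("GENE_C", 0.4, 0.1, 0.7, 0.001),    # effect size too small
    GeneResult("GENE_D", 1.2, -0.1, 2.5, 0.03),    # CI crosses zero
    GeneResult("GENE_E", 0.1, -0.5, 0.7, 0.6),     # not significant
]
selected = select_genes(results)
print(selected)  # prints ['GENE_A', 'GENE_B']
```

In this toy example GENE_C and GENE_D both have adjusted p-values below 0.05 and would pass a p-value-only filter, but they are rejected by the effect-size and confidence-interval criteria respectively, which is the kind of difference the abstract attributes to combined reporting.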
