A Hybrid Big Data Analytics and Explainable Machine Learning Approach for Predictive Detection of Cancer-Associated Genomic Variations

Authors

  • Indu Purushothaman, Assistant Professor, Department of Research, Meenakshi Academy of Higher Education and Research
  • Muninathan N, Central Research Laboratory, Meenakshi Medical College Hospital & Research Institute, Meenakshi Academy of Higher Education and Research
  • Navami Gopan, Assistant Professor, Department of Pediatric and Preventive Dentistry, Meenakshi Ammal Dental College and Hospital, Meenakshi Academy of Higher Education and Research
  • Dhanalakshmi S, Professor, Meenakshi College of Pharmacy, Meenakshi Academy of Higher Education and Research
  • Prakash P, Arulmigu Meenakshi College of Nursing, Meenakshi Academy of Higher Education and Research
  • Pooraninagalakshmi J, Conservative Dentistry and Endodontics, Senior Lecturer, Sree Balaji Dental College and Hospital (affiliated to Bharath Institute of Higher Education and Research), Pallikaranai, Chennai
  • Pearlin Mary, Professor, Sree Balaji Dental College and Hospital (affiliated to Bharath Institute of Higher Education and Research), Pallikaranai, Chennai
  • Dr. R Senthilnathan, Professor, Department of Oral Maxillofacial Surgery, Sree Balaji Dental College & Hospital, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, India

DOI:

https://doi.org/10.4238/jw3t1503

Abstract

Cancer is a genomic disease in which genetic and genomic changes accumulate and alter normal cellular activities and signaling pathways. Recent developments in high-throughput sequencing technologies have made it possible to generate cancer genomic data at massive scale, opening new possibilities for the systematic detection and interpretation of cancer-related genomic variations. However, the high dimensionality, heterogeneity, and complexity of these datasets pose a major challenge to traditional analytical techniques. This paper presents a hybrid big data analytics and explainable machine learning model for predicting, and biologically interpreting, cancer-related genomic variations. Publicly available large-scale cancer genomic datasets were processed to extract and annotate genomic variants, and features were engineered at both the gene level and the pathway level. Supervised machine learning models were trained to distinguish cancer-related variations from background genomic patterns. To increase interpretability, explainable artificial intelligence methods were used to quantify the contribution of each genomic feature to the model predictions. The findings show that all three proposed frameworks achieve strong predictive power, with an accuracy of about 89% on the validation datasets, while also highlighting important genomic variations associated with biologically relevant genes and cancer-related pathways. The explainability analysis also points to molecular mechanisms related to tumorigenesis, which lends biological validity to the predictive models. Overall, this paper combines explainable machine learning with scalable big data analytics to offer an interpretable and biologically grounded method for analysing cancer genomic data. The proposed model has potential value for precision oncology applications and could support translational cancer research by providing better insight into cancer-related genomic variations.
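
The feature-engineering and attribution steps described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the datasets, the models, or the explainability method, so the toy variant list, the gene-to-pathway map, the weights, and the helper names (`build_features`, `mean_baseline`, `linear_contributions`) are all hypothetical. The attribution shown is the closed-form additive contribution of a linear score, w_j * (x_j - mean(x_j)), which matches Shapley-style attributions only for a linear model under a feature-independence assumption; it is not presented as the authors' method.

```python
from collections import Counter

# Hypothetical toy input: (sample_id, gene) pairs for already-annotated variants.
VARIANTS = [
    ("s1", "TP53"), ("s1", "KRAS"), ("s1", "TP53"),
    ("s2", "BRCA1"),
    ("s3", "KRAS"), ("s3", "PIK3CA"),
]

# Hypothetical gene-to-pathway map (illustrative only, not a curated resource).
PATHWAYS = {
    "TP53": "p53_signaling",
    "KRAS": "RAS_MAPK",
    "BRCA1": "DNA_repair",
    "PIK3CA": "PI3K_AKT",
}


def build_features(variants, pathways):
    """Return {sample: Counter} holding gene-level and pathway-level variant counts."""
    features = {}
    for sample, gene in variants:
        row = features.setdefault(sample, Counter())
        row[f"gene:{gene}"] += 1
        pathway = pathways.get(gene)
        if pathway is not None:
            row[f"pathway:{pathway}"] += 1
    return features


def mean_baseline(features, names):
    """Mean value of each named feature across all samples (absent counts as 0)."""
    n = len(features)
    return {
        name: sum(row.get(name, 0) for row in features.values()) / n
        for name in names
    }


def linear_contributions(row, weights, baseline):
    """Per-feature contribution to a linear score: w_j * (x_j - baseline_j)."""
    return {
        name: w * (row.get(name, 0) - baseline[name])
        for name, w in weights.items()
    }


FEATURES = build_features(VARIANTS, PATHWAYS)

# Hypothetical weights standing in for a trained linear classifier.
WEIGHTS = {"gene:TP53": 0.8, "pathway:RAS_MAPK": 0.5}
BASELINE = mean_baseline(FEATURES, WEIGHTS)

CONTRIBUTIONS = {
    sample: linear_contributions(row, WEIGHTS, BASELINE)
    for sample, row in FEATURES.items()
}

if __name__ == "__main__":
    for sample, contrib in sorted(CONTRIBUTIONS.items()):
        print(sample, contrib)
```

Because the baseline is the per-feature mean, each feature's contributions sum to zero across samples, so a positive value marks a sample whose count for that gene or pathway pushes the score above the cohort average. A non-linear model would need a model-agnostic method instead, which this sketch does not cover.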

Published

2026-01-06

Section

Articles

How to Cite

A Hybrid Big Data Analytics and Explainable Machine Learning Approach for Predictive Detection of Cancer-Associated Genomic Variations. (2026). Genetics and Molecular Research, 25(1), 1-9. https://doi.org/10.4238/jw3t1503