COMPUTATIONAL IDENTIFICATION AND FUNCTIONAL ANNOTATION OF NON-CODING GENETIC VARIANTS USING WHOLE-GENOME SEQUENCING DATA

Authors

  • Sudeshna Chakraborty, Professor, School of Computer Science and Engineering, Galgotias University, India
  • Dr Ranjana Patnaik, Professor, Department of Biomedical Sciences, School of Biosciences and Technology, Galgotias University, India
  • Jayakodi. T, Assistant Professor, Meenakshi College of Allied Health Sciences, Meenakshi Academy of Higher Education and Research
  • Kasthuri K, Associate Professor, Department of Biochemistry, Meenakshi Medical College Hospital & Research Institute, Meenakshi Academy of Higher Education and Research
  • Dr. Anbukkarasi, Associate Professor, Pathology, Sree Balaji Medical College and Hospital, Bharath Institute of Higher Education and Research

DOI:

https://doi.org/10.4238/33049296

Abstract

Non-coding genetic variants remain a persistent challenge in genomics because of their regulatory complexity and the absence of direct protein-coding effects. This paper introduces a complete computational pipeline for identifying and functionally annotating non-coding variants from whole-genome sequencing (WGS) data. The proposed pipeline combines variant sorting, regulatory-region mapping, multi-dimensional feature extraction, and machine-learning-based classification to rank variants by importance. Whole-genome variant datasets were obtained from publicly available repositories and annotated with conservation scores, chromatin accessibility profiles, transcription factor binding sites, and epigenomic signatures. An Extreme Gradient Boosting (XGBoost) classifier was trained on these combined features to distinguish functional from non-functional variants. The model achieved an accuracy of 92.4% and outperformed established tools such as CADD and GWAVA. In addition, functional enrichment analysis showed that the prioritized variants are strongly associated with major regulatory pathways and disease-relevant gene networks. These results indicate that combining multi-omics data with explainable machine learning methods can improve the prediction of functional non-coding variation and yield biological insight.
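The annotation step summarized in the abstract (mapping variants to regulatory regions and assembling a feature row for each variant) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `Variant`, `build_index`, `overlaps`, and `annotate`, the track names, and the toy coordinates and scores are all hypothetical, and each track is assumed to contain non-overlapping, 1-based closed intervals.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not taken from the paper)."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def build_index(regions):
    """Group (chrom, start, end) intervals by chromosome, sorted by start.

    Assumption: intervals are 1-based, closed, and do not overlap one
    another within a single track.
    """
    index = {}
    for chrom, start, end in regions:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def overlaps(index, chrom, pos):
    """Return True if pos falls inside any interval on chrom."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos <= intervals[i][1]


def annotate(variant, tracks, conservation):
    """Build one feature row: a 0/1 flag per track plus a conservation score."""
    row = {
        name: int(overlaps(idx, variant.chrom, variant.pos))
        for name, idx in tracks.items()
    }
    # Missing scores default to 0.0 here; a real pipeline might impute instead.
    row["conservation"] = conservation.get((variant.chrom, variant.pos), 0.0)
    return row


# Toy regulatory tracks and scores, invented purely for illustration.
tracks = {
    "open_chromatin": build_index([("chr1", 100, 200), ("chr1", 500, 650)]),
    "tf_binding": build_index([("chr1", 150, 180)]),
}
conservation = {("chr1", 160): 0.92}

variants = [Variant("chr1", 160, "A", "G"), Variant("chr1", 300, "C", "T")]
features = [annotate(v, tracks, conservation) for v in variants]
```

Feature rows of this kind would then be passed to a classifier such as the XGBoost model the abstract describes. The classification step is omitted here because the abstract does not state its training data or hyperparameters.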


Published

2026-03-20


Section

Articles

How to Cite

COMPUTATIONAL IDENTIFICATION AND FUNCTIONAL ANNOTATION OF NON-CODING GENETIC VARIANTS USING WHOLE-GENOME SEQUENCING DATA. (2026). Genetics and Molecular Research. https://doi.org/10.4238/33049296
