COMPUTATIONAL IDENTIFICATION AND FUNCTIONAL ANNOTATION OF NON-CODING GENETIC VARIANTS USING WHOLE-GENOME SEQUENCING DATA
DOI: https://doi.org/10.4238/33049296
Abstract
Non-coding genetic variants remain a major challenge in genomics because of their regulatory complexity and their lack of direct protein-coding effects. This paper presents a complete computational pipeline for identifying and functionally annotating non-coding variants from whole-genome sequencing (WGS) data. The proposed pipeline combines variant filtering, regulatory region mapping, multi-dimensional feature extraction, and machine-learning-based classification to prioritize variants of interest. Whole-genome variant datasets were obtained from publicly available repositories and annotated with conservation scores, chromatin accessibility profiles, transcription factor binding sites, and epigenomic signatures. An Extreme Gradient Boosting (XGBoost) classifier was trained on these combined features to distinguish functional from non-functional variants. The model achieved an accuracy of 92.4% and outperformed established tools such as CADD and GWAVA. In addition, functional enrichment analysis showed that the prioritized variants are strongly associated with major regulatory pathways and disease-relevant gene networks. These results indicate that combining multi-omics data with explainable machine learning can improve the prediction of functional non-coding variation and yield biological insight.
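The regulatory region mapping and feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature labels, the toy coordinates, and the 0-based half-open interval convention are all assumptions made for illustration. It shows one plausible way to turn a variant position into a binary feature vector that a classifier such as XGBoost could consume.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical feature labels; the paper does not list its exact feature set.
FEATURES = ["open_chromatin", "tf_binding_site"]


def build_index(regions):
    """Group (chrom, start, end, label) regions by chromosome, sorted by start.

    Coordinates are assumed to be 0-based and half-open: [start, end).
    """
    grouped = defaultdict(list)
    for chrom, start, end, label in regions:
        grouped[chrom].append((start, end, label))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        starts = [start for start, _, _ in intervals]
        index[chrom] = (starts, intervals)
    return index


def overlapping_labels(index, chrom, pos):
    """Return the sorted labels of all regions that contain position pos."""
    if chrom not in index:
        return []
    starts, intervals = index[chrom]
    # Only regions whose start is <= pos can contain pos.
    hi = bisect_right(starts, pos)
    return sorted({label for _, end, label in intervals[:hi] if pos < end})


def featurize(index, chrom, pos):
    """Encode region overlaps as a 0/1 vector ordered like FEATURES."""
    labels = set(overlapping_labels(index, chrom, pos))
    return [1 if name in labels else 0 for name in FEATURES]


# Toy regulatory annotations (made-up coordinates, not real data).
regions = [
    ("chr1", 100, 200, "open_chromatin"),
    ("chr1", 150, 180, "tf_binding_site"),
    ("chr2", 50, 75, "open_chromatin"),
]
index = build_index(regions)

for chrom, pos in [("chr1", 160), ("chr1", 190), ("chr1", 200), ("chr3", 10)]:
    print(chrom, pos, featurize(index, chrom, pos))
```

In a full pipeline, vectors like these would be concatenated with continuous features such as conservation scores before training. The linear scan over candidate intervals is adequate for a sketch; an interval tree would be a more typical choice at genome scale.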
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

