Research Article

A novel method for identifying SNP disease association based on maximal information coefficient

Published: December 19, 2014
Genet. Mol. Res. 13 (4) : 10863-10877 DOI: https://doi.org/10.4238/2014.December.19.7
Cite this Article:
H.M. Liu, N. Rao, D. Yang, L. Yang, Y. Li, F. Ou (2014). A novel method for identifying SNP disease association based on maximal information coefficient. Genet. Mol. Res. 13(4): 10863-10877. https://doi.org/10.4238/2014.December.19.7

Abstract

To improve single-nucleotide polymorphism (SNP) association studies, we developed MICSNPs, a maximal information coefficient (MIC)-based SNP searching method that uses the MIC, a recently introduced statistical measure, to identify SNP-disease associations. MIC values varied with the minor allele frequencies of the SNPs and with the odds ratios for disease. We therefore used a Monte Carlo-based permutation test to eliminate the effects of these fluctuating MIC values and, to save time, added a sliding-window-based binary search whose time cost was 0.58% of that of a sequential search. Experiments on both simulated and real data demonstrated that the method is computationally and statistically feasible once the resampling count is reduced to 4 times the number of markers and the sliding-window-based binary search is applied. We found that our method outperforms existing approaches.
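As a rough illustration of the Monte Carlo permutation step mentioned in the abstract, the following Python sketch estimates an empirical p-value for a single SNP by shuffling case/control labels. It is not the authors' implementation: plain mutual information between genotype and phenotype is used here as a simple stand-in for the MIC, and the function names, the `n_perm` default, and the add-one correction are assumptions made only for this example. The sliding-window-based binary search is not shown because the abstract does not describe it in enough detail.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences.

    Stand-in association statistic; the paper uses the MIC instead.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def permutation_p_value(genotypes, phenotypes, statistic, n_perm=1000, seed=0):
    """Empirical p-value of `statistic` under random relabeling of phenotypes.

    `n_perm` and `seed` are hypothetical parameters chosen for this sketch.
    """
    rng = random.Random(seed)
    observed = statistic(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any genotype-phenotype link
        rng.shuffle(shuffled)
        if statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: genotype coded 0/1/2, phenotype 0 = control, 1 = case
    genotypes = [0] * 50 + [1] * 30 + [2] * 20
    phenotypes = [0] * 50 + [1] * 50
    p = permutation_p_value(genotypes, phenotypes, mutual_information)
    print(f"empirical p-value: {p:.4f}")
```

Because the p-value is computed against the SNP's own permutation distribution, a statistic whose baseline level shifts with minor allele frequency can still be compared across markers, which is the role the abstract assigns to the permutation test.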
