P.C. Carvalho, S.S. Freitas, A.B. Lima, M. Barros, I. Bittencourt, W. Degrave, I. Cordovil, R. Fonseca, M.G.C. Carvalho, R.S. Moura Neto and P.H. Cabello
Published December 18, 2006
Genet. Mol. Res. 5 (4): 856-867 (2006)
About the authors
Corresponding author
P.C. Carvalho
E-mail: carvalhopc@cos.ufrj.br
ABSTRACT
Statistical modeling of the links between genetic profiles and environmental and clinical data to aid in medical diagnosis is a challenge. Here, we present a computational approach for rapidly selecting important clinical data to assist in medical decisions based on personalized genetic profiles. What could take hours or days of computing is available on-the-fly, making this strategy feasible to implement as a routine without demanding great computing power. The key to rapidly obtaining an optimal or nearly optimal mathematical function that can evaluate the “disease stage” by combining information from genetic profiles with personal clinical data is querying a precomputed solution database. The database is generated beforehand by a new hybrid feature selection method that makes use of support vector machines, recursive feature elimination and random sub-space search. To evaluate the method, data on polymorphisms in the renin-angiotensin-aldosterone system genes, together with clinical data, were obtained from patients with hypertension and control subjects. The disease “risk” was determined by classifying the patients’ data with a support vector machine model based on the optimized features and then measuring the Euclidean distance to the decision hyperplane. Our results showed the association of renin-angiotensin-aldosterone system gene haplotypes with hypertension. The association of polymorphism patterns with different ethnic groups was also tracked by the feature selection process. A demonstration of this method is also available online on the project’s web site.
Key words: Genetic polymorphisms, Essential hypertension, Environmental risks, Support vector machines, Feature selection.
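
The "risk" measure mentioned in the abstract, the Euclidean distance from a patient's data point to the support vector machine decision hyperplane, can be illustrated with a minimal sketch. It assumes a linear decision function w·x + b, which the abstract does not confirm; the function name, weights, bias and patient vector below are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def hyperplane_distance(x, w, b):
    """Signed Euclidean distance from point x to the hyperplane w.x + b = 0.

    Assumes a linear decision function; w and b would come from a trained
    model restricted to the selected features (hypothetical here).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("weight vector must be non-zero")
    # Decision value divided by the weight norm gives the geometric distance.
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical model over two selected features (illustrative values only).
w = [3.0, 4.0]
b = -5.0
patient = [3.0, 4.0]

# The sign indicates the predicted class; the magnitude indicates how far
# the patient lies from the decision boundary.
risk = hyperplane_distance(patient, w, b)
print(risk)
```

Under this reading, a larger positive value places the patient deeper on one side of the boundary, while values near zero indicate a borderline classification.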