Splice site prediction using stochastic regular grammars

A.Y. Kashiwabara, D.C.G. Vieira, A. Machado-Lima, A.M. Durham
Published: March 20, 2007
Genet. Mol. Res. 6 (1) : 105-115
 
Cite this Article:
A.Y. Kashiwabara, D.C.G. Vieira, A. Machado-Lima, A.M. Durham (2007). Splice site prediction using stochastic regular grammars. Genet. Mol. Res. 6(1): 105-115.
 
About the Authors 
A.Y. Kashiwabara, D.C.G. Vieira, A. Machado-Lima, A.M. Durham
 
Corresponding author
A.M. Durham
E-mail: alan@ime.usp.br 
 
ABSTRACT

This paper presents a novel approach to the problem of splice site prediction: the application of stochastic grammar inference. We used four grammar inference algorithms to infer 1465 grammars, and used 10-fold cross-validation to select the best grammar for each algorithm. The selected grammars were embedded into a classifier, which was used to run splice site prediction, and the results were compared with those of NNSPLICE, the predictor used by the Genie gene finder. We indicate possible paths to improve the classifier's performance by using Sakakibara’s windowing technique to find probability thresholds that will lower the number of false-positive predictions.
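
As a rough illustration of selecting a model by 10-fold cross-validation and then classifying with a score threshold, consider the minimal sketch below. It is not the authors' implementation. As an assumption made purely for illustration, a fixed-order Markov chain stands in for an inferred stochastic regular grammar, and the Markov order plays the role of the candidate being selected. All function names, the toy sequences and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train(seqs, order):
    """Fit a fixed-order Markov chain (an assumed stand-in for grammar inference)."""
    counts, totals = Counter(), Counter()
    for s in seqs:
        for i in range(order, len(s)):
            ctx = s[i - order:i]          # the preceding `order` symbols
            counts[(ctx, s[i])] += 1
            totals[ctx] += 1
    return order, counts, totals

def log_prob(model, s):
    """Log-probability of s under the model, with add-one smoothing.
    The first `order` symbols are skipped, which is fine for equal-length windows."""
    order, counts, totals = model
    lp = 0.0
    for i in range(order, len(s)):
        ctx = s[i - order:i]
        lp += math.log((counts[(ctx, s[i])] + 1) / (totals[ctx] + len(ALPHABET)))
    return lp

def classify(pos_model, neg_model, s, threshold=0.0):
    """Call s a splice site when its log-odds score exceeds the threshold.
    Raising the threshold trades sensitivity for fewer false positives."""
    return log_prob(pos_model, s) - log_prob(neg_model, s) > threshold

def cross_validate(pos, neg, order, k=10):
    """Mean accuracy over k folds; fold membership is index modulo k."""
    correct = total = 0
    for fold in range(k):
        pos_model = train([s for i, s in enumerate(pos) if i % k != fold], order)
        neg_model = train([s for i, s in enumerate(neg) if i % k != fold], order)
        test_pos = [s for i, s in enumerate(pos) if i % k == fold]
        test_neg = [s for i, s in enumerate(neg) if i % k == fold]
        correct += sum(classify(pos_model, neg_model, s) for s in test_pos)
        correct += sum(not classify(pos_model, neg_model, s) for s in test_neg)
        total += len(test_pos) + len(test_neg)
    return correct / total

def select_best(pos, neg, orders, k=10):
    """Return the candidate order with the highest cross-validated accuracy."""
    return max(orders, key=lambda o: cross_validate(pos, neg, o, k))

# Toy data, invented for this sketch only.
positives = ["CAGGTAAG"] * 20
negatives = ["ACACACAC"] * 20
best_order = select_best(positives, negatives, orders=[0, 1, 2])
```

In a real setting the candidates would be the inferred grammars themselves, and the threshold would be tuned on held-out data rather than fixed at zero.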

Key words: Machine learning, Splice sites, Gene prediction, Stochastic grammars.
