Research Article

Detection of Piwi-interacting RNAs based on sequence features

Published: May 13, 2016
Genet. Mol. Res. 15(2): gmr8638 DOI: 10.4238/gmr.15028638

Abstract

Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs. Distinguishing piRNAs from other non-coding RNAs is important because of their roles in the physiological regulation of spermatogenesis, genome protection from transposons, and regulation of mRNAs and long non-coding RNAs. Few computational studies have addressed piRNA detection, and both the effectiveness and the efficiency of existing piRNA detection tools require improvement. In this study, a piRNA detection method based on sequence features and a support vector machine was developed. Four types of features were proposed: weighted k-mer, weighted k-mer with wildcards, position-specific base, and piRNA length. piRNA sequences from human, mouse, rat, and Drosophila were each used in the experiments. Compared to existing algorithms, the proposed method provides a better balance between precision and sensitivity (both approximately 90%). Although these values were slightly lower than those obtained using the piRNA annotation approach, the proposed method was four-fold faster than piRPred and 229-fold faster than piRNA predictor.
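As an illustration only, the four feature types named in the abstract might be assembled into a single numeric vector as in the Python sketch below. The abstract does not define the weighting scheme, the wildcard pattern, or the positions examined, so everything here is an assumption: relative frequency stands in for the weighting, the wildcard pattern fixes the first and last base of a 3-mer, and positions 1 and 10 are picked purely for demonstration. All function names are hypothetical, and the support vector machine step is not shown.

```python
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of each k-mer over ACGT.

    Assumption: plain normalisation stands in for the paper's weighting.
    """
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = max(len(seq) - k + 1, 0)
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing non-ACGT symbols
            counts[sub] += 1
    total = max(windows, 1)
    return [counts[m] / total for m in kmers]


def wildcard_kmer_frequencies(seq, k=3):
    """Relative frequency of k-mers whose interior bases are wildcards.

    Assumption: only the first and last base of each window are fixed.
    """
    patterns = [(a, b) for a in BASES for b in BASES]
    counts = dict.fromkeys(patterns, 0)
    windows = max(len(seq) - k + 1, 0)
    for i in range(windows):
        key = (seq[i], seq[i + k - 1])
        if key in counts:
            counts[key] += 1
    total = max(windows, 1)
    return [counts[p] / total for p in patterns]


def position_specific_bases(seq, positions=(0, 9)):
    """One-hot encoding of the base at each chosen 0-based position.

    Assumption: positions 1 and 10 (0-based 0 and 9) are illustrative.
    A position beyond the end of the sequence encodes as all zeros.
    """
    vec = []
    for pos in positions:
        base = seq[pos] if pos < len(seq) else None
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def feature_vector(seq, k=2):
    """Concatenate the four feature groups for one sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    return (
        kmer_frequencies(seq, k)
        + wildcard_kmer_frequencies(seq)
        + position_specific_bases(seq)
        + [float(len(seq))]
    )
```

With k = 2 this yields 16 k-mer values, 16 wildcard values, 8 position indicators, and 1 length value per sequence. Such vectors could then be passed to any support vector machine implementation for training and classification.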
