Research Article

Benchmark comparison of ab initio microRNA identification methods and software

Published: December 19, 2012
Genet. Mol. Res. 11 (4) : 4525-4538 DOI: 10.4238/2012.October.17.4

Abstract

MicroRNAs (miRNAs) are short, non-coding RNA molecules that regulate gene expression by directing cleavage or translational repression of target messenger RNAs. Ab initio methods have become popular in computational miRNA detection. Most software tools are designed to distinguish miRNA precursors from pseudo-hairpins, but a few can mine miRNAs from genome or expressed sequence tag sequences. We prepared novel testing datasets to measure and compare the performance of various software tools. We also summarized miRNA mining methods for next-generation sequencing data, for bioinformatics researchers who analyze such data. Because secondary structure is an important feature in miRNA identification, we analyzed how the choice of secondary structure prediction software affects miRNA identification. MiPred was the most effective for classifying real-/pseudo-pre-miRNA sequences, and miRAbela performed relatively better for mining miRNA precursors from genome or expressed sequence tag sequences. RNA-fold performed better than m-fold for extracting secondary structure features of miRNA precursors.
