Research Article

Testing the performance of automated annotation of ESTs with the Kegg Orthology (KO) database demonstrates lack of completeness of clusters

Published: September 30, 2008
Genet. Mol. Res. 7 (3) : 948-957 DOI: https://doi.org/10.4238/vol7-3x-meeting011
Cite this Article:
G.R. Fernandes, M.A. Mudado, J.M. Ortega (2008). Testing the performance of automated annotation of ESTs with the Kegg Orthology (KO) database demonstrates lack of completeness of clusters. Genet. Mol. Res. 7(3): 948-957. https://doi.org/10.4238/vol7-3x-meeting011

Abstract

The KEGG Orthology (KO) database was tested as a source for automated annotation of expressed sequence tags (ESTs). We used a control experiment, in which every EST was assigned to its cognate protein, and an annotation experiment, in which the ESTs were annotated by proteins from other organisms. Comparing the two sets of results, we assigned each annotation to one of three classes: correct, changed, or speculated. Correct annotation ranged from 57% (Caenorhabditis elegans) to 81% (Homo sapiens). Although changed annotation was low (1% in H. sapiens to 9% in Arabidopsis thaliana), speculated annotation was very high (18% in H. sapiens to 38% in C. elegans). We propose eliminating part of the speculated annotation by using the KEGG GENES database to enrich KO clusters, which decreases speculation from 38% to 2% in C. elegans. Thus, the KO database still requires some effort to move sequences from KEGG GENES to KO in order to improve annotation performance.
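The comparison between the control and annotation experiments can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The abstract names the three classes but does not define them, so the sketch assumes one plausible reading: an annotation is "correct" when both experiments give the same KO, "changed" when they give different KOs, and "speculated" when the annotation experiment assigns a KO although the control experiment assigned none. The function names and the KO identifiers in the toy data are hypothetical.

```python
from collections import Counter

def classify(control_ko, annotated_ko):
    """Classify one EST. Each argument is a KO identifier string or None.

    Assumed class definitions (not stated in the abstract):
      correct    - both experiments assign the same KO
      changed    - the experiments assign different KOs
      speculated - only the annotation experiment assigns a KO
    """
    if annotated_ko is None:
        return "unannotated"  # nothing was assigned, so nothing to evaluate
    if control_ko is None:
        return "speculated"
    return "correct" if annotated_ko == control_ko else "changed"

def summarize(pairs):
    """Count classes over (control_ko, annotated_ko) pairs, one pair per EST."""
    return Counter(classify(control, annotated) for control, annotated in pairs)

# Toy data with made-up KO identifiers, for illustration only.
toy_pairs = [
    ("K00001", "K00001"),  # same KO in both experiments
    ("K00001", "K00002"),  # different KO
    (None, "K00003"),      # no control KO, but annotated anyway
    ("K00004", None),      # control KO only
]
counts = summarize(toy_pairs)
```

Under this reading, adding cognate proteins from KEGG GENES to the KO clusters would give a control KO to ESTs that previously had none, moving them out of the "speculated" class.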
