Using the Gene Ontology tool to produce de novo protein-protein interaction networks with IS_A relationship.
Since the first genomes were assembled, gene sequences alone have not been sufficient to understand complex metabolic processes that involve several genes, each playing a distinct role. To identify these roles, a network of interactions should be created in which each gene is a node. Edges connecting nodes represent evidence of interaction, for instance, gene products coexisting in the same cellular component. Such networks are called protein-protein interaction (PPI) networks. After genome assembly, PPI mapping is used to predict the likelihood that proteins interact with one another, based on literature evidence and several databases, thus enriching genome annotations. Identifying PPIs involves analyzing each possible protein pair for a set of features, for instance, participation in the same biological process, having the same function, and location in the same cellular component. Here, we investigated the use of the three categories of the Gene Ontology (GO) database for efficient PPI prediction, because GO provides data on the three features exemplified here. For a broader conclusion, we investigated the genomes of ten different human pathogens, looking for commonality in the GO hierarchical relationship denominated IS_A. Plasmids were examined separately from their main genomes. Protein pairs sharing at least one IS_A value were considered interacting proteins. The predicted interactions were checked against STRING in sensitivity (score >0.75) and specificity (score <0.25) analyses. The average areas under the receiver operating characteristic curve across all organisms were 0.66 for genomes and 0.53 for plasmids. Thus, GO categories alone could not provide reliable PPI prediction. However, using additional features can improve predictions.
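The pairing rule and the evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The GO terms, protein names, and STRING-like scores are invented toy data, and the function names are hypothetical. The sketch also makes three assumptions the abstract does not confirm: that an "IS_A value" is a direct IS_A parent of an annotated term, that the number of shared IS_A values can be used as a ranking score, and that STRING scores above 0.75 mark reference positives while scores below 0.25 mark reference negatives.

```python
from itertools import combinations

# Toy data (hypothetical): GO term -> its direct IS_A parent terms.
IS_A = {
    "GO:A1": {"GO:A"},
    "GO:A2": {"GO:A"},
    "GO:B1": {"GO:B"},
    "GO:C1": {"GO:C"},
}

# Toy data (hypothetical): protein -> annotated GO terms.
ANNOTATIONS = {
    "p1": {"GO:A1", "GO:B1"},
    "p2": {"GO:A2"},
    "p3": {"GO:B1"},
    "p4": {"GO:C1"},
}

# Toy STRING-like combined scores in [0, 1] (hypothetical values).
STRING_SCORES = {
    ("p1", "p2"): 0.90, ("p1", "p3"): 0.10, ("p1", "p4"): 0.20,
    ("p2", "p3"): 0.80, ("p2", "p4"): 0.50, ("p3", "p4"): 0.10,
}


def is_a_values(protein):
    """Collect the IS_A parents of every GO term annotated to a protein."""
    parents = set()
    for term in ANNOTATIONS[protein]:
        parents |= IS_A.get(term, set())
    return parents


def shared_is_a_counts():
    """Count shared IS_A parents for each unordered protein pair."""
    return {
        (a, b): len(is_a_values(a) & is_a_values(b))
        for a, b in combinations(sorted(ANNOTATIONS), 2)
    }


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


counts = shared_is_a_counts()
# Pairing rule from the text: at least one shared IS_A value means "interacting".
predicted = {pair for pair, n in counts.items() if n >= 1}

# Keep only pairs with a confident reference label; intermediate scores are dropped.
evaluated = [p for p, s in STRING_SCORES.items() if s > 0.75 or s < 0.25]
labels = [STRING_SCORES[p] > 0.75 for p in evaluated]
auc = roc_auc([counts[p] for p in evaluated], labels)

print(sorted(predicted))
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking, which puts the reported averages of 0.66 for genomes and 0.53 for plasmids in context.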