Bioinformatics

imDC: an ensemble learning method for imbalanced classification with miRNA data

C. Y. Wang, Hu, L. L., Guo, M. Z., Liu, X. Y., and Zou, Q., imDC: an ensemble learning method for imbalanced classification with miRNA data, vol. 14, pp. 123-133, 2015.

Imbalanced data are common in bioinformatics and in many other areas. A drawback of traditional machine learning methods is that they pay relatively little attention to classifying the small sample class. To address this, we developed imDC, which combines ensemble learning with weights and sample misclassification information to classify imbalanced data effectively. Our method gave better results than other algorithms on UCI machine learning datasets and microRNA data.
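As a rough illustration of the general idea of an ensemble for imbalanced data, the sketch below trains several one-feature threshold classifiers, each on all minority samples plus an equally sized random draw from the majority class, and combines them by a weighted vote. This is not the imDC algorithm: the undersampling scheme, the use of balanced accuracy as the member weight, the function names, and the toy data are all assumptions made for the example.

```python
import random

def fit_stump(xs, ys):
    """Choose the threshold and direction with the fewest errors on 1-D data."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            preds = [1 if sign * (x - t) >= 0 else 0 for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    return best[1], best[2]

def predict_stump(stump, x):
    t, sign = stump
    return 1 if sign * (x - t) >= 0 else 0

def balanced_accuracy(stump, xs, ys):
    """Mean of the per-class recall, so the minority class counts equally."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    tpr = sum(predict_stump(stump, x) == 1 for x in pos) / len(pos)
    tnr = sum(predict_stump(stump, x) == 0 for x in neg) / len(neg)
    return (tpr + tnr) / 2

def fit_ensemble(xs, ys, n_members=11, seed=0):
    """Assumes label 1 is the minority class and label 0 the majority class."""
    rng = random.Random(seed)
    minority = [x for x, y in zip(xs, ys) if y == 1]
    majority = [x for x, y in zip(xs, ys) if y == 0]
    members = []
    for _ in range(n_members):
        # Balanced training set: every minority sample, random majority subset.
        drawn = rng.sample(majority, len(minority))
        sub_x = minority + drawn
        sub_y = [1] * len(minority) + [0] * len(drawn)
        stump = fit_stump(sub_x, sub_y)
        # Weight each member by how well it does on the full imbalanced set.
        members.append((stump, balanced_accuracy(stump, xs, ys)))
    return members

def predict_ensemble(members, x):
    score = sum(w if predict_stump(s, x) == 1 else -w for s, w in members)
    return 1 if score > 0 else 0

# Hypothetical toy data: 200 majority samples near 0, 10 minority samples near 4.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
xs += [data_rng.gauss(4.0, 1.0) for _ in range(10)]
ys = [0] * 200 + [1] * 10
members = fit_ensemble(xs, ys)
```

Calling `predict_ensemble(members, x)` on a new value returns the weighted-vote label. Balancing each member's training set is one common way to keep the majority class from dominating; the paper's own use of weights and misclassification information may differ.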

Molecular conservation of the mammalian leptin protein

J. E. Gabriel and Lidani, K. C. F., Molecular conservation of the mammalian leptin protein, vol. 14, pp. 253-258, 2015.

In this study, we used computational biology tools to compare multiple leptin protein sequences from different animal species, with the aim of gaining new insights into the degree of sequence conservation and into evolutionary biology among mammals.

Novel bioinformatic identification of differentially expressed tissue-specific and cancer-related proteins from the Human Protein Atlas for biomarker discovery

X.-X. Liu and Liu, F.-J., Novel bioinformatic identification of differentially expressed tissue-specific and cancer-related proteins from the Human Protein Atlas for biomarker discovery, vol. 14, pp. 4557-4565, 2015.

Identification of cancer-associated and tissue-specific proteins is important for research on carcinogenesis mechanisms and biomarker discovery. Here we applied a new strategy to identify candidate cancer proteins by mining immunohistochemistry protein profiles. Proteins with quantitative values from 14 normal tissues and their corresponding cancer tissues were compared and analyzed using bioinformatics.

MicroRNAs function primarily in the pathogenesis of human anencephaly via the mitogen-activated protein kinase signaling pathway

W. D. Zhang, Yu, X., Fu, X., Huang, S., Jin, S. J., Ning, Q., and Luo, X. P., MicroRNAs function primarily in the pathogenesis of human anencephaly via the mitogen-activated protein kinase signaling pathway, vol. 13, pp. 1015-1029, 2014.

Anencephaly is one of the most serious forms of neural tube defects (NTDs), a group of congenital central nervous system (CNS) malformations. MicroRNAs (miRNAs) are involved in diverse biological processes via the post-transcriptional regulation of target mRNAs. Although miRNAs play important roles in the development of mammalian CNS, their function in human NTDs remains unknown.

Bioinformatic analysis and characteristics of glycoprotein C encoded by the newly identified UL44 gene of duck plague virus

K. F. Sun, Cheng, A. C., and Wang, M. S., Bioinformatic analysis and characteristics of glycoprotein C encoded by the newly identified UL44 gene of duck plague virus, vol. 13, pp. 4505-4515, 2014.

Glycoprotein C is one of the duck plague virus (DPV) glycoproteins and is encoded by the DPV UL44 gene. DPV glycoprotein C (DPV-gC) comprises 431 amino acids with a putative molecular mass of 47.35 kDa. Sequence analysis indicated that the protein possesses typical characteristics of type-I membrane glycoproteins, containing an N-terminal signal sequence, an external domain, a C-terminal membrane anchor region, and a short cytoplasmic domain.

In silico analysis of mutations occurring in the protein N-acetylgalactosamine-6-sulfatase (GALNS) and causing mucopolysaccharidosis IVA

E. R. Tamarozzi, Torrieri, E., Semighini, E. P., and Giuliatti, S., In silico analysis of mutations occurring in the protein N-acetylgalactosamine-6-sulfatase (GALNS) and causing mucopolysaccharidosis IVA, vol. 13, pp. 10025-10034, 2014.

The goals were to analyze and characterize the secondary structure, regions of intrinsic disorder, and physicochemical characteristics of three classes of mutations described in the enzyme N-acetylgalactosamine-6-sulfatase that cause mucopolysaccharidosis IVA: missense mutations, insertions, and deletions. All mutations were compared to the wild-type enzyme. The results showed that secondary structure was maintained in 25 of 129 missense mutations and that 104 mutations showed minor changes, such as an increase or decrease in the size of the elements.

Identification of differently expressed genes in leukemia using multiple microarray datasets

Z. Y. Zhang, Xu, R. Q., Guo, T. J., Zhang, M., Li, D. X., and Lu, X. Y., Identification of differently expressed genes in leukemia using multiple microarray datasets, vol. 13, pp. 10482-10489, 2014.

The purpose of this study was to identify differentially expressed genes and analyze biological processes related to leukemia. A meta-analysis of leukemia datasets from the Gene Expression Omnibus was performed using the Rank Product package. Next, Gene Ontology-enrichment analysis and pathway analysis were performed using the Gene Ontology website and the Kyoto Encyclopedia of Genes and Genomes. A protein-protein interaction network was constructed using the Cytoscape software.
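The core statistic behind a rank product meta-analysis can be sketched briefly: each gene is ranked by fold change within every dataset, and the geometric mean of those ranks is taken across datasets, so a gene that is consistently near the top gets a small score. The sketch below is a minimal illustration with made-up gene names and fold changes; it is not the Rank Product package itself and omits the permutation-based significance estimate that a full analysis would include.

```python
import math

def rank_product(fold_changes_by_dataset):
    """Geometric mean of per-dataset ranks (rank 1 = most up-regulated).

    fold_changes_by_dataset: list of dicts mapping gene name -> log fold change.
    Every dict is assumed to contain the same genes.
    """
    genes = sorted(fold_changes_by_dataset[0])
    log_rank_sum = {g: 0.0 for g in genes}
    for fold_changes in fold_changes_by_dataset:
        ordered = sorted(genes, key=lambda g: fold_changes[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            # Summing logs avoids overflow when many datasets are combined.
            log_rank_sum[gene] += math.log(rank)
    k = len(fold_changes_by_dataset)
    return {g: math.exp(log_rank_sum[g] / k) for g in genes}

# Hypothetical log fold changes from three datasets.
datasets = [
    {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 0.2, "GENE_D": -1.5},
    {"GENE_A": 2.5, "GENE_B": 0.1, "GENE_C": 0.8, "GENE_D": -2.0},
    {"GENE_A": 1.9, "GENE_B": 2.2, "GENE_C": -0.3, "GENE_D": -1.0},
]
rp = rank_product(datasets)
top_gene = min(rp, key=rp.get)
```

Here `GENE_A` ranks first, first, and second, giving a rank product of the cube root of 2, the smallest in the toy set. Ranking in the opposite direction would pick out consistently down-regulated genes.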

CoffeebEST: an integrated resource for Coffea spp expressed sequence tags

A. R. Paschoal, Fernandes, E. D. M., Silva, J. C., Lopes, F. M., Pereira, L. F. P., and Domingues, D. S., CoffeebEST: an integrated resource for Coffea spp expressed sequence tags, vol. 13, pp. 10913-10920, 2014.

Coffee is one of the most important commodities in the world, and its production relies mainly on two species, Coffea arabica and Coffea canephora. Although there are diverse transcriptome datasets available for coffee trees, few research groups have exploited the potential knowledge contained in these data, especially with respect to fruit and seed development.

Improved method for predicting protein fold patterns with ensemble classifiers

W. Chen, Liu, X., Huang, Y., Jiang, Y., Zou, Q., and Lin, C., Improved method for predicting protein fold patterns with ensemble classifiers, vol. 11, pp. 174-181, 2012.

Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physicochemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix.

New microsatellite markers for the abalone Haliotis midae developed by 454 pyrosequencing and in silico analyses

R. Slabbert, Hepple, J.-A., Rhode, C., Bester-Van der Merwe, A. E., and Roodt-Wilding, R., New microsatellite markers for the abalone Haliotis midae developed by 454 pyrosequencing and in silico analyses, vol. 11, pp. 2769-2779, 2012.

Farming of Haliotis midae is the most lucrative aquaculture venture in South Africa. The genome of this species needs to be studied to assist in selective breeding programs aimed at increasing overall yield, and molecular markers will be required to attain this goal. We identified and characterized 82 polymorphic microsatellite loci by using repeat-enriched genomic libraries and high-throughput pyrosequencing technology.
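One simple form of in silico microsatellite screening is to scan sequence reads for short motifs repeated in tandem. The sketch below does this with a regular expression. The motif length range (2 to 6 bases), the minimum of five repeats, the function name, and the example read are assumptions chosen for illustration; they are not the criteria or tools used in the study.

```python
import re

# A motif of 2-6 bases followed by at least four more copies of itself.
# The lazy quantifier prefers the shortest motif, so ACACAC... reports "AC".
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")

def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for each tandem repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip single-base runs such as AAAAAAAAAA.
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

# Hypothetical read containing an (AC)6 repeat.
read = "TTG" + "AC" * 6 + "GGT"
ssr_hits = find_ssrs(read)
```

For the example read, `ssr_hits` contains one entry: the motif `AC` starting at position 3 (0-based) with six repeats. Checking whether such loci are polymorphic still requires genotyping across individuals.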
