Research Article

De novo assembly and characterization of Gleditsia sinensis transcriptome and subsequent gene identification and SSR mining

Published: January 29, 2016
Genet. Mol. Res. 15(1): gmr7740 DOI: 10.4238/gmr.15017740

Abstract

Gleditsia sinensis is a deciduous tree native to China with high economic and medicinal value. However, knowledge of the molecular processes responsible for the medicinal properties of this species is limited owing to the lack of bioinformatic resources such as whole-genome sequences. In the present study, RNA sequencing data were used to analyze the transcriptome of G. sinensis, and a series of bioinformatic tools was used to explore the main genes involved in important molecular processes. A total of 75.57 million paired-end reads, each 101 bp in length, were acquired from G. sinensis. Using the assembly tool Trinity, 233,751 transcripts were assembled. Among these, 85,795 were identified as unique transcripts, and 59,326 of the unique transcripts were found to contain coding regions. Gene ontology analysis identified 27,637 unique transcripts that were clustered into 56 functional groups. Genes involved in flavonoid and terpenoid backbone biosynthesis and those encoding transcription factors were further analyzed. Sequence analysis revealed four putative G. sinensis chalcone isomerase genes (GsCHI) encoding enzymes for flavonoid biosynthesis. GsCHI1 was found to be phylogenetically related to the chalcone isomerase of the family Leguminosae, and its transcript levels in different tissues were higher than those of GsCHI2, GsCHI3, and GsCHI4. Furthermore, 15,014 simple sequence repeat (SSR) markers were discovered in the transcript library, and 5,170 primers were generated for the SSR loci. The genetic and genomic information presented in this study will be helpful for future studies on gene discovery and molecular processes in G. sinensis.

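As an illustration of the SSR mining step mentioned in the abstract, the following is a minimal sketch of how perfect SSRs might be located in a transcript sequence. The abstract does not name the software or the thresholds used, so the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration only; mononucleotide repeats are not considered here.

```python
import re

# Hypothetical minimum repeat counts per motif length (di- to hexanucleotide).
# These thresholds are assumptions, not values reported in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), which belong to a shorter motif class.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not taken from the G. sinensis transcriptome.
    example = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Start positions are 0-based. In a real analysis, loci found this way would then be passed, together with their flanking sequence, to a primer design tool; that step is not sketched here.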