Effects of sample re-sequencing and trimming on the quality and size of assembled consensus sequences

F. Prosdocimi, D.A.O. Lopes, F.C. Peixoto, M.M. Mourão, L.G.G. Pacífico, R.A. Ribeiro, J.M. Ortega
Published October 05, 2007
Genet. Mol. Res. 6 (4): 756-765 (2007)

Corresponding author
J.M. Ortega
E-mail: miguel@icb.ufmg.br

ABSTRACT

The production of nucleic acid sequences by automatic DNA sequencers is always associated with some base-calling errors. To produce a high-quality DNA sequence from a molecule of interest, researchers normally sequence the same sample many times. Because base-calling errors are considered rare events, re-sequencing the same molecule and assembling the resulting reads is frequently thought to be a good way to generate reliable sequences. A relevant question, however, is how many times the sample needs to be re-sequenced to achieve a high-fidelity sequence at minimal cost. We examined how both the number of re-sequenced reads and the PHRED trimming parameters affect the accuracy and size of the final consensus sequences. Hundreds of single-pool reaction pUC18 reads were generated and assembled into consensus sequences with the CAP3 software. Using local alignment against the published pUC18 cloning vector sequence, the position and number of errors in each consensus were identified and stored in MySQL databases. Stringent PHRED trimming parameters proved efficient at reducing errors; however, this procedure also decreased consensus size. Moreover, re-sequencing did not have a clear effect on the removal of consensus errors, although it was able to slightly increase consensus size.
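The trade-off between trimming stringency and sequence length can be illustrated with a minimal sketch. It assumes a simple sliding-window rule, which is not necessarily the algorithm PHRED itself applies, and the function names, the window size, the thresholds, and the toy read are all hypothetical. The only fixed fact used is the standard definition of a Phred quality value, Q = -10 log10(P), where P is the estimated base-calling error probability.

```python
def error_probability(q):
    """Convert a Phred quality value Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_by_quality(bases, quals, min_q=20, window=5):
    """Keep the span from the first to the last window whose mean quality >= min_q.

    Illustrative sliding-window trimming only (an assumption, not PHRED's own rule).
    Returns (trimmed_bases, trimmed_quals); both are empty if no window passes.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    n = len(quals)
    if n < window:
        return "", []
    # Start indices of every window whose mean quality reaches the threshold.
    good = [i for i in range(n - window + 1)
            if sum(quals[i:i + window]) / window >= min_q]
    if not good:
        return "", []
    start, end = good[0], good[-1] + window
    return bases[start:end], quals[start:end]


# Toy read (hypothetical data): low quality at both ends, high in the middle,
# which is the typical profile of a Sanger read.
read = "ACGTACGTACGTACGTACGT"
quals = [5, 8, 12, 18, 25, 30, 35, 40, 40, 40,
         40, 38, 35, 30, 25, 18, 12, 8, 6, 4]

lenient_bases, lenient_quals = trim_by_quality(read, quals, min_q=15)
strict_bases, strict_quals = trim_by_quality(read, quals, min_q=30)

print(len(lenient_bases), len(strict_bases))
```

With the toy values above, the stricter threshold keeps fewer bases but a higher mean quality. This mirrors the abstract's observation that stringent trimming reduces errors at the cost of consensus size.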

Key words: Sequencing reads, Trimming, Assembling, Consensus, Codifying sequences, PHRED, CAP3
