Research Article

A novel technique for analyzing the similarity and dissimilarity of DNA sequences

Published: January 28, 2014
Genet. Mol. Res. 13 (1) : 570-577 DOI: 10.4238/2014.January.28.2

Abstract

l_{i,j} denotes the distance between the points (x_i, y_i) and (x_j, y_j) in the graphical representation of a DNA sequence. Classifying l_{i,j}, i, j = 1, 2, …, N, according to the number of points lying between (x_i, y_i) and (x_j, y_j) yields N - 1 types. The average and variance of each type are assembled into a novel invariant v = (a_1, d_1, a_2, d_2, …, a_N, d_N). Compared with the traditional invariants, namely the leading eigenvalue, the max-min (eigenvalue), the leading eigenvalue/N, the average matrix element, and the average row sum, this strategy complies with the rule of using the average, extracts more information about biological sequences, and reduces the amount of computation. It is superior to the traditional invariants in predicting similarity and dissimilarity among different species.
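
The construction described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `distance_invariant` is hypothetical, the distance is assumed to be Euclidean, the variance is assumed to be the population variance, and the input points are assumed to come from whatever graphical representation of the sequence is chosen (the abstract does not specify one). The sketch emits one (average, variance) pair for each of the N - 1 types.

```python
from math import hypot
from statistics import fmean, pvariance


def distance_invariant(points):
    """Return [a_1, d_1, a_2, d_2, ...] for a list of (x, y) points.

    Hypothetical helper: distances are grouped by how many points lie
    between point i and point j, then averaged per group.
    Assumptions: Euclidean distance, population variance.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    invariant = []
    # gap = j - i, so (gap - 1) points lie between the two endpoints.
    # gap runs from 1 to n - 1, which gives the N - 1 types.
    for gap in range(1, n):
        dists = [
            hypot(points[i + gap][0] - points[i][0],
                  points[i + gap][1] - points[i][1])
            for i in range(n - gap)
        ]
        invariant.append(fmean(dists))      # average of this type
        invariant.append(pvariance(dists))  # variance of this type
    return invariant


if __name__ == "__main__":
    # Three arbitrary points, chosen only to show the call.
    print(distance_invariant([(0, 0), (3, 4), (3, 0)]))
```

Two sequences could then be compared through the distance between their invariant vectors, although vectors built from sequences of different lengths have different lengths and would need to be aligned or truncated first; how that is handled is not stated in the abstract.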

About the Authors