Link to Pubmed [PMID] – 11847563
J. Mol. Evol. 2002 Mar;54(3):376-85
When divergence between viral species is large, the analysis and comparison of nucleotide or protein sequences are confounded by mutation biases and multiple substitutions per site, which lead, among other things, to the underestimation of branch lengths in phylogenetic trees. To avoid the problem of multiply substituted sites, a method not directly based on nucleic acid or protein sequences was applied to retroviruses. It consisted of asking questions about genome structure or organization and about gene function; the series of answers yielded coded sequences that were analyzed with phylogenetic software. This method recovered the principal retroviral groups, such as the lentiviruses and spumaviruses, and highlighted the questions and answers characteristic of each group of retroviruses. In general, there was reasonable concordance between the coded genome methodology and conventional phylogeny based on the integrase protein sequence, indicating that integrase was fixing mutations slowly enough to marginalize the problem of multiple substitutions at sites. To a first approximation, this suggests that the acquisition of novel genetic features generally parallels the fixation of amino acid substitutions.
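The coding step described in the abstract can be illustrated with a minimal sketch. It assumes that each genome is reduced to a string of yes/no answers and that pairwise differences between those strings feed a distance-based tree method. The taxon names, the answer strings, the "?" symbol for unknown answers, and the functions `coded_distance` and `distance_matrix` are all hypothetical placeholders. They are not the questions, data, or software used in the study.

```python
from itertools import combinations

# Hypothetical coded genomes: one character per question about genome
# structure or gene function (1 = yes, 0 = no, ? = unknown).
# Placeholder values for illustration only, not data from the paper.
answers = {
    "virus_A": "110100",
    "virus_B": "110110",
    "virus_C": "001011",
    "virus_D": "0010?1",
}


def coded_distance(a, b):
    """Fraction of questions answered differently, skipping unknowns."""
    if len(a) != len(b):
        raise ValueError("coded sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if "?" not in (x, y)]
    if not pairs:
        raise ValueError("no comparable answers")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(coded):
    """Pairwise distances between all coded genomes, keyed by name pair."""
    names = sorted(coded)
    return {
        (p, q): coded_distance(coded[p], coded[q])
        for p, q in combinations(names, 2)
    }


if __name__ == "__main__":
    for (p, q), d in distance_matrix(answers).items():
        print(f"{p}\t{q}\t{d:.3f}")
```

A matrix like this could be passed to any distance-based tree-building program. A character-based analysis would instead use the coded strings directly as discrete characters.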