Link to Pubmed [PMID] – 9278068
J. Comput. Biol. 1997;4(3):415-31
Many multiple alignment methods implicitly or explicitly try to minimize the amount of biological change implied by an alignment. At the level of sequences, biological change is measured along a phylogenetic tree, a structure that is frequently predicted only after the multiple alignment rather than together with it. The Generalized Tree Alignment problem addresses both questions simultaneously. It can formally be viewed as a Steiner tree problem in sequence space, and our approach merges a path heuristic for the construction of a Steiner tree with a clustering method of the kind usually applied only to distance data. This combination is achieved using sequence graphs, a data structure for the efficient representation of similar sequences. Although somewhat slower in practice than an earlier method by Hein (1989), the current approach achieves significantly better results in terms of the underlying scoring function. Furthermore, a variant of the algorithm is introduced that maintains a guaranteed error bound of (2 - 2/n) for n sequences.
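A bound of this form is familiar from the classical minimum spanning tree heuristic for Steiner trees in a metric space: a spanning tree built over the n given points alone costs at most (2 - 2/n) times the optimal Steiner tree. The sketch below illustrates that classical heuristic only, not the algorithm or the variant described in the abstract. It assumes unit-cost edit distance as the metric on sequence space, and the names `edit_distance` and `mst_over_sequences` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (assumed metric; not the paper's scoring function)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def mst_over_sequences(seqs):
    """Prim's algorithm on the complete graph of pairwise edit distances.

    Returns (total_cost, edges), where each edge is (i, j, distance) and
    i, j index into seqs. No inner (ancestral) sequences are inferred here.
    """
    n = len(seqs)
    if n == 0:
        return 0, []
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = edit_distance(seqs[i], seqs[j])

    # best[k] = (cheapest known distance from the tree to k, tree endpoint)
    best = {k: (dist[0][k], 0) for k in range(1, n)}
    edges, total = [], 0
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        total += d
        # Relax the remaining sequences against the newly added one.
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return total, edges


if __name__ == "__main__":
    cost, tree = mst_over_sequences(["ACGT", "ACGA", "TCGA"])
    print(cost, tree)
```

The spanning tree uses only the input sequences as nodes, so it gives a quick upper bound on the tree cost; methods that also reconstruct inner sequences, as generalized tree alignment does, can improve on it.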