Link to Pubmed [PMID] – 12832652
Mol. Biol. Evol. 2003 Nov;20(11):1754-9
It is a central assumption of evolution that gene duplications provide the genetic raw material from which to create proteins with new functions. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico approaches to predict details of protein function. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogous proteins. It has been proposed that positions showing switches in substitution rate over time, i.e., “heterotachous sites,” are good indicators of functional divergence. Here, we analyzed the alpha and beta paralogous subunits of hemoglobin in search of such signatures. We found as many heterotachous sites in comparisons between groups of paralogous subunits (alpha/beta) as between orthologous ones (alpha/alpha, beta/beta). Thus, the importance of substitution rate shifts as predictors of specialization between protein subfamilies might need to be reconsidered. Instead, such shifts may reflect a more general process of protein evolution, consistent with the fact that they can be compatible with function conservation. As an alternative, we focused on those residues showing highly constrained states in two sequence groups, but different in each group, and we named them CBD (for “constant but different”). In contrast to heterotachous positions, CBD sites were markedly overrepresented in paralogous (alpha/beta) comparisons relative to orthologous ones (alpha/alpha, beta/beta), identifying them as likely signatures of functional specialization between the two subunits. When superimposed onto the three-dimensional structure of hemoglobin, CBD positions consistently appeared to cluster preferentially on inter-subunit surfaces, two contact areas crucial to function in vertebrate tetrameric hemoglobin.
The identification and analysis of CBD sites by complementing structural information with evolutionary data may represent a promising direction for future studies dealing with the functional characterization of a growing number of multigene families identified by complete genome analyses.
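
To make the CBD (“constant but different”) criterion described above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes two groups of already aligned, equal-length sequences and treats a position as CBD when each group has a dominant residue at that position and the two residues differ. The function names `consensus` and `cbd_sites`, the `min_fraction` conservation threshold, and the toy sequences are all hypothetical; the abstract does not state how "highly constrained" was quantified.

```python
from collections import Counter


def consensus(column, min_fraction=1.0):
    """Return the dominant residue of an alignment column, or None if its
    frequency is below min_fraction (hypothetical conservation threshold)."""
    residue, count = Counter(column).most_common(1)[0]
    return residue if count / len(column) >= min_fraction else None


def cbd_sites(group_a, group_b, min_fraction=1.0):
    """List (position, residue_a, residue_b) for columns that are constrained
    within each group but carry a different residue in each group."""
    sequences = list(group_a) + list(group_b)
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must be aligned to the same length")

    sites = []
    for pos in range(length):
        res_a = consensus([seq[pos] for seq in group_a], min_fraction)
        res_b = consensus([seq[pos] for seq in group_b], min_fraction)
        # Skip unconstrained columns and columns dominated by alignment gaps.
        if res_a is None or res_b is None or "-" in (res_a, res_b):
            continue
        if res_a != res_b:
            sites.append((pos, res_a, res_b))
    return sites


if __name__ == "__main__":
    # Toy sequences invented for illustration; these are NOT hemoglobin data.
    alpha_like = ["MKLSA", "MKLTA", "MKLSA"]
    beta_like = ["MRLSG", "MRLAG", "MRLSG"]
    for pos, res_a, res_b in cbd_sites(alpha_like, beta_like):
        print(f"position {pos}: {res_a} in group A, {res_b} in group B")
```

Positions are reported as 0-based alignment columns. Running the same function on two groups of orthologous sequences (for example alpha/alpha) and on two groups of paralogous sequences (alpha/beta) would allow the kind of comparison of CBD counts that the abstract describes, under whatever conservation threshold is chosen.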