Link to Pubmed [PMID] – 22056699
Gene 2012 Jan;492(1):199-211
The proper detection of orthologs is crucial for evolutionary studies of genes and species. Despite large efforts to solve this problem the methodological situation appears unsettled to a large extent and the “quest for orthologs” is still an ongoing task in large-scale genome comparisons. Here, we introduce a simple operational framework for the detection of orthologs and their classification. The operational framework relies on well-established principles, optimizing their implementation for the considered purposes, and chaining components in coherent procedures: 1) We take advantage of the efficiency and simplicity of the Reciprocal Best Hit (RBH) detections, remedying (by design) the drawback concerning the limitations in terms of 1:1 detections. The procedure is based on the partitioning of Reciprocal Best Hits, with the further merging of partitions including members of the same paralogous classes (“SuperPartition of Orthologs” (SPOs)). 2) We then resort to the conservation profiles of the obtained clusters, allowing simple detection of SPOs containing duplicated members. Based on accepted evolutionary principles, such members can be further tagged as in-paralogs (co-orthologs) or out-paralogs. The method is illustrated and validated by extensive genomic analyses. The performances of the overall approach are characterized in global terms for three sets of species (Chlamydiae, Mycobacteria, Aspergilli), showing that at least 75% of the sets of orthologs contain at most one protein from a given species. The sets including more than one protein from a given species are shown to contain in-paralogs in proportions varying from 28% to 58%. The characterizations also show that the large majority of SPOs are associated with ancestral motifs, and accordingly not prone to chaining effects that might be triggered by multi-domain proteins. Further the SPO formulation is compared to other similarity based ortholog detection methods. Beyond core common results, significant differences are observed between various methods, which can be accounted for to a large extent on conceptual grounds, relative to the different merging schemes involved. Such comparisons highlight a major advantage of the SPO approach concerning the proper clustering of associated paralogs, which appear to be often dispatched spuriously into distinct orthologous classes. Finally the perspectives for future applications and elaborations of SPO-based compositional analyses are discussed.