The phylogenetic bootstrap was proposed by Joseph Felsenstein more than 30 years ago. This method, based on resampling and replication, is used extensively to assess the robustness of phylogenetic inferences. Its usefulness, simplicity and interpretability have made it extremely popular in evolutionary studies, to the point that it is generally required for the publication of phylogenies. Felsenstein’s article has been cited more than 35,000 times and ranks among the 100 most cited scientific papers of all time. In 2017, it was cited more than 2,000 times.
However, it is commonly acknowledged that Felsenstein’s bootstrap is not appropriate for large datasets containing hundreds or thousands of taxa, which are now common thanks to high-throughput sequencing technologies. Although such datasets generally contain a lot of phylogenetic information, Felsenstein’s bootstrap proportions (FBP) tend to be low, especially when the tree is inferred from a single gene or only a few genes. This degradation follows from the core methodology of Felsenstein’s bootstrap: a bootstrap branch must exactly match a branch in the original tree estimate to count toward the bootstrap support of that branch. A difference of just one taxon is sufficient for the bootstrap branch to be counted as absent, even though it is nearly identical to the original branch. The standard approach is to remove “rogue” (phylogenetically unstable) taxa and rerun the analysis, but this is statistically questionable and computationally expensive. Moreover, with large trees, inferred branches are likely to contain errors, and a large fraction of taxa may be unstable.
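The exact-match rule behind FBP can be illustrated with a minimal sketch. It assumes, purely for illustration, that each tree is given as a list of bipartitions and that each bipartition is written as the set of taxa on one side of the branch; the function names `canonical` and `fbp` and the toy six-taxon data are hypothetical and do not come from any published implementation.

```python
def canonical(side, taxa):
    """Canonical form of a bipartition: the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def fbp(branch, bootstrap_trees, taxa):
    """Fraction of bootstrap trees containing exactly the same bipartition as `branch`."""
    target = canonical(branch, taxa)
    hits = sum(
        any(canonical(b, taxa) == target for b in tree)
        for tree in bootstrap_trees
    )
    return hits / len(bootstrap_trees)


# Toy data (assumed for illustration): six taxa, two bootstrap trees.
taxa = set("ABCDEF")
original_branch = {"A", "B", "C"}
bootstrap_trees = [
    [{"A", "B"}, {"A", "B", "C"}, {"E", "F"}],  # contains the branch exactly
    [{"A", "B"}, {"A", "B", "D"}, {"E", "F"}],  # no exact match; closest branch is one taxon away
]
fbp_support = fbp(original_branch, bootstrap_trees, taxa)
print(fbp_support)  # 0.5
```

In the second bootstrap tree, the branch separating A and B from the rest differs from the original branch only by the placement of C, yet it contributes nothing to the support.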
We propose a new version of the phylogenetic bootstrap, in which the presence of original branches in bootstrap trees is measured using a gradual “transfer” distance, as opposed to the binary presence/absence index of the original version. This distance is normalized to the [0, 1] range and averaged over all bootstrap trees. We thus obtain the “transfer bootstrap expectation” (TBE), which replaces the branch presence frequency of FBP (i.e. the expectation of a 0/1 function) with the expectation of a nearly continuous function. By construction, TBE supports are never lower than FBP supports, and the difference is substantial for deep branches.
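The averaging described above can be sketched as follows. This is an illustrative reading, not the reference implementation: it assumes that the transfer distance between two branches is the number of taxa that must change sides to make their bipartitions identical, that the distance to the closest bootstrap branch is normalized by p − 1 (with p the size of the smaller side of the original branch), and that trees are given as lists of bipartitions. The normalization constant is not stated in the text above, and all function names and toy data are hypothetical.

```python
def transfer_distance(b, b_star, taxa):
    """Number of taxa to move so that bipartition b_star becomes bipartition b."""
    mismatches = len(frozenset(b) ^ frozenset(b_star))
    # A bipartition has two equivalent labellings (side or complement); keep the cheaper one.
    return min(mismatches, len(taxa) - mismatches)


def transfer_index(b, tree, taxa):
    """Distance from b to the closest branch of one bootstrap tree."""
    p = min(len(b), len(taxa) - len(b))
    closest = min(transfer_distance(b, b_star, taxa) for b_star in tree)
    # A single-taxon (leaf) branch is always within p - 1 transfers, so cap there.
    return min(closest, p - 1)


def tbe(branch, bootstrap_trees, taxa):
    """Transfer-based support in [0, 1]; requires an internal branch (p >= 2)."""
    p = min(len(branch), len(taxa) - len(branch))
    mean_index = sum(
        transfer_index(branch, tree, taxa) for tree in bootstrap_trees
    ) / len(bootstrap_trees)
    return 1 - mean_index / (p - 1)


# Toy data (assumed for illustration): six taxa, two bootstrap trees.
taxa = set("ABCDEF")
original_branch = {"A", "B", "C"}
bootstrap_trees = [
    [{"A", "B"}, {"A", "B", "C"}, {"E", "F"}],  # exact match: index 0
    [{"A", "B"}, {"A", "B", "D"}, {"E", "F"}],  # closest branch is one transfer away: index 1
]
tbe_support = tbe(original_branch, bootstrap_trees, taxa)
print(tbe_support)  # 0.75
```

On this toy input, a presence/absence count gives the branch a support of 0.5 (one exact match out of two trees), whereas the transfer-based value is 0.75, because the near match in the second tree is credited partially instead of being scored as absent.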
TBE computation and other phylogenetic tools are available from http://booster.c3bi.pasteur.fr. We are continuing to work on the subject, to elucidate the mathematical bases of the transfer distance and to investigate other branch-support approaches.