The huge amount of molecular data now available can help address new and essential questions in evolution. However, reconstructing evolution requires models, algorithms, and statistical and computational methods of ever-increasing complexity. Developing methods that scale with the “deluge” of data is a real challenge, in terms of both algorithmics and modeling. Our unit aims to develop new methodologies and algorithms that tackle these challenges efficiently in the field of evolution and molecular phylogeny.
Our second aim is to apply these methods and tools to pathogens, mostly viruses, and especially HIV. The goals are multiple: to understand their evolution (e.g. the emergence and transmission of drug resistance mutations), to decipher their genomes (e.g. to confirm the existence of the 10th gene of HIV), and to design surveillance tools (e.g. to control outbreaks). Most current methods for tackling these questions are based on Bayesian approaches, which are computationally heavy and cannot process the large data sets now available. To overcome these limitations, we are working on new maximum-likelihood and approximate Bayesian computation (ABC) methods applicable to very large data sets comprising tens of thousands of sequences.