About the project
Dates: spring/summer 2025 (3-6 months, flexible dates)
Research project title: Deep learning for time-calibration of pathogen phylogenies
Keywords: phylodynamics, molecular clock, deep learning, computer simulation, time-trees
Project description: Phylodynamics [Volz et al. PLoS Comput Biol. 2013] bridges the gap between traditional epidemiology and pathogen genomes by inferring epidemiological parameters (such as the effective reproduction number Re, i.e. the expected number of secondary infections) from pathogen transmission trees. The internal nodes of these trees represent pathogen transmissions from a donor to a recipient individual, while the leaves correspond to pathogen sampling events. These transmission trees are often approximated by time-scaled phylogenetic trees [Ho & Duchêne Mol Ecol. 2014], inferred from pathogen genomic sequences sampled from infected individuals. In a pathogen phylogenetic tree, the branches represent pathogen evolution and are measured in numbers of accumulated mutations per site (i.e. divided by the sequence length). Time-scaling combines the information from the tree branch lengths and the tip sampling dates to transform the phylogenetic tree so that its branches are measured in units of time. A time-scaled tree can answer questions such as when the epidemic started (the date of the root of the tree) and when certain transmissions happened (the dates of internal nodes).
Time-scaling is possible because many pathogens, especially viruses, accumulate mutations quickly between transmissions, and because the number of accumulated mutations is roughly proportional to elapsed time, i.e. the mutation rate is roughly constant (the strict molecular clock model [Ho & Duchêne Mol Ecol. 2014]). While many model-based tools are available for time-scaling trees under the strict and more complex clock models, the goal of this internship project is to investigate whether appropriate clock model detection and time-scaling can also be achieved with deep learning, and how this can help in epidemic surveillance tasks. Deep learning is currently revolutionizing different aspects of pathogen phylodynamics (e.g., epidemiological parameter estimation [Voznica et al. Nat Commun. 2022]), providing almost instantaneous inference for large data sets and computationally intractable models.
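To make the strict-clock idea concrete, here is a minimal sketch of root-to-tip regression, a classical model-based baseline and not the deep-learning method this project aims to develop. It assumes that the tree is already rooted and that each tip has a known sampling date and a root-to-tip distance in substitutions per site. The function name and the toy values (rate, root date, noise level) are hypothetical and chosen for illustration only.

```python
import numpy as np


def root_to_tip_regression(sampling_dates, root_to_tip_distances):
    """Fit distance = rate * (date - root_date) by ordinary least squares.

    Returns (rate, root_date): the slope is the clock rate and the
    x-intercept of the fitted line is the estimated date of the root.
    """
    dates = np.asarray(sampling_dates, dtype=float)
    dists = np.asarray(root_to_tip_distances, dtype=float)
    # Centre the dates (values around 2020) for numerical stability.
    centre = dates.mean()
    slope, intercept = np.polyfit(dates - centre, dists, deg=1)
    if slope <= 0:
        raise ValueError("No positive temporal signal: estimated rate <= 0")
    root_date = centre - intercept / slope
    return slope, root_date


# Toy example with hypothetical values (assumptions, not real data):
rng = np.random.default_rng(0)
true_rate, true_root_date = 1e-3, 2015.0  # subst./site/year, decimal year
tip_dates = rng.uniform(2018.0, 2024.0, size=50)
tip_dists = true_rate * (tip_dates - true_root_date) + rng.normal(0.0, 2e-4, size=50)

rate, root_date = root_to_tip_regression(tip_dates, tip_dists)
print(f"estimated rate: {rate:.2e}, estimated root date: {root_date:.1f}")
```

A deep-learning time-scaler would be expected to go beyond this baseline, for instance by handling clock models in which the rate varies across branches.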
Specific tasks:
- review the literature on different molecular clock models
- generate simulated data sets for different molecular clock and pathogen evolution settings
- design a deep-learning architecture for these data, define performance metrics, and train the deep-learning time-scaler
- apply the deep-learning time-scaler to real pathogen data sets
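As an illustration of the simulation task above, the following sketch converts branch durations (in years) into branch lengths (in substitutions per site) under two clock settings. It is one possible setup under stated assumptions, not the project's actual simulation pipeline: the function name, its parameters, the Poisson model for mutation counts, and the choice of an uncorrelated lognormal relaxed clock are all assumptions made for this example.

```python
import numpy as np


def simulate_branch_lengths(branch_durations, mean_rate, seq_length,
                            clock="strict", sigma=0.5, rng=None):
    """Simulate branch lengths (substitutions/site) from branch durations.

    clock="strict":  every branch evolves at mean_rate.
    clock="relaxed": each branch draws its own rate from a lognormal
                     distribution with mean mean_rate (assumed model).
    """
    rng = np.random.default_rng() if rng is None else rng
    durations = np.asarray(branch_durations, dtype=float)
    if clock == "strict":
        rates = np.full(durations.shape, mean_rate)
    elif clock == "relaxed":
        # Shift the log-mean so that the expected rate equals mean_rate.
        log_mean = np.log(mean_rate) - 0.5 * sigma ** 2
        rates = rng.lognormal(mean=log_mean, sigma=sigma, size=durations.shape)
    else:
        raise ValueError(f"unknown clock model: {clock!r}")
    # The number of mutations on a branch is Poisson-distributed.
    mutations = rng.poisson(rates * durations * seq_length)
    return mutations / seq_length


# Hypothetical example: three branches of 1, 2 and 3 years.
rng = np.random.default_rng(42)
durations = [1.0, 2.0, 3.0]
strict = simulate_branch_lengths(durations, 1e-3, 10_000, "strict", rng=rng)
relaxed = simulate_branch_lengths(durations, 1e-3, 10_000, "relaxed", rng=rng)
print("strict: ", strict)
print("relaxed:", relaxed)
```

Pairs of such simulated branch lengths and their known durations could then serve as training examples, with the clock setting as a label for clock model detection.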
As part of this project, the student will develop skills in designing and implementing deep learning tools for analyzing pathogen sequence data, designing bioinformatics workflows and performing calculations on a computational cluster, as well as in scientific writing and presentation. In the long run, this project can be continued as a PhD project focussing on deep learning for phylodynamics.
About you
We are looking for a candidate who is fluent in English, has training in deep learning, knows how to program in Python (or another programming language), and is interested in applying their computational skills to epidemiology.
How to apply
To apply, please send your CV and a motivation letter to anna.zhukova@pasteur.fr