The COVID-19 pandemic has abundantly illustrated the importance of quickly identifying the factors that increase or decrease pathogen transmission. An early understanding of the airborne nature of SARS-CoV-2 transmission, for example, could have helped slow the pandemic and reduce its human burden. In this project, we will explore the potential of natural language processing (NLP) methods to automatically identify risk factors and protective factors by analyzing large corpora of unstructured text. The project takes advantage of a unique resource provided by the ComCor study: free-text descriptions of the suspected circumstances of contamination from more than 50,000 individuals who tested positive for COVID-19, totaling well over 500,000 words. The project will draw on state-of-the-art NLP techniques based on deep learning, such as contextualized word embeddings learned by transformer models (e.g., BERT). Methods developed in this project will be tested and validated primarily (but not exclusively) on the ComCor data set, in collaboration with the Epidemiology of Emerging Disease Unit of A. Fontanet.
Once these methods are validated, we will also explore their application to other text data sets, including social media and data obtained within the framework of the AIOLOS project. AIOLOS is a French-German research program that aims to support decision-making by integrating information from various sources to detect early signs of a new outbreak, monitor its spread, and derive appropriate response measures. A total of 17 partners are involved in this project: 5 main partners, including Sanofi and Fraunhofer, and 12 associated partners, including Institut Pasteur.
We are looking for a highly motivated candidate with a solid background in NLP or related fields to lead this project.
Please send applications (motivation letter and CV with the names of 2-3 references) to Ch. Zimmer (firstname.lastname@example.org) and Tiffany Charmet (email@example.com).