Link to Pubmed [PMID] – 30557340
PLoS Comput. Biol. 2018 12;14(12):e1006554
Early assessment of infectious disease outbreaks is key to implementing timely and effective control measures. In particular, rapidly recognising whether infected individuals stem from a single outbreak sustained by local transmission, or from repeated introductions, is crucial to adopt effective interventions. In this study, we introduce a new framework for combining several data streams, e.g. temporal, spatial and genetic data, to identify clusters of related cases of an infectious disease. Our method explicitly accounts for underreporting, and allows incorporating preexisting information about the disease, such as its serial interval, spatial kernel, and mutation rate. We define, for each data stream, a graph connecting all cases, with edges weighted by the corresponding pairwise distance between cases. Each graph is then pruned by removing distances greater than a given cutoff, defined based on preexisting information on the disease and assumptions on the reporting rate. The pruned graphs corresponding to different data streams are then merged by intersection to combine all data types; connected components define clusters of cases related for all types of data. Estimates of the reproduction number (the average number of secondary cases infected by an infectious individual in a large population), and the rate of importation of the disease into the population, are also derived. We test our approach on simulated data and illustrate it using data on dog rabies in Central African Republic. We show that the outbreak clusters identified using our method are consistent with structures previously identified by more complex, computationally intensive approaches.