Link to HAL – pasteur-05133401
DOI link – 10.1016/j.eswa.2025.128796
Expert Systems with Applications, 2025, 294, pp.128796. ⟨10.1016/j.eswa.2025.128796⟩
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has transformed microbiology by enabling rapid and cost-effective pathogen identification. However, differentiating between closely related species remains challenging. Current analytical methods, such as database-driven approaches, still underutilize the information contained in mass spectra. In response to this challenge, we developed MSclassifR, an R package designed to facilitate the construction of data analysis pipelines for accurate mass spectra classification using machine learning (ML) techniques. MSclassifR provides end-to-end pipelines tailored for microbiological diagnostics, covering preprocessing, mass-to-charge (m/z) selection, and classification of mass spectra. One of the main strengths of the package is its m/z selection method based on random forest (RF) variable importance, which improves identification accuracy by focusing on the most informative spectral features. We assessed classification pipelines constructed using MSclassifR through rigorous experiments on datasets that included bacterial species or subspecies and virulent/avirulent phenotypes, illustrating the package’s versatility across applications. Moreover, the pipelines achieved high accuracy when applied to SARS-CoV-2 nasal swab mass spectra acquired using various MALDI-TOF instruments. Comparisons of multiple pipelines revealed that RF-based pipelines achieved the best performance across datasets of various sizes. The MSclassifR package offers microbiologists an open-source solution that leverages ML to enhance the diagnostic capabilities of MALDI-TOF MS. It is available for download from the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/MSclassifR/).
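The idea of selecting m/z values by variable importance can be illustrated with a small, self-contained sketch. It is not the MSclassifR implementation and does not use the package's API: it is written in Python with only the standard library, and it replaces a full random forest with a forest of randomized one-split trees (decision stumps) scored by mean decrease in Gini impurity. All function names, parameters, and the synthetic "spectra" (six intensity columns, two of which are made discriminative on purpose) are assumptions introduced for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary class labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_stump_gain(rows, labels, feature):
    """Largest Gini decrease achievable with one threshold on one feature."""
    n = len(rows)
    parent = gini(labels)
    values = sorted({row[feature] for row in rows})
    best = 0.0
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def stump_forest_importance(rows, labels, n_trees=200, mtry=2, seed=0):
    """Mean Gini decrease per feature over a forest of randomized stumps."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    importance = [0.0] * n_features
    for _ in range(n_trees):
        # Bootstrap sample of the spectra (sampling with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        boot_rows = [rows[i] for i in idx]
        boot_labels = [labels[i] for i in idx]
        # Each stump only considers a random subset of mtry features.
        candidates = rng.sample(range(n_features), mtry)
        gains = {f: best_stump_gain(boot_rows, boot_labels, f) for f in candidates}
        winner = max(gains, key=gains.get)
        importance[winner] += gains[winner]
    return [value / n_trees for value in importance]


def select_top_features(importance, k):
    """Indices of the k features with the highest importance, sorted."""
    ranked = sorted(range(len(importance)), key=lambda f: importance[f], reverse=True)
    return sorted(ranked[:k])


def make_toy_spectra(n_per_class=20, n_features=6, informative=(1, 4), seed=1):
    """Synthetic intensity matrix: only the 'informative' columns separate classes."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.uniform(0.0, 6.0) for _ in range(n_features)]
            for f in informative:
                center = 1.0 if label == 0 else 5.0
                row[f] = center + rng.uniform(-0.5, 0.5)
            rows.append(row)
            labels.append(label)
    return rows, labels


rows, labels = make_toy_spectra()
importance = stump_forest_importance(rows, labels)
selected = select_top_features(importance, k=2)
print("importance per column:", [round(v, 3) for v in importance])
print("selected columns:", selected)
```

In this toy setting the two deliberately discriminative columns receive most of the accumulated importance, so a downstream classifier could be trained on those columns alone. A real pipeline would use full-depth trees, an established random forest implementation, and a principled cutoff for how many m/z values to keep.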