Link to DOI – s41598-021-97867-3
Sci Rep 11, 18319 (2021).
Viruses that infect bacteria (phages) are increasingly recognized for their importance in diverse
ecosystems, but identifying and annotating them in large‑scale sequence datasets is still challenging.
Although efficient, scalable virus identification tools are emerging, defining the exact ends (termini)
of phage genomes is still particularly difficult. The proper identification of termini is crucial, as it helps
in characterizing the packaging mechanism of bacteriophages and provides information on various
aspects of phage biology. Here, we introduce PhageTermVirome (PTV) as a tool for the easy and rapid
high‑throughput determination of phage termini and packaging mechanisms using modern
large‑scale metagenomics datasets. We successfully tested the PTV algorithm on a mock virome dataset and
then used it on two real virome datasets to achieve the rapid identification of more than 100 phage
termini and packaging mechanisms, in just a few hours of computing time. Because PTV allows the
identification of free, fully formed viral particles (by recognition of termini present only in encapsidated
DNA), it can also complement other virus identification software to predict the true viral origin
of contigs in viral metagenomics datasets. PTV is a novel and unique tool for high‑throughput
characterization of phage genomes, including phage termini identification and characterization of
genome packaging mechanisms. This software should help researchers better visualize, map and
study the virosphere. PTV is freely available for download and installation at https://gitlab.pasteur.fr/vlegrand/ptv