Publication: iScience

The Backpack Quotient Filter: a dynamic and space-efficient data structure for querying k-mers with abundance

Published in iScience - 20 Feb 2024

Victor Levallois, Francesco Andreace, Bertrand Le Gal, Yoann Dufresne, Pierre Peterlongo

Link to HAL – pasteur-04844927

Link to DOI – 10.1016/j.isci.2024.111435

iScience, 2024, pp.111435. ⟨10.1016/j.isci.2024.111435⟩

Genomic data sequencing has become indispensable for elucidating the complexities of biological systems. As databases storing genomic information, such as the European Nucleotide Archive, continue to grow exponentially, efficient solutions for data manipulation are imperative. One fundamental operation that remains challenging is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets. This paper introduces a novel data structure indexing k-mers (substrings of length k), the Backpack Quotient Filter (BQF), which serves as an alternative to the Counting Quotient Filter (CQF). The BQF offers enhanced space efficiency compared to the CQF while retaining key properties, including abundance information and dynamicity, with a negligible false positive rate, below 10⁻⁵ %. The approach involves a redefinition of how abundance information is handled within the structure, along with an independent strategy for space efficiency. We show that the BQF uses 4 times less space than the CQF on some of the most complex data to index: sea-water metagenomics sequences. Furthermore, we show that space efficiency increases as the amount of data to be indexed increases, which is in line with the original objective of scaling to ever-larger datasets.
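
To make the query operation concrete, here is a minimal sketch of what "querying k-mers with abundance" means. It is an exact, dictionary-based baseline and not the BQF or the CQF: it has no false positives and none of their space savings. The function names `count_kmers` and `query_abundance` and the example sequence are assumptions chosen for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every substring of length k (k-mer) in `sequence`."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def query_abundance(index: Counter, kmer: str) -> int:
    """Return the abundance of `kmer`, or 0 if it is absent."""
    return index[kmer]  # Counter yields 0 for keys it has never seen


# Hypothetical toy sequence, for illustration only.
index = count_kmers("ACGTACGTAC", k=4)
print(query_abundance(index, "ACGT"))  # present, seen more than once
print(query_abundance(index, "TTTT"))  # absent
```

A structure such as the one described in the abstract answers the same two questions (is the k-mer present, and how often) while trading exactness for a much smaller memory footprint.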