Linked reads (LR) are a valuable technology lying between short reads and long reads. LR technologies allow long reads to be simulated from barcoded short reads.
However, the chemistry of such technologies has a limitation: the number of available barcodes is limited.
Multiple molecules can therefore share the same tag, so the resulting short reads cannot be assigned to a precise molecule.
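To make the barcode-sharing problem concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each molecule receives a barcode uniformly at random; this is not a model of any specific LR protocol, and the function names `assign_barcodes` and `molecules_per_barcode` are hypothetical.

```python
import random
from collections import Counter


def assign_barcodes(num_molecules, num_barcodes, seed=0):
    """Give each molecule a barcode drawn uniformly at random.

    Assumption for illustration only: real protocols do not
    necessarily distribute molecules uniformly over barcodes.
    Returns a list where index = molecule id, value = barcode id.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_barcodes) for _ in range(num_molecules)]


def molecules_per_barcode(assignment):
    """Count how many molecules ended up under each barcode."""
    return Counter(assignment)


if __name__ == "__main__":
    # Toy sizes, chosen only so that collisions are easy to see.
    assignment = assign_barcodes(num_molecules=10, num_barcodes=4)
    counts = molecules_per_barcode(assignment)
    shared = [barcode for barcode, n in counts.items() if n > 1]
    print("molecules per barcode:", dict(counts))
    print("barcodes shared by several molecules:", sorted(shared))
```

As soon as there are more molecules than barcodes, at least one barcode must tag several molecules, and a read carrying that barcode could come from any of them.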
In this project, we analyze the underlying graph properties of the barcodes and molecules.
We propose a new approach, based on a new object called the Local Clique Pair (LCP), to analyze the molecules from linked-read sequencing.
In a paper accepted at WABI 2020, we describe an algorithm to create the LCPs from sequencing data and a heuristic to order them so that they reflect the order of the molecules along simulated genomes.
These data structures and algorithms answer two linked-read questions:
- How many molecules are present in each barcode?
- Can we construct a partial order of the molecules along pieces of the genome?