Link to DOI – 10.4230/LIPIcs.WABI.2020.11
20th International Workshop on Algorithms in Bioinformatics (WABI 2020)
Considering a set of intervals on the real line, an interval graph records these intervals as nodesand their intersections as edges. Identifying (i.e. merging) pairs of nodes in an interval graphresults in a multiple-interval graph. Given only the nodes and the edges of the multiple-intervalgraph without knowing the underlying intervals, we are interested in the following questions. Canone determine how many intervals correspond to each node? Can one compute a walk over themultiple-interval graph nodes that reflects the ordering of the original intervals? These questionsare closely related to linked-read DNA sequencing, where barcodes are assigned to long moleculeswhose intersection graph forms an interval graph. Each barcode may correspond to multiplemolecules, which complicates downstream analysis, and corresponds to the identification of nodes ofthe corresponding interval graph. Resolving the above graph-theoretic problems would facilitateanalyses of linked-reads sequencing data, through enabling the conceptual separation of barcodesinto molecules and providing, through the molecules order, a skeleton for accurately assembling thegenome. Here, we propose a framework that takes as input an arbitrary intersection graph (suchas an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of theoriginal intervals