PubMed ID (PMID): 12935335
J. Comput. Biol. 2003;10(3-4):385-98
Most shotgun sequencing projects undergo a long and costly finishing phase, in which a partial assembly forms several contigs whose order, orientation, and relative distances are unknown. We propose a new technique that supplements the shotgun assembly data with complete restriction digests of the target, which are experimentally simple and commonly used. By computationally combining information from the contig sequences with the fragment sizes measured for several different enzymes, we seek to form a “scaffold” on which the contigs are placed in their correct orientation, order, and distance. We give a heuristic search algorithm for solving the problem and report promising preliminary simulation results. The key to the success of the search scheme is the very rapid solution of two time-critical subproblems, each of which is solved to optimality in linear time. Our simulations indicate that, with noise levels of about 3% relative error in measured fragment sizes and six enzymes, most datasets of 13 contigs spanning 300 kb can be correctly ordered, and the remaining ones have most of their pairs of neighboring contigs correct. Hence, the technique has the potential to provide real help in finishing. Even without closing all gaps, the ability to order and orient the contigs correctly makes the partial assembly both more accessible and more useful for biologists.
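To make the idea of combining contig sequences with measured fragment sizes concrete, here is a minimal Python sketch of only the most basic ingredient: predicting the fragment sizes that one enzyme would produce inside a single contig, and checking a measured size against a prediction under a relative error tolerance. It is not the heuristic search algorithm of the paper, and it does not attempt the ordering or orientation step. The function names, the toy sequence, and the simplification that the enzyme cuts at the start of its recognition site are all assumptions made for illustration.

```python
def digest_fragments(seq, site):
    """Lengths of the pieces obtained by cutting seq at every occurrence of site.

    Assumption for illustration: the cut falls at the first base of the
    recognition site (real enzymes cut at an enzyme-specific offset).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def internal_fragments(seq, site):
    """Fragments bounded by two sites inside the contig.

    Their sizes are fully determined by the contig sequence; the two end
    pieces are dropped because they extend into unknown neighboring sequence.
    """
    return digest_fragments(seq, site)[1:-1]


def matches(predicted, measured, rel_err=0.03):
    """True if measured lies within rel_err relative error of predicted."""
    return abs(predicted - measured) <= rel_err * predicted


# Toy contig with three EcoRI recognition sites (GAATTC).
contig = "AA" + "GAATTC" + "CCCC" + "GAATTC" + "TTTT" + "GAATTC" + "AA"
print(digest_fragments(contig, "GAATTC"))
print(internal_fragments(contig, "GAATTC"))
print(matches(10, 10.2), matches(10, 10.5))
```

In this sketch only the internal fragments can be compared directly with the measured digest; the end pieces of each contig are where information about neighboring contigs would have to come from.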