Sequence Reconstruction Algorithm
f1=GCGC
f2=CGCC
f3=GGCG
The shortest common superstring is:
. | G | G | C | G | C | C |
f1= | . | G | C | G | C | |
f2= | . | . | C | G | C | C |
f3= | G | G | C | G |
The shortest superstring problem can be displayed as a Hamiltonian path and shown to be equivalent to the travelling salesman problem. It is NP-complete. A greedy algorithm exists which sequentially merges fragments starting with the pair with the most overlap first.
Greedy algorithm for shortest superstring
Let T be the set of all fragments and let S be an empty set. While T is not empty, do { For the pair s,t in T with maximum overlap (s=t is allowed) { If s is different from t, merge s and t. If s = t, remove s from T and add s to S. } } Output the concatenation of the elements of S.
Sources:
Waterman, Michael. Introduction to Computational Biology. London: Chapman & Hall (1995).