Sequence Reconstruction Algorithm
f1=GCGC
f2=CGCC
f3=GGCG
The shortest common superstring is:
| . | G | G | C | G | C | C |
| f1= | . | G | C | G | C | |
| f2= | . | . | C | G | C | C |
| f3= | G | G | C | G |
The shortest superstring problem can be displayed as a Hamiltonian path and shown to be equivalent to the travelling salesman problem. It is NP-complete. A greedy algorithm exists which sequentially merges fragments starting with the pair with the most overlap first.
Greedy algorithm for shortest superstring
Let T be the set of all fragments and let S be an empty set.
While T is not empty, do
{
For the pair s,t in T with maximum overlap (s=t is allowed)
{
If s is different from t, merge s and t.
If s = t, remove s from T and add s to S.
}
}
Output the concatenation of the elements of S.
Sources:
Waterman, Michael. Introduction to Computational Biology. London: Chapman & Hall (1995).