Alignment and Dynamic Programming


Intuitively, alignment seems like a simple concept, but lets formalize it a little (see avery.rutgers.edu/WSSP/StudentScholars/Session15/Session15.html). To align two DNA sequences, we must first make them the same length. We do this by adding gaps to one. Then, we put the two sequences on top of each other and remove any gaps that overlap. Now, an alignment has been accomplished--but only one of many possible alignments, as the gaps could have been added anywhere in the sequence.

How do we determine which alignment is best? We do this by assigning a score to possible alignments. The overall score for a particular alignment is determined by adding together scores for each of individual pairings that comprise the alignment. The individual scores can be determined according to any scheme you think reasonable. For example, the sample program at bioweb.ncsa.uiuc.edu/~bioph490/workshop/Dynamic_Programming/dptable.cgi uses a default of -4 for a letter matched with a gap, -4 for a letter matched with a different letter, and +5 for letter matched with the same letter. Alternatively, a scheme based on PAM or BLOSUM matrices could be used.

Thus, with a score for each alignment, we can decide which is best. There may be more than one alignment with the best score, so we'll just choose one arbitrarily. Keep in mind, this discussion of alignment considers what is mathematically optimal, not what is biologically true per say.

All well and good, but to consider each and every possible alignment would require immense computation for longer sequences. In 1970, Needleman and Wunsch introduced a method that creates an optimum alignment in a less computationally difficult way. They turned to dynamic programming, a technique that breaks a problem into simpler versions and uses the solutions to those to solve the bigger one.

Turn to an N.C.S.A. site for an excellent discussion and demonstration.

Local alignment, in addition to global, is possible with dynamic programming. In global alignment, all the letters in both sequences are included, making it useful for sequences that are highly similar. Global alignment is not good, however, for two sequences that are not similar over their whole lengths, but have stretches within them that are similar. Whereas global alignment may miss these, local alignment will not.




Sources:
avery.rutgers.edu/WSSP/StudentScholars/Session15/Session15.html. Viewed 9-17-00.
bioweb.ncsa.uiuc.edu/~bioph490/workshop/Dynamic_Programming/. Viewed 9-17-00.