One of the main tasks in the Human Genome Project is to find matches between two sequences, or to compare them to see how they line up best. For example, given the strands ATTCG and ATCG, we would like to find out how the second sequence aligns best with the first. Here the best way to match the sequences up is either of the two combinations below.

ATTCG    ATTCG
AT-CG    A-TCG

Here a "-" marks a gap in the sequence. One of the goals of computational biology is an algorithm that can determine whether two sequences are similar in some way, and find the best alignment of one sequence with the other according to rules specified in the program. Those rules would typically be chosen by a biologist. One such algorithm is described below. Its main rule is that there are no penalties for gaps left within a sequence. The steps for finding the best match are as follows.

Step 1: Put the two sequences in table form: write one of the sequences across the top and the other vertically down the left side. Using the sequences ATCCGTT and ACTGTCT, this looks like:

     1 2 3 4 5 6 7
     A T C C G T T
1 A
2 C
3 T
4 G
5 T
6 C
7 T


* A note on the matrix: when we say box (3, 2) we are referring to the box in the 3rd row and 2nd column, following the numbers given above and to the left of the sequences.

Step 2: In the first row, mark a "1" where the row letter and the column letter match and a "0" where they do not. For example:

     1 2 3 4 5 6 7
     A T C C G T T
1 A  1 0 0 0 0 0 0
2 C
3 T
4 G
5 T
6 C
7 T
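As a minimal sketch of Step 2, the first row can be filled by comparing the first letter of the vertical sequence with every letter of the horizontal one. The function name first_row is a hypothetical choice for this illustration, not part of the original description.

```python
def first_row(top, side):
    """Return the first row of the table: 1 where the letters match, else 0.

    `top` is the sequence written across, `side` the one written vertically.
    Both names are assumptions made for this sketch.
    """
    return [1 if letter == side[0] else 0 for letter in top]


row = first_row("ATCCGTT", "ACTGTCT")
print("".join(str(value) for value in row))
```

Only the first letter of the vertical sequence (A) is used here, so the row has a 1 under the A in column 1 and a 0 everywhere else.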

Step 3: For the rest of the rows, the number that goes in each box is calculated with this equation:

Box score for (x, y) = previous best score + match value for (x, y)

The previous best score is the highest score in any box (d, e) with d < x and e < y, or 0 if no such box exists. The match value is the 1 or 0 worked out as in Step 2 for box (x, y). For example:

     1 2 3 4 5 6 7
     A T C C G T T
1 A  1 0 0 0 0 0 0
2 C  0 1 2 2 1 1 1
3 T
4 G
5 T
6 C
7 T

Note that box (2,1) has a 0 in it, while box (2,2) has a "1" in it even though neither column 1 nor column 2 has a C to match row 2. This follows from the equation. The previous best score for box (2,1) is 0, since there are no boxes above and to the left of it (boxes such as (1,0) do not exist), and in box (2,1) the C and A are not identical, so the match value is also zero.

In box (2,2), the previous best score can only come from box (1,1), which has a score of "1". The letters in box (2,2) do not match (T and C are not the same letter), so the match value is 0. Thus 1 + 0 = 1 is the score printed in that box.

Now let us look at box (2,3). The previous best score, which can come from either box (1,1) or (1,2), is also 1, and since the letters in box (2,3) match, the match value is 1. Thus 1 + 1 = 2 is the number printed in that box. The same happens in box (2,4). Box (2,5) has a 1 because its previous boxes are (1,1) through (1,4), the highest number in those boxes is again "1", and G and C do not match, so the score becomes 1 + 0 = 1. The resulting matrix should look like this:

     1 2 3 4 5 6 7
     A T C C G T T
1 A  1 0 0 0 0 0 0
2 C  0 1 2 2 1 1 1
3 T  0 2 1 2 2 3 3
4 G  0 1 2 2 3 2 3
5 T  0 2 2 2 2 4 4
6 C  0 1 3 3 2 3 4
7 T  0 2 2 3 3 4 5
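The box-score rule above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name score_matrix is hypothetical, indices start at 0 rather than 1, and every earlier box is rescanned for each score, which is slow but follows the wording of the rule directly.

```python
def score_matrix(top, side):
    """Fill the table using: score = previous best score + match value.

    `top` runs across the table and `side` runs down it (assumed names).
    Rows and columns are 0-indexed here, so box (1,1) in the text is [0][0].
    """
    rows, cols = len(side), len(top)
    score = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            match = 1 if side[x] == top[y] else 0
            # Highest score among boxes strictly above AND strictly left;
            # 0 when there is no such box (first row or first column).
            previous_best = max(
                (score[d][e] for d in range(x) for e in range(y)),
                default=0,
            )
            score[x][y] = previous_best + match
    return score


table = score_matrix("ATCCGTT", "ACTGTCT")
for letter, row in zip("ACTGTCT", table):
    print(letter, "".join(str(value) for value in row))
```

For the two example sequences this prints the same seven rows as the completed matrix. Because the first row has no boxes above it, the same rule also reproduces Step 2 without special handling.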

Step 4: Look for the highest score in the matrix and then trace back from it to find the resulting alignment.

* Note: the way to accomplish this is to find the highest score, which here occurs at box (7,7), and then trace back to a box in a lesser row and lesser column that holds that score - 1. Here we would be looking for a 4, and the only 4 in a legitimate lesser box is the one in box (5,6). Notice that there are other 4's, at (5,7), (6,7) and (7,6), but none of those boxes is less in both row and column. We keep taking any path that follows this pattern until we get to a one or a zero, or to the beginning of a sequence. Along the way, links or pointers record each of the letters that belong to the best match. There may be times when there is more than one path we could take. Since the algorithm does not say which to choose, the computer will simply pick one based on the code written for it. The resulting sequences here would be aligned as follows.

A T C C - G T - T
A - - C T G T C T

According to your matrix calculation, and to a computer running this algorithm, this would be the answer.
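The traceback described above can be sketched as follows. Several details are assumptions made for this illustration rather than part of the original description: the function names, the 0-based indexing, the choice to step only to boxes whose letters match, and the tie-breaking order (rightmost column first, then lowest row), which is one arbitrary way for the code to pick among equally good paths.

```python
def score_matrix(top, side):
    """Score table: previous best (strictly above-left) plus match value."""
    score = [[0] * len(top) for _ in side]
    for x in range(len(side)):
        for y in range(len(top)):
            best = max((score[d][e] for d in range(x) for e in range(y)),
                       default=0)
            score[x][y] = best + (1 if side[x] == top[y] else 0)
    return score


def traceback(top, side, score):
    """Return matched (row, column) pairs, 0-indexed, in left-to-right order."""
    target = max((v for row in score for v in row), default=0)
    limit_x, limit_y = len(side), len(top)
    pairs = []
    while target > 0:
        found = None
        # Assumed tie-break: scan columns right to left, rows bottom to top.
        for y in range(limit_y - 1, -1, -1):
            for x in range(limit_x - 1, -1, -1):
                if score[x][y] == target and side[x] == top[y]:
                    found = (x, y)
                    break
            if found is not None:
                break
        pairs.append(found)
        limit_x, limit_y = found  # next box must be less in row and column
        target -= 1
    pairs.reverse()
    return pairs


def align(top, side, pairs):
    """Write both sequences out, putting '-' opposite every unmatched letter."""
    top_line, side_line = [], []
    prev_x, prev_y = -1, -1
    # A sentinel pair past the end flushes the letters after the last match.
    for x, y in pairs + [(len(side), len(top))]:
        for j in range(prev_y + 1, y):      # unmatched letters of `top`
            top_line.append(top[j])
            side_line.append("-")
        for i in range(prev_x + 1, x):      # unmatched letters of `side`
            top_line.append("-")
            side_line.append(side[i])
        if x < len(side):                   # a real match, not the sentinel
            top_line.append(top[y])
            side_line.append(side[x])
        prev_x, prev_y = x, y
    return " ".join(top_line), " ".join(side_line)


top, side = "ATCCGTT", "ACTGTCT"
pairs = traceback(top, side, score_matrix(top, side))
top_line, side_line = align(top, side, pairs)
print(top_line)
print(side_line)
```

With this tie-breaking order the sketch visits boxes (7,7), (5,6), (4,5), (2,4) and (1,1) in the text's 1-based numbering and prints the alignment shown above. A different scan order could produce another alignment with the same score of 5.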