While FastA is actually just a fast approximation to the Smith-Waterman algorithm, it is slower and more sensitive than the BLAST algorithm because FastA tolerates gaps in the aligned sequences.
In the first step of this search, the comparison can be viewed as a set of many dot plots with the query sequence on the vertical axis and each sequence in the database on the horizontal axis of its particular plot.
Set a word size. (A word is a short sequence of nucleotides. A word
of size 2 could be, for instance, gg.) Place a dot wherever words of this size
For example, if the query sequence were ggctttcgg and the database sequence were aacggcttacg, you would get the following plot:
In the second step, rescore the 10 best diagonals using a scoring matrix that allows and takes into account conservative replacements and ambiguity symbols shorter than the size of a word. This analysis finds high scoring subregions within the diagonals, called "initial regions."
In step three, take the initial regions whose scores are above a predetermined threshold and check to see if they can be joined together. Impose a penalty on joined regions so that joined pieces have lower scores than continuous runs.
Finally, use a variant of the last part of the Smith-Waterman algorithm to align the sequence and calculate the optimal score.
If the optimal score is above a certain threshold, place the sequence in the match list.