The Phred Algorithm

According to its documentation, "Phred reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the base calls and quality values to output files." It was developed by Phil Green and Brent Ewing. Phred can read trace data from various sequencing machines. After calling the bases, it writes the sequences in either FASTA format or SCF format, along with a quality value for each base. The phrap sequence assembly program can use these quality values to increase the accuracy of the assembled sequence.

1. Predict peaks
First, Phred predicts where the peaks on the gel would be centered if no factors (compressions, dropouts, etc.) were displacing them from their idealized, evenly spaced locations. It does this by applying simple Fourier methods to the four base traces.
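The idea of recovering peak spacing with Fourier methods can be illustrated with a minimal sketch. This is not Phred's actual implementation: the function names, the synthetic single-channel trace, and the plain discrete Fourier transform over the whole signal are all assumptions made for illustration.

```python
import cmath
import math

def synthetic_trace(n_samples=240, spacing=12, sigma=2.0):
    """Hypothetical trace: evenly spaced Gaussian peaks (illustration only)."""
    centers = range(spacing // 2, n_samples, spacing)
    return [
        sum(math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers)
        for x in range(n_samples)
    ]

def dominant_period(signal):
    """Estimate peak-to-peak spacing from the strongest non-zero DFT frequency."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant (DC) component
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        # k-th DFT coefficient, computed directly from the definition
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k  # period in samples

trace = synthetic_trace()
period = dominant_period(trace)
print(f"estimated peak spacing: {period:.1f} samples")
```

Once the spacing is known, idealized peak centers can be laid out at that interval. A real base caller would estimate the spacing locally, because it drifts along the length of a read.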

2. Determine actual peaks and match with predicted
Second, Phred determines the center of each observed peak and its area relative to neighboring peaks. A dynamic programming algorithm then matches these observed peaks with the peak locations predicted in the first step.
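The matching step can be sketched as a small alignment problem. The following is a simplified illustration rather than Phred's own algorithm: the cost function (distance between positions), the fixed `skip_cost` penalty, and the name `match_peaks` are assumptions, and peak areas are ignored.

```python
def match_peaks(predicted, observed, skip_cost=6.0):
    """Align two sorted lists of peak positions in order, minimizing total cost.

    Matching a pair costs the distance between the two positions; leaving a
    predicted or an observed peak unmatched costs skip_cost (hypothetical value).
    """
    n, m = len(predicted), len(observed)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i predicted and first j observed peaks
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                c = cost[i - 1][j - 1] + abs(predicted[i - 1] - observed[j - 1])
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "match"
            if i > 0:
                c = cost[i - 1][j] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "skip_predicted"
            if j > 0:
                c = cost[i][j - 1] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "skip_observed"
    # Trace back from the final cell to recover the matched pairs
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            pairs.append((predicted[i - 1], observed[j - 1]))
            i, j = i - 1, j - 1
        elif step == "skip_predicted":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, cost[n][m]

# The observed peak at 29 is a spurious (noise) peak in this made-up example.
pairs, total = match_peaks([10, 22, 34, 46], [11, 21, 29, 35, 47])
print(pairs, total)
```

In this example the spurious peak at position 29 is left unmatched, because skipping it is cheaper than forcing it onto a predicted location.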

3. Calculate quality of base calls
Based on certain trace characteristics, Phred calculates the probability (p) of an error in the base call at each position. Using the relation
q = -10 log10(p),
it converts p to a quality value q. For example, a quality value of 40 corresponds to an error probability of 1/10,000.
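The conversion between error probability and quality value, with the logarithm taken to base 10, can be checked with a short sketch. The function names are hypothetical and chosen only for this example.

```python
import math

def error_prob_to_quality(p):
    """Convert an error probability p (0 < p <= 1) to a quality value q."""
    return -10.0 * math.log10(p)

def quality_to_error_prob(q):
    """Invert the relation: recover the error probability from q."""
    return 10.0 ** (-q / 10.0)

for q in (10, 20, 30, 40):
    p = quality_to_error_prob(q)
    print(f"q = {q}: about 1 error in {round(1 / p):,} base calls")
```

Each increase of 10 in the quality value corresponds to a tenfold drop in the error probability.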

Phred is designed to work closely with phrap, which uses these quality values when assembling sequences.

Sources: Viewed 9-16-00.
Bishop, Martin (ed.). Guide to Human Genome Computing. 2nd ed. San Diego: Academic Press (1998).