Accuracy of Human DNA Sequencing

The Human Genome Project was the culmination of combined efforts from several research groups, including the National Human Genome Research Institute, the Department of Energy and the International Human Genome Sequencing Consortium¹. The end goal of the project was to produce a sufficiently accurate version of the human genetic code. Our DNA is organized into 23 pairs of chromosomes, which contain approximately 30,000 genes. Those genes are encoded by sequences of base pairs built from four bases: adenine [A], thymine [T], cytosine [C] and guanine [G]. All in all, the human genome contains approximately 3 billion base pairs. Recent improvements in computational analysis have drastically advanced DNA sequencing. From a computing standpoint, each base pair can be represented with a minimum of 2 bits, so storing the entire human genome would require about 750 megabytes (MB)². But just how accurate are DNA sequencing and its data storage techniques? What effect do inaccuracies have on genomics and its use in pharmacogenetics?
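The storage figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the round numbers used above (3 billion base pairs, 2 bits each) and ignores any file headers, compression or ambiguity codes; the function and constant names are only illustrative.

```python
# Rough storage estimate for a 2-bit-per-base encoding of the genome.
GENOME_BASES = 3_000_000_000  # approximate genome length from the text
BITS_PER_BASE = 2             # four symbols (A, T, C, G) fit in 2 bits


def genome_storage_bytes(n_bases=GENOME_BASES, bits_per_base=BITS_PER_BASE):
    """Return the bytes needed to store n_bases, rounded up to a whole byte."""
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8


size_bytes = genome_storage_bytes()
print(f"{size_bytes / 1e6:.0f} MB (decimal megabytes)")   # 750 MB
print(f"{size_bytes / 2**20:.0f} MiB (binary mebibytes)")  # 715 MiB
```

Whether the result reads as 750 or roughly 715 depends only on whether a megabyte is taken as 10^6 or 2^20 bytes.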

Throughout the course of the Human Genome Project, the research institutes aimed for varying levels of target accuracy. In 2000, the first draft was released with an error rate of one error per 1,000 base pairs. In 2003, the official results were cited as having an error rate of one per 10,000 base pairs¹. Currently, reaching that level of accuracy requires sequencing the DNA a total of ten times³. Under the Bermuda Standards, the international standard for accuracy is currently held at one error per 10,000 base pairs across the contiguous sequence. That qualifier matters because the DNA is sequenced in parts, and gaps often exist between these parts⁴. However accurate this process may seem, at that rate sequencing the entire human genome of roughly 3 billion base pairs still yields approximately 300,000 base pair errors.
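The error totals follow directly from the quoted rates. A minimal sketch, assuming a genome of about 3 billion base pairs and treating each rate as a simple average over the whole sequence (the helper name is hypothetical):

```python
GENOME_BASES = 3_000_000_000  # approximate genome length used in this article


def expected_errors(n_bases, bases_per_error):
    """Expected number of miscalled bases at one error per bases_per_error."""
    return n_bases / bases_per_error


draft_2000 = expected_errors(GENOME_BASES, 1_000)      # first draft rate
finished_2003 = expected_errors(GENOME_BASES, 10_000)  # Bermuda-level rate

print(f"2000 draft:   {draft_2000:,.0f} expected errors")     # 3,000,000
print(f"2003 release: {finished_2003:,.0f} expected errors")  # 300,000
print(f"Error rate at 1 in 10,000: {100 / 10_000}%")          # 0.01%
```

Tightening the standard by a factor of ten removes about 2.7 million expected errors, yet hundreds of thousands remain.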

But how significant is a 0.01% error rate? The Human Genome Project has brought attention to the significance of single nucleotide polymorphisms (SNPs). SNPs are natural variations in the DNA sequence at a single nucleotide (A, T, C or G) that occur every 100 to 300 base pairs⁵. The variations caused by SNPs can dramatically affect how differently humans react to things such as drugs, vaccines, or diseases. However, because some level of error is inherent and allowable for companies such as 23andMe that sequence DNA, their results will almost certainly report some SNPs inaccurately. The problem is that companies like 23andMe expect their DNA sequencing results to inform medical advice for participants and their doctors, so that doctors can prescribe more accurate drug dosages. With roughly 300,000 base pair errors, how accurate can this medical advice be? If the capabilities and limitations of the human body are sensitive down to the individual nucleotide (as with SNPs), can human genome sequencing be reliable enough to serve its purpose as a source of personalized medical information that depends entirely on human DNA?
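To get a feel for the scale of the problem, one can estimate how many sequencing errors might land on SNP positions. The sketch below rests on a strong assumption that is not from the sources above: errors are spread uniformly and independently of where SNPs occur. Real errors are unlikely to be that evenly distributed, so treat the output as an order-of-magnitude illustration only. All names are hypothetical.

```python
GENOME_BASES = 3_000_000_000  # approximate genome length
BASES_PER_ERROR = 10_000      # one error per 10,000 base pairs


def expected_snp_errors(bases_per_snp):
    """Expected miscalled SNP sites, ASSUMING errors fall uniformly at random."""
    snp_sites = GENOME_BASES / bases_per_snp
    return snp_sites / BASES_PER_ERROR


for spacing in (100, 300):  # one SNP every 100 to 300 base pairs
    sites = GENOME_BASES / spacing
    errors = expected_snp_errors(spacing)
    print(f"1 SNP per {spacing} bp: {sites:,.0f} SNP sites, "
          f"about {errors:,.0f} expected to be miscalled")
```

Under that assumption, one SNP per 100 base pairs gives 30 million SNP sites with about 3,000 miscalled, and one per 300 gives 10 million sites with about 1,000 miscalled.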

References:

1) http://www.genome.gov/11006943

2) http://www.tmsoft.com/article-genome.html

3) http://www.ndsu.edu/pubweb/~mcclean/plsc431/students98/bennett.htm

4) http://www.nature.com/nature/journal/v429/n6990/full/nature02390.html

5) http://www.ndsu.edu/pubweb/~mcclean/plsc431/students99/symanietz.htm