Accuracy of Human DNA Sequencing

The Human Genome Project was the culmination of combined efforts from several research groups, including the National Human Genome Research Institute, the Department of Energy, and the International Human Genome Sequencing Consortium [1]. The end goal of this project was to produce a sufficiently accurate version of the human genetic code. Our DNA is organized into 23 pairs of chromosomes, which contain approximately 30,000 genes encoded by sequences of four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). All in all, the human genome contains approximately 3 billion base pairs.
Recent improvements in computational analysis have greatly accelerated advances in DNA sequencing. From a computing perspective, each base pair can be represented with a minimum of 2 bits, so storing the entire human genome would require over 750 megabytes (MB) [2]. But just how accurate are DNA sequencing and its data storage techniques? What effect do these inaccuracies have on genomics and its use in pharmacogenetics?
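
The storage figure is simple arithmetic: with only four possible bases, 2 bits per base suffice, so 3 billion bases need 6 billion bits, or 750 million bytes. The Python sketch below illustrates this under stated assumptions: the A/C/G/T bit mapping and the helper names `pack_bases` and `genome_storage_bytes` are hypothetical choices for illustration, the genome is taken as exactly 3 billion bases, and 1 MB is treated as 1,000,000 bytes.

```python
# Hypothetical 2-bit encoding; any fixed one-to-one mapping of the four bases works.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack_bases(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases (2 bits each) per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a final partial chunk so the unused bits sit at the low end.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

def genome_storage_bytes(num_bases: int, bits_per_base: int = 2) -> int:
    """Minimum number of whole bytes needed at bits_per_base bits per base."""
    total_bits = num_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to a whole byte

if __name__ == "__main__":
    print(pack_bases("ACGT").hex())            # 1b
    size = genome_storage_bytes(3_000_000_000)
    print(size / 1_000_000, "MB")              # 750.0 MB
```

This is only a lower bound: it ignores ambiguous or unknown bases, metadata, and any file-format overhead.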

Throughout the course of the Human Genome Project, the research institutes aimed for varying levels of target accuracy. In 2000, the first draft was released with an error rate of one error per 1,000 base pairs. In 2003, the official results were cited as having an error rate of one error per 10,000 base pairs [1]. Currently, achieving that level of accuracy requires sequencing the DNA a total of ten times [3]. The international standard for accuracy, known as the Bermuda Standards, is currently set at one error per 10,000 base pairs for the entire contiguous sequence; because the DNA is sequenced in parts, gaps often exist between these parts [4]. However accurate this process may seem, applying that rate across the entire human genome (3 billion divided by 10,000) still yields approximately 300,000 base pair errors.
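
The error totals come from dividing genome length by the number of bases per error. A short Python sketch, assuming a genome of exactly 3 billion bases and errors occurring at exactly the stated rates, compares the 2000 draft with the 2003 / Bermuda level and also expresses each rate as a percentage (the function names are illustrative, not from any library):

```python
GENOME_BASES = 3_000_000_000  # approximate genome length used in the text

def expected_errors(num_bases: int, bases_per_error: int) -> int:
    """Expected erroneous bases at one error per bases_per_error bases."""
    return num_bases // bases_per_error

def error_rate_percent(bases_per_error: int) -> float:
    """Convert 'one error per N bases' into a percentage."""
    return 100 / bases_per_error

for label, n in [("2000 draft", 1_000), ("2003 / Bermuda standard", 10_000)]:
    print(f"{label}: {error_rate_percent(n)}% -> "
          f"{expected_errors(GENOME_BASES, n):,} errors")
# 2000 draft: 0.1% -> 3,000,000 errors
# 2003 / Bermuda standard: 0.01% -> 300,000 errors
```

Note that one error per 10,000 bases corresponds to 0.01%, which is the rate discussed below.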

But how significant is a 0.01% error rate? The Human Genome Project has drawn attention to the significance of single nucleotide polymorphisms (SNPs). SNPs are natural variations in the DNA sequence at a single nucleotide (A, T, C, or G) that occur every 100 to 300 base pairs [5]. The variations caused by SNPs can dramatically affect how individuals respond to drugs, vaccines, and diseases. However, because of the inherent and allowable errors for companies such as 23andMe that sequence DNA, their results will certainly misread some SNPs. The problem is that companies like 23andMe expect to use their DNA sequencing results to provide medical advice to participants and their doctors, so that doctors can prescribe more accurate drug dosages. With roughly 300,000 base pair errors, how accurate can this medical advice be? If the capabilities and limitations of the human body are sensitive down to the individual nucleotide (as with SNPs), can human genome sequencing be reliable enough to serve as a source of personalized medical information that depends entirely on human DNA?
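
To connect the SNP frequency with the error rate, here is a rough Python estimate. It assumes one SNP every 100 to 300 bases across exactly 3 billion bases, and a sequencing error rate of 1 in 10,000 that strikes every position independently and uniformly; that last assumption is a simplification, since real sequencing errors need not be spread evenly. The function names are hypothetical.

```python
import math

GENOME_BASES = 3_000_000_000
ERROR_RATE = 1 / 10_000  # assumed: one error per 10,000 bases, uniform and independent

def snp_count(spacing: int) -> int:
    """Approximate number of SNP positions if one occurs every `spacing` bases."""
    return GENOME_BASES // spacing

def expected_miscalled(num_snps: int, error_rate: float = ERROR_RATE) -> float:
    """Expected number of SNP positions read incorrectly."""
    return num_snps * error_rate

def prob_any_miscalled(num_snps: int, error_rate: float = ERROR_RATE) -> float:
    """P(at least one SNP miscalled) = 1 - (1 - p)^n, computed stably."""
    return -math.expm1(num_snps * math.log1p(-error_rate))

for spacing in (300, 100):
    n = snp_count(spacing)
    print(f"1 SNP per {spacing} bp: {n:,} SNPs, "
          f"~{expected_miscalled(n):,.0f} expected miscalls, "
          f"P(any miscalled) = {prob_any_miscalled(n):.6f}")
```

Under these simplified assumptions, roughly 1,000 to 3,000 SNP positions would be expected to be misread, and the chance that none are misread is effectively zero.
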
References:
1) http://www.genome.gov/11006943
2) http://www.tmsoft.com/article-genome.html
3) http://www.ndsu.edu/pubweb/~mcclean/plsc431/students98/bennett.htm
4) http://www.nature.com/nature/journal/v429/n6990/full/nature02390.html
5) http://www.ndsu.edu/pubweb/~mcclean/plsc431/students99/symanietz.htm