Computing and Genomics

Figure 1 - Sequence ladder of a genome
Source: http://en.wikipedia.org/wiki/DNA_sequencing

The advancement of computing has enabled genomics in many ways. For the pioneers of the field, genome sequencing alone was a laborious, time-consuming task that involved large amounts of toxic and radioactive chemicals. With the development of modern techniques and automated analysis, however, genomic specialists now depend on computers to quickly sequence and analyze genomes.

Growing computing power has opened the door to a vast sea of genomic information. Consider the human genome, for example, which is said to contain all of a person’s hereditary information. The three billion base pairs of the human genome correspond to about 755 megabytes of raw data. Before the introduction of computers, sequencing these three billion base pairs would have been an unthinkable task. However, thanks to newly developed sequencing methods and powerful computers, the Human Genome Project was able to complete the sequencing of a human genome in 2003. Still, sequencing remained an arduous task: the project formally began in 1990 and took more than a decade of combined effort by many universities and research centers around the world.
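The storage figure above can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the simplest possible encoding, two bits per base (enough to distinguish A, C, G and T), with no compression, quality scores or metadata; the function name and constants are illustrative, not taken from any particular tool. Under that assumption, exactly three billion base pairs come to 750 megabytes, so a figure of about 755 megabytes corresponds to a base-pair count slightly above three billion.

```python
# Rough estimate of raw genome size, assuming 2 bits per base
# (four possible bases: A, C, G, T). This encoding is an assumption
# made for illustration; real file formats store more than this.

BITS_PER_BASE = 2
BITS_PER_BYTE = 8


def raw_genome_size_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Return the number of bytes needed to store `base_pairs` bases."""
    return base_pairs * bits_per_base / BITS_PER_BYTE


if __name__ == "__main__":
    size = raw_genome_size_bytes(3_000_000_000)
    print(f"{size / 1e6:.0f} MB (decimal megabytes)")
    print(f"{size / 2**20:.1f} MiB (binary mebibytes)")
```

The result depends on which definition of "megabyte" is used and on the exact base-pair count, which is why published estimates differ by a few percent.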

Computing power continues to play an important role in the field of genomics. As of 2011, the time it takes to sequence a person’s genome has fallen to as little as two days, and the price to as low as $1,000. As more and more of genomics moves to the private sector, there is plenty of incentive to sequence and analyze genomes faster and more cheaply. Aside from establishing new sequencing methods, the most straightforward way to accomplish this is to use more powerful computers.

Growing Role of Software

Increasing computing power, however, is only one of the ways in which computing has enabled genomics. Software has played just as large a role, and many experts believe that its role will only grow in the future of genomics.

Bioinformatics, the computational analysis of genomic data, is the field in which software is likely to play a key role in the future. Because sequencing and analyzing genomes is such a computation-heavy task, establishing a centralized, traditional supercomputer powerful enough for genomics research costs on the order of hundreds of thousands of dollars, if not millions. Due to this high cost, an increasing number of specialists are opting for distributed (cloud) infrastructures that are just as powerful yet cheaper. With these new distributed systems, however, specialists face an entirely new set of challenges that involve software.

Experts in bioinformatics infrastructure point out that job queueing, data distribution, networking and logging are some of the most critical issues they face in implementing distributed systems, and they predict that most of these problems will be resolved through software improvements [1].