Introduction

Bioinformatics is the best-known and probably the largest application of computational biology. The field developed to complement the growing area of genetics in the biological sciences.

History: What led to the onset of bioinformatics?

In 1953 James Watson and Francis Crick correctly identified the structure of DNA, proposing that it is a winding double helix held together by pairs of bases (adenine paired with thymine and guanine paired with cytosine). Since that date the study of DNA has become a predominant area of biology. Scientists believed that investigating genes (strands of DNA) would lead to a much broader understanding of the workings of the human body and mind. Specifically, investigation of genes and their proteins provides information about cellular growth, communication, and organization, leading to an understanding of the complex biological signals and pathways within each cell. One tremendous benefit of such studies occurs in the area of medicine. The identification of genes that mark inherited diseases could enable doctors to warn their patients before the disease's onset, allowing individuals at risk for such ailments to take precautionary measures early in their lives. Eventually a deeper understanding of the workings of genes and proteins could allow scientists to re-engineer and replace such defective genes.
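The base-pairing rule described above (adenine with thymine, guanine with cytosine) can be illustrated with a short sketch. The function name `complement` and the single-letter codes A, T, G, and C are illustrative choices made here, not something taken from the original text; the code only shows how one strand determines its partner base by base.

```python
# Pairing rule from the paragraph above: A pairs with T, G pairs with C.
# The names PAIRS and complement are hypothetical, chosen for this sketch.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner bases for a strand, one base at a time."""
    # A KeyError is raised for any character that is not A, T, G, or C.
    return "".join(PAIRS[base] for base in strand.upper())


if __name__ == "__main__":
    print(complement("ATGC"))
```

Because each base has exactly one partner, applying `complement` twice returns the original strand.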

For most of the period since the investigation of DNA began, discovery was aimed at identifying one gene at a time. Within the last few years, however, scientists have mapped or sequenced small organisms such as yeast and bacteria in their entirety. The map of the entire genetic makeup of an organism is called the genome. Many more complete genome sequences will soon be available, including the human genome, which is discussed later. To aid in the sequencing of such large quantities of genes, a significant increase in three-dimensional protein structure data had to become available. Computational techniques and software development are required to access this information and allow researchers to use databases as research tools. Hence, the field of bioinformatics developed from the need to analyze large amounts of DNA sequences and protein structures.

What exactly is Bioinformatics?

The field involves the development of new database methods to store genetic information, computational software methods to process it, applications that enable evaluation of experimental data, the improvement of molecular biological techniques to investigate genetic information, high-throughput techniques to gather genetic information, and combinatorial chemistry. Information from molecules, protein sequencing, and X-ray crystallography is entered into specific databases. This information is organized to build an information infrastructure (a large database).

Bioinformatics is a "scientific discipline that encompasses all aspects of biological information acquisition, processing, storage, distribution, analysis and interpretation" that combines the tools of mathematics, computer science and biology with the aim of understanding the biological significance of a variety of data (NIH Publication No. 90-1590, April 1990). Powerful innovative software is combined with sophisticated database systems and automated biological research methods to create the science of bioinformatics. Another aspect of bioinformatics is the maintenance of such a large database.

The computational foundations of bioinformatics employ statistical analysis, algorithms, and database management systems. Searching a database of DNA or protein sequences relies on search algorithms to identify entries that share similarities with the query. The immense size and interconnection of biological databases provide a resource for answering biological questions about mapping, gene patterns, molecular modeling, and molecular evolution, and for assisting in the development of drugs targeted at fixing specific genes.
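As a rough illustration of what searching a sequence database for entries similar to a query can look like, the sketch below scores each entry by the fraction of short subsequences (k-mers) it shares with the query. This is a deliberately simple stand-in chosen for this example, not the algorithm used by any particular database; the toy entries, the function names, the k-mer length of 3, and the 0.2 score cutoff are all assumptions.

```python
# A minimal similarity search over a toy in-memory "database".
# All names, sequences, and thresholds here are hypothetical.

def kmers(seq, k=3):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query, database, k=3, min_score=0.2):
    """Return (name, score) pairs at or above min_score, best first."""
    hits = [(name, similarity(query, seq, k)) for name, seq in database.items()]
    hits = [hit for hit in hits if hit[1] >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


TOY_DB = {
    "entry_1": "ATGGCGTACGTTAGC",
    "entry_2": "ATGGCGTACGATAGC",  # differs from entry_1 at one position
    "entry_3": "TTTTTTTTTTTTTTT",  # shares no 3-mers with entry_1
}

if __name__ == "__main__":
    for name, score in search("ATGGCGTACGTTAGC", TOY_DB):
        print(name, round(score, 3))
```

Comparing k-mer sets ignores the order of matches and handles insertions or deletions poorly, so it should be read only as a picture of the "find entries similar to the query" idea rather than a practical search method.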

Thus, interaction with a biological database can lead to discovery through the analysis of existing data. Database interaction is defined as browsing, asking complex computational questions, submitting experimental data to public data banks, finding information about specific portions of a gene, executing complex queries, and organizing vast networks of information (Karp, 1996). Currently, there are four types of databases in both the public and private domains. A primary database contains one type of information, such as DNA sequence data. Secondary databases contain information that is exclusively derived from other databases. Specialist databases, also called knowledge databases, contain information from expert input, other databases, and literature. Integrated databases are clustered or merged primary or secondary databases (Baker and Brass, 1998). Considerable time and effort are currently being devoted to the updating and engineering of databases.

Glossary

DNA - winding double helix held together by pairs of bases (nucleotides)

Gene - strand of DNA involving many base pairs that corresponds to a particular trait in an organism.

Genome - all the genes that make up an organism