The Maximal Clique Problem
Although many theoretical papers were published on DNA computing
after Adleman’s first crude demonstration in 1994, it was not until
1997, over two years later, that another NP-complete problem was solved.
This problem was the Maximal Clique Problem: given a group of vertices,
some of which have edges between them, the maximal clique is the largest
subset of vertices in which each vertex is directly connected to every
other vertex in the subset. Every time a new vertex is added, the number
of candidate subsets that must be searched doubles; hence we have
an exponentially growing problem. Once again, researchers sought to take
advantage of DNA’s high level of parallelism, which would essentially
allow all the possible cliques to be calculated simultaneously. After all
the possible cliques were constructed, the scientists would simply need
to fish out the largest clique. However, as in many DNA experiments, each
possible “choice” needed a unique DNA strand. This is a problem
because, as noted earlier, the number of cliques grows exponentially. In
Adleman’s experiment, each city and every connecting path had to
be “hardcoded,” in other words, specially made to order in
a biology lab. In a problem with more cities, it would be virtually impossible
to manually create every possible path. Ouyang and company designed an
algorithm that would only require the manual creation of a linearly growing
number of DNA strands, 2N to be exact, where ‘N’ is the number
of vertices in the graph.
Figure: The six-node graph for this problem.
The maximum clique size is 4, and the maximum clique contains the nodes
2, 3, 4, and 5.
Their algorithm went like this. Each candidate clique (that is, each
subset of vertices) was represented by a binary number of N bits, where
each bit in the number represented a particular vertex, with vertex 0 as
the rightmost bit. If a certain bit held a ‘1’, the corresponding
vertex was in the clique; if it held a ‘0’, the vertex wasn’t
part of the clique. For example, a clique containing the first four nodes
would be “001111”.
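To make the encoding concrete, here is a minimal Python sketch of it. The function names `encode` and `decode` are my own and do not come from the paper; the only convention taken from the text is that vertex 0 is the rightmost bit.

```python
def encode(vertices, n):
    """Return the n-bit string for a vertex subset; vertex 0 is the rightmost bit."""
    return "".join("1" if v in vertices else "0" for v in reversed(range(n)))


def decode(bits):
    """Return the set of vertex indices whose bit is '1'."""
    n = len(bits)
    return {n - 1 - pos for pos, bit in enumerate(bits) if bit == "1"}


print(encode({0, 1, 2, 3}, 6))   # 001111
print(sorted(decode("111100")))  # [2, 3, 4, 5]
```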
Then, they removed
those candidates that contained illegal connections, because not every node
was interconnected in the original graph (otherwise the entire graph would
be one big clique). The figure on the right shows the illegal paths. For
example, any strand with a "1" in both the "0" and "2"
positions could not exist (i.e. "xxx1x1" is an illegal configuration).
The resulting data pool held all legal cliques of the graph, and the string
with the largest number of ones would be the maximum clique.
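The filtering idea can be sketched in ordinary software as follows. This is only an illustration of the logic, not of the lab procedure. The list `NON_EDGES` is an assumption: the text names only the missing connection between nodes 0 and 2, and the other pairs are hypothetical values chosen so that the result agrees with the maximum clique stated above.

```python
from itertools import product

N = 6
# ASSUMPTION: illustrative list of vertex pairs with no edge between them.
# Only the pair (0, 2) is named in the text; the others are hypothetical.
NON_EDGES = [(0, 2), (0, 5), (1, 5), (1, 3)]


def has_bit(bits, v):
    """True if vertex v is marked '1'; vertex 0 is the rightmost character."""
    return bits[len(bits) - 1 - v] == "1"


def legal_cliques(n, non_edges):
    """Keep every n-bit string that never selects both ends of a non-edge."""
    pool = ["".join(p) for p in product("01", repeat=n)]
    for a, b in non_edges:  # one elimination pass per missing edge
        pool = [s for s in pool if not (has_bit(s, a) and has_bit(s, b))]
    return pool


pool = legal_cliques(N, NON_EDGES)
best = max(pool, key=lambda s: s.count("1"))
print(best, best.count("1"))  # 111100 4
```

Note that the software version still enumerates all 2^N strings one after another; the appeal of the DNA version is that the whole pool exists in the test tube at once.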
To create the complete data pool, the scientists utilized DNA pairing
(each nucleotide base has a complement, so if two strands contain a series
of complementary bases, they will stick to each other). The scientists arranged
the cliques so each vertex is separated by a string of “connection”
DNA which we’ll call P. Thus, a clique for the six-node graph would
be represented as P0V0 P1V1 P2V2 P3V3 P4V4 P5V5 P6, where each Vi is a vertex.
The trick is that the shared P segments of adjacent vertices are complementary
strands of DNA, so scientists need only create strands of DNA for individual nodes
(i.e. P0V0 P1, P1V1 P2, etc.) and they will naturally bond together. Thus
the researchers only needed to create two sequences of DNA for each node,
one to represent the node if it was in the clique (a ‘1’)
and the other to represent the node if it wasn’t in that clique (a ‘0’),
and the complementary ‘P’ strands would bind them together
to form six-vertex-long strands representing all possible cliques. In case
you were curious, the encoding convention used was no bases at all if the
vertex was a ‘1’ and a predefined sequence of 10 bases if the
vertex was a ‘0’. Thus it would be easy to determine the size of the maximum
clique simply by looking at the lengths of the strands. A method called
“thermal cycling” was used to repeatedly assemble all possible
strands, and the complete/correct strands (those that began with V0 and
ended with V5) were amplified.
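The arithmetic behind this construction can be sketched in a few lines of Python. The value of `P_LEN` is an assumption made for illustration, since the text does not give the length of the connector segments; the 10-base and 0-base vertex lengths are the ones stated above. The names `blocks`, `strands`, and `strand_length` are my own.

```python
from itertools import product

N = 6
P_LEN = 20   # ASSUMPTION: length of each connector segment P_i, for illustration
V0_LEN = 10  # from the text: a '0' vertex carries 10 bases
V1_LEN = 0   # from the text: a '1' vertex carries none

# Two building blocks per vertex (value 0 or 1): 2N blocks in total.
blocks = [(i, bit) for i in range(N) for bit in "01"]


def strand_length(bits):
    """Length of the assembled strand P0 V0 P1 V1 ... P(N-1) V(N-1) PN."""
    zeros = bits.count("0")
    ones = len(bits) - zeros
    return (len(bits) + 1) * P_LEN + zeros * V0_LEN + ones * V1_LEN


# Choosing one of the two blocks at every position yields all 2**N strands.
strands = ["".join(choice) for choice in product("01", repeat=N)]

print(len(blocks), len(strands))                         # 12 64
print(strand_length("000000"), strand_length("111100"))  # 200 160
```

Whatever connector length is used, every extra ‘1’ shortens the strand by 10 bases, so the strand with the most ones is always the shortest.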
The next step, removing cliques that cannot exist for the graph, is a
bit more difficult and requires a large amount of manual labor. Although
I will not go into the biology, the process involves cutting strands at
specific places using enzymes, dividing the data into separate test tubes,
and then using sequential restriction operations to eliminate the strands
containing the illegal connection, a process that must be repeated for
every non-connection in the graph. Thus, even with a faster construction
method, this experiment is not scalable. Finding the final answer is easy:
gel electrophoresis will reveal the shortest DNA strand, which corresponds
to the largest clique.
Diagrams taken from Ouyang, et al. “DNA Solution
of the Maximal Clique Problem.” Science 1997 278: 446-449.