Untitled Document

Sequential Randomized Algorithm - The Sock Selection Problem (Input Randomization)

Problem description:
Jane is a Stanford freshman student, and every morning she struggles hard to get to her 10am CS lecture on time. The biggest problem she has is that she has to choose a pair of socks before she can rush to class. There are 2n socks in her drawer, half white and half black. Either color will do. She can only extract one sock at a time from the drawer and may also throw socks away (one at a time) if she believes that she has no use for them. No socks can be put back to the drawer once they are removed. The question that concerns her is, then: How many socks must she remove from the drawer before she gets a matching pair?

Analysis of the problem:
The problem will turn out to be very easy if Jane can hold at least three socks in her hand at a time. She only has to remove at most three socks from the drawer, and then she will get a pair of socks by throwing away the unwanted one. However, the real problem we study here has another requirement, which is that Jane can hold no more than two socks at a time. Now, the sock selection problem becomes much interesting and challenging.

However, if we consider the worse case, we have to call GetSock() n+2 times, which indicates the running time of O(n). The worse case is that Jane has to remove every single sock from the drawer before she gets a matching pair. Here the drawer plays as an adversary and it gives out the socks in either of the two following orders:

Therefore, we can see that this deterministic algorithm has a worst-case running time of O(n). However in real life, it is less likely that the drawer will give out socks in the above two orders, thus the average running time of the algorithm will be a lot less than O(n). Moreover, we can even make a stronger statement: Any deterministic algorithm will have O(n) worst-case running time.

One important assumption about this algorithm: The drawer has no access to the result of coin toss, i.e., the randomization is private to Jane herself.

This assumption is important because if the coin toss is open to everyone, the drawer can act as an adversary. It can arrange the order of socks so that the algorithm would take running time of O(n). Even worse, the drawer can force Jane to have no matching pair at all after all socks have been removed from the drawer.

Assume the toss is private, let's have a look at the efficiency of the randomized algorithm. One good strategy the drawer might use is as follows:

In this case, we can calculate as follows: Assume the while loop will be executed i times, i>=1, so GetSock() will be called (i+2) times. After some computaion in statistics, we see that the probability that the while-loop will be executed i times is (1/2)^i. Thus, the probability that GetSock() is called (i+2) times = the probability that the while-loop will be executed i times = (1/2)^i.

~= 4

More discussion on the randomized algorithm:
However, the randomized algorithm of sock selection may sometimes produce a solution that is incorrect as the following example shows: Even if the coin toss is not public, there is still possibility that Jane will end up with two socks of different color after the drawer is empty. Yet the probability of failure is exponentially small in n. Clearly this algorithm is a Monte Carlo algorithm, which does not always give the right answer. Also, we can turn it into a Las Vegas algorithm by some modifications. We can add two variables: one keeps track of the number of white socks left in the drawer, and the other keeps track of the number of black socks left in the drawer. Whenever the algorithm detects that Jane has the last sock of a particular color, then it should immediately discard that sock. Then next call to GetSock() will return a matching sock.

Comparison between the two algorithms:
The running time of the deterministic solution averaged over all sequences returned by GetSock() is 4, which is interestingly the same as the expected running time of the randomized solution for any input sequence. From that, we can get a glimpse of the efficiency for applying randomization in certain algorithms.

Back to Examples Page