Class notes from 5/21

FAST FOURIER TRANSFORM (FFT)

The original application of the FFT was in signal processing, though we won't be talking about that today. You have seen simple divide and conquer algorithms in CS 161, eg. for mergesort. FFT is a highly non-simple application of divide and conquer.

Other applications of FFT are:
- String matching with wildcards
- Matrix-vector multiplication
- Multiplying integers in almost linear time (eg. crypto sometimes has integers large enough that FFT is used to multiply them)

The application we will be talking about today is k-SUM. In particular, given n integers a_1, a_2, .., a_n and a fixed number t, do any k of the a_i sum to t? We will be talking about the decision version of the problem, ie. we output a yes or no answer rather than finding the k integers in the case of a yes.

--

Warm-up: 2-SUM

Input: n positive integers a_1 .. a_n, integer t
Output: Yes iff there exist a_i, a_j such that a_i + a_j = t

This is a common interview question.

Answer: Put all the a_i in a hash table, then look up t - a_i for every i. Runs in randomized O(n) time.

Now say you can only add numbers and do comparisons of the form "is x greater than y".

Answer: Sort the a_i's into a list A, and then binary search A for t - a_i for each i. Runs in O(n log n) time.

--

Okay, now for 3-SUM:

Input: n positive integers a_1 .. a_n, integer t
Output: Yes iff there exist a_i, a_j, a_k such that a_i + a_j + a_k = t

Can get randomized O(n^2) or deterministic O(n^2 log n) time using the techniques above. Eg. store each a_i in a hash table, and look up t - a_i - a_j for every pair i, j.

Open: doing 3-SUM in O(n^(2-epsilon)) time for any epsilon > 0. Eg. n^1.99 would count. In fact, it is somewhat of a notorious open problem; 3-SUM plays a role within P similar to the role NP-hard problems play, ie. someone might say "getting a faster algorithm for problem X is hard, because it would imply an n^1.99 algorithm for 3-SUM".

A cool thing to note is that progress is being made, even very recently. There was an O(n^2 / log^2 n) algorithm in 2008 (recall this is still worse than n^(2-eps) for any eps > 0). Last month, someone published an algorithm with O(n^1.5) decision tree complexity. In this setting the algorithm is allowed to think as much as it wants, but eg. would only be allowed to make O(n^1.5) comparisons. So this is not the same thing as an O(n^1.5)-time algorithm, but it is suggestive.

-----

Today we will consider a special case of k-SUM: 0 < a_i < M for each i, where we think of M = O(n). It is not at all clear why this restriction should make a difference at all! We will show that in this setting we can solve k-SUM in O(n log n) time via FFT.

Warm-up: 2-SUM again

We are going to use a different algorithm, one that scales a little better to 3-SUM (and k-SUM). We are going to reduce the problem to multiplying polynomials, and assume for now that we have a black box that does fast polynomial multiplication.

Step 1: Form the polynomial P(x) = sum_i x^{a_i}. We store this as an array of coefficients, with a 1 at each a_i and a 0 everywhere else. This takes O(M) time.
Step 2: Compute P(x)^2 in O(M log M) time via the black box.
Step 3: Return yes iff the coefficient of x^t in P(x)^2 is > 0. (That coefficient counts the pairs (i, j) with a_i + a_j = t.)
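Here is a small Python sketch of this reduction (not from lecture; the function name is mine). np.convolve stands in for the O(M log M) polynomial-multiplication black box, and, as written, the check also counts pairs with i = j, the same "repeats" issue discussed for 3-SUM below.

import numpy as np

def two_sum_via_polynomials(a, t):
    # Step 1: P(x) = sum_i x^{a_i}, stored as a coefficient array
    # (a 1 at each a_i, a 0 everywhere else).
    M = max(a) + 1
    P = np.zeros(M, dtype=np.int64)
    for ai in a:
        P[ai] = 1
    # Step 2: P(x)^2.  np.convolve is a placeholder for the fast
    # (FFT-based) polynomial multiplication black box.
    P_squared = np.convolve(P, P)
    # Step 3: yes iff the coefficient of x^t is positive; that coefficient
    # counts the ordered pairs (i, j) with a_i + a_j = t.
    return 0 <= t < len(P_squared) and P_squared[t] > 0

Eg. two_sum_via_polynomials([1, 4, 6], 10) is True (4 + 6 = 10), while two_sum_via_polynomials([1, 4, 6], 9) is False.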
--

3-SUM: Do the same thing, with P(x)^3 instead. Note that this isn't quite right, since it would allow repeats, eg. a_i + a_i + a_j. You can do something called color coding to handle the distinct case:

Randomly color the a_i's with {red, green, blue}, and let P_r be the polynomial formed from only the red a_i's (similarly P_g and P_b). Then P_r * P_g * P_b will have a non-zero coefficient on x^t with probability at least 3!/27 = 2/9 in the yes case (the probability that the three indices of a particular solution get three distinct colors), and with probability 0 in the no case (any triple contributing to x^t uses one index of each color, hence three distinct indices). You can then boost the success probability by repeating the process, say, 50 times.

--

We'll now talk a bit about polynomial multiplication.

Poly Mult:

Input: P(x) = sum_i a_i x^i and Q(x) = sum_i b_i x^i; wlog both have the same length n, which is a power of 2 (just pad with high-order zero coefficients).
Output: P(x) * Q(x) = sum_i c_i x^i, where c_i = sum_{k=0}^{i} a_k * b_{i-k}.

Solution 1: The naive solution is O(n^2). Can we hope to do better?

Solution 2: We can do at least somewhat better via divide and conquer. This is super slick. Write
P(x) = x^{n/2} * A(x) + B(x)
Q(x) = x^{n/2} * C(x) + D(x)
where A, B, C, D are polynomials with n/2 coefficients each. Eg. if P(x) = x^3 + 10x^2 + 5x + 7, then A(x) = x + 10 and B(x) = 5x + 7.

Then P*Q = x^n * AC + x^{n/2} * (AD + BC) + BD.

So this easily gives T(n) <= 4T(n/2) + O(n), and T(n) = O(n^2) by the Master theorem.

We can do a bit better. Instead of making 4 recursive calls, compute only AC, BD, and (A+B)(C+D), since the middle term
AD + BC = (A+B)(C+D) - AC - BD.
So now T(n) <= 3T(n/2) + O(n), and T(n) = O(n^{log_2 3}) = O(n^1.59) by the Master theorem. Pretty good!

Solution 3: Split via evens and odds instead of first half / second half:
P(x) = x * A(x^2) + B(x^2)
Eg. if P = x^3 + 10x^2 + 5x + 7, then A(x^2) = x^2 + 5 and B(x^2) = 10x^2 + 7.
We again get a similar recurrence, and an O(n^1.59) algorithm.
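As an aside (not from lecture; the function name is mine), here is a Python sketch of the three-recursive-call approach from Solution 2, often called Karatsuba multiplication. Coefficient lists are low-order first, and both inputs are assumed to have the same power-of-2 length.

def poly_mult_karatsuba(p, q):
    # Multiply polynomials given as equal-length coefficient lists using
    # AD + BC = (A+B)(C+D) - AC - BD, for O(n^{log_2 3}) = O(n^1.59) total work.
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    h = n // 2
    # P(x) = x^{n/2} A(x) + B(x),  Q(x) = x^{n/2} C(x) + D(x)
    B, A = p[:h], p[h:]
    D, C = q[:h], q[h:]
    AC = poly_mult_karatsuba(A, C)
    BD = poly_mult_karatsuba(B, D)
    cross = poly_mult_karatsuba([x + y for x, y in zip(A, B)],
                                [x + y for x, y in zip(C, D)])
    AD_plus_BC = [c - x - y for c, x, y in zip(cross, AC, BD)]
    # Assemble x^n * AC + x^{n/2} * (AD + BC) + BD.
    out = [0] * (2 * n - 1)
    for i, v in enumerate(BD):
        out[i] += v
    for i, v in enumerate(AD_plus_BC):
        out[i + h] += v
    for i, v in enumerate(AC):
        out[i + n] += v
    return out

Eg. poly_mult_karatsuba([7, 5, 10, 1], [1, 2, 0, 0]) returns [7, 19, 20, 21, 2, 0, 0], the coefficients of (7 + 5x + 10x^2 + x^3)(1 + 2x).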
--

None of this is FFT, but note that we've now definitively shown that n^2 is not the right answer. And no one is seriously going to think that n^1.59 is actually the right answer. So basically we want to go from the
T(n) <= 3T(n/2) + O(n)
provided by Solutions 2 and 3 above, to
T(n) <= 2T(n/2) + O(n),
which gives the familiar T(n) = O(n log n) via the Master theorem.

--

Representing a poly:

Note that a non-zero degree-n polynomial has at most n roots. Hence any degree-n polynomial is determined by its values P(x_i) on any n+1 distinct points x_0, x_1, .., x_n, and can be represented by the list of values P(x_i) for i = 0..n rather than by a list of n+1 coefficients.

Why have different representations of the same thing? In general, because some operations might be easy in one representation even if they are hard in the other. In particular, multiplying two polynomials is linear time in the new representation! Just pick 2n+1 points (enough to determine the degree-2n product), and set (P*Q)(x_i) = P(x_i) * Q(x_i).

Plan of attack for polynomial multiplication:
1. Translate P, Q to the points representation.
2. Multiply there in O(n) time.
3. Translate back.

Recall that a fast algorithm here will give a fast algorithm for k-SUM. We will only talk about step 1 today.

Naive solution: O(n^2). There are O(n) different points, and evaluating the polynomial at each one takes O(n) time.

BUT: we can choose the x_i's cleverly. Eg. say we wanted to compute P(1) and P(-1). Recall our formulation P(x) = x * A(x^2) + B(x^2). Computing P(1) takes however long it takes. But computing P(-1) is then essentially free, since A and B get evaluated at the same point (1^2 = (-1)^2 = 1) for both P(1) and P(-1). So if we split the x_i's into n/2 plus/minus pairs, then on the outer level of the algorithm at least, we would get our desired T(n) = 2T(n/2) + O(n).

How do we get past the top level?

Solution: The reason the plus/minus trick works is that we found a number with two square roots r_1 and r_2, and then we evaluate P at r_1 and r_2. So to recurse, all we need is for r_1 and r_2 to each have two square roots of their own. Well, they do! In the complex plane. And everything we said above still holds over the complex numbers; eg. polynomials still have at most d roots, by the fundamental theorem of algebra.

Taking the square-rooting to its logical conclusion (and recalling that n is a power of 2), we set x_i = w^i, where w is a primitive complex n-th root of unity.

Claim: We can evaluate P(w^0), .., P(w^{n-1}) in O(n log n) time.

Proof: Write P(x) = x * A(x^2) + B(x^2). The points we need to feed to A and B are the squares (w^i)^2 = w^{2i}, and since w^n = 1 these are exactly the n/2 distinct (n/2)-th roots of unity, so each recursive call is the same problem at half the size.

Recursively evaluate A(w^0), A(w^2), .., A(w^{n-2})   [one recursive call]
Recursively evaluate B(w^0), B(w^2), .., B(w^{n-2})   [one recursive call]
Then for each x_i = w^i, compute P(w^i) = w^i * A(w^{2i}) + B(w^{2i})   [O(n) work total]

So T(n) = 2T(n/2) + O(n), as desired. (There is a short code sketch of this recursion at the end of these notes.)

--

Step 2 is easy. Next time: Step 3!
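Finally, here is a small Python sketch of the recursive evaluation from the claim above (not from lecture; the function name is mine). The coefficient list is low-order first, and its length n is assumed to be a power of 2.

import cmath

def evaluate_at_roots_of_unity(coeffs):
    # Evaluate P at w^0, .., w^{n-1}, where w = e^{2*pi*i/n}.
    n = len(coeffs)
    if n == 1:
        return [coeffs[0]]
    # P(x) = x * A(x^2) + B(x^2): A gets the odd-index coefficients, B the even ones.
    A = coeffs[1::2]
    B = coeffs[0::2]
    # The squares of the n-th roots of unity are exactly the (n/2)-th roots of
    # unity, so each recursive call is the same problem at half the size.
    A_vals = evaluate_at_roots_of_unity(A)
    B_vals = evaluate_at_roots_of_unity(B)
    out = []
    for i in range(n):
        w_i = cmath.exp(2j * cmath.pi * i / n)
        out.append(w_i * A_vals[i % (n // 2)] + B_vals[i % (n // 2)])
    return out

Eg. evaluate_at_roots_of_unity([7, 5, 10, 1]) returns [P(1), P(i), P(-1), P(-i)] (up to floating-point error) for P(x) = 7 + 5x + 10x^2 + x^3.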