Class notes from 5/21

FAST FOURIER TRANSFORM (FFT)

The original application of the FFT was in signal processing, though we won't be talking about that today. You have seen simple divide and conquer algorithms in CS 161, eg. for mergesort. FFT is a highly non-simple application of divide and conquer.

Other applications of FFT are:
- String matching with wildcards
- Matrix-vector multiplication
- Multiplying integers in almost linear time (eg. crypto sometimes has integers large enough that FFT is used to multiply them)

The application we will be talking about today is k-SUM. In particular, given n integers a_1, a_2, .., a_n and a fixed number t, do any k of the a_i sum to t? We will be talking about the decision version of the problem, ie. we output a yes or no answer rather than finding the k integers in the case of a yes.

--

Warm-up: 2-SUM

Input: n positive integers a_1 .. a_n, integer t
Output: Yes iff there exist a_i, a_j such that a_i + a_j = t

This is a common interview question.

Answer: Put all the a_i in a hash table, then look up t - a_i for every i. Runs in randomized O(n) time.

Now say you can only add numbers and do comparisons of the form "is x greater than y".

Answer: Sort the a_i's into a list A, and then binary search A for t - a_i for each i. Runs in O(n log n) time.

--

Okay, now for 3-SUM:

Input: n positive integers a_1 .. a_n, integer t
Output: Yes iff there exist a_i, a_j, a_k such that a_i + a_j + a_k = t

Can get randomized O(n^2) or deterministic O(n^2 log n) time using the techniques above. Eg. store each a_i in a hash table, and look up t - a_i - a_j for every pair i, j.

Open: doing 3-SUM in O(n^(2-epsilon)) time for any epsilon > 0. Eg. n^1.99 would count. In fact, it is somewhat of a notorious open problem; 3-SUM plays a role within P similar to the role NP-hard problems play, ie. someone might say "getting a faster algorithm for problem X is hard, because it would imply an n^1.99 algorithm for 3-SUM".

A cool thing to note is that progress is being made, even very recently. There was an O(n^2 / log^2 n) algorithm in 2008 (recall this is still worse than n^(2-eps) for any eps > 0). Last month, someone published an algorithm with O(n^1.5) decision tree complexity. In this setting the algorithm is allowed to think as much as it wants, but eg. would only be allowed to make O(n^1.5) comparisons. So this is not the same thing as an O(n^1.5)-time algorithm, but it is suggestive.

-----

Today we will consider a special case of k-SUM: 0 < a_i < M for each i, where we think of M = O(n). It is not at all clear why this restriction should make a difference at all! We will show that in this setting we can solve k-SUM in O(n log n) time via FFT.

Warm-up: 2-SUM again

We are going to use a different algorithm, one that scales a little better to 3-SUM (and k-SUM). We are going to reduce the problem to multiplying polynomials, and assume for now that we have a black box that does fast polynomial multiplication.

Step 1: Form the polynomial P(x) = sum_i x^{a_i}. We store this as an array of coefficients, with a 1 at each a_i and a 0 everywhere else. This takes O(M) time.
Step 2: Compute P(x)^2 in O(M log M) time via the black box.
Step 3: Return yes iff the coefficient of x^t in P(x)^2 is > 0. (That coefficient counts the pairs (i, j) with a_i + a_j = t.)
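Here is a small Python sketch of this reduction (not from lecture; the function name is mine). np.convolve stands in for the O(M log M) polynomial-multiplication black box, and, as written, the check also counts pairs with i = j, the same "repeats" issue discussed for 3-SUM below.

import numpy as np

def two_sum_via_polynomials(a, t):
    # Step 1: P(x) = sum_i x^{a_i}, stored as a coefficient array
    # (a 1 at each a_i, a 0 everywhere else).
    M = max(a) + 1
    P = np.zeros(M, dtype=np.int64)
    for ai in a:
        P[ai] = 1
    # Step 2: P(x)^2.  np.convolve is a placeholder for the fast
    # (FFT-based) polynomial multiplication black box.
    P_squared = np.convolve(P, P)
    # Step 3: yes iff the coefficient of x^t is positive; that coefficient
    # counts the ordered pairs (i, j) with a_i + a_j = t.
    return 0 <= t < len(P_squared) and P_squared[t] > 0

Eg. two_sum_via_polynomials([1, 4, 6], 10) is True (4 + 6 = 10), while two_sum_via_polynomials([1, 4, 6], 9) is False.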
--

3-SUM: Do the same thing, with P(x)^3 instead. Note that this isn't quite right, since it would allow repeats, eg. a_i + a_i + a_j. You can do something called color coding to handle the distinct case:

Randomly color the a_i's with {red, green, blue}, and let P_r be the polynomial formed from only the red a_i's (similarly P_g and P_b). Then P_r * P_g * P_b will have a non-zero coefficient on x^t with probability at least 3!/27 = 2/9 in the yes case (the probability that the three indices of a particular solution get three distinct colors), and with probability 0 in the no case (any triple contributing to x^t uses one index of each color, hence three distinct indices). You can then boost the success probability by repeating the process, say, 50 times.

--

We'll now talk a bit about polynomial multiplication.

Poly Mult:

Input: P(x) = sum_i a_i x^i and Q(x) = sum_i b_i x^i; wlog both have the same length n, which is a power of 2 (just pad with high-order zero coefficients).
Output: P(x) * Q(x) = sum_i c_i x^i, where c_i = sum_{k=0}^{i} a_k * b_{i-k}.

Solution 1: The naive solution is O(n^2). Can we hope to do better?

Solution 2: We can do at least somewhat better via divide and conquer. This is super slick. Write
P(x) = x^{n/2} * A(x) + B(x)
Q(x) = x^{n/2} * C(x) + D(x)
where A, B, C, D are polynomials with n/2 coefficients each. Eg. if P(x) = x^3 + 10x^2 + 5x + 7, then A(x) = x + 10 and B(x) = 5x + 7.

Then P*Q = x^n * AC + x^{n/2} * (AD + BC) + BD.

So this easily gives T(n) <= 4T(n/2) + O(n), and T(n) = O(n^2) by the Master theorem.

We can do a bit better. Instead of making 4 recursive calls, compute only AC, BD, and (A+B)(C+D), since the middle term
AD + BC = (A+B)(C+D) - AC - BD.
So now T(n) <= 3T(n/2) + O(n), and T(n) = O(n^{log_2 3}) = O(n^1.59) by the Master theorem. Pretty good!

Solution 3: Split via evens and odds instead of first half / second half:
P(x) = x * A(x^2) + B(x^2)
Eg. if P = x^3 + 10x^2 + 5x + 7, then A(x^2) = x^2 + 5 and B(x^2) = 10x^2 + 7.
We again get a similar recurrence, and an O(n^1.59) algorithm.
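As an aside (not from lecture; the function name is mine), here is a Python sketch of the three-recursive-call approach from Solution 2, often called Karatsuba multiplication. Coefficient lists are low-order first, and both inputs are assumed to have the same power-of-2 length.

def poly_mult_karatsuba(p, q):
    # Multiply polynomials given as equal-length coefficient lists using
    # AD + BC = (A+B)(C+D) - AC - BD, for O(n^{log_2 3}) = O(n^1.59) total work.
    n = len(p)
    if n == 1:
        return [p[0] * q[0]]
    h = n // 2
    # P(x) = x^{n/2} A(x) + B(x),  Q(x) = x^{n/2} C(x) + D(x)
    B, A = p[:h], p[h:]
    D, C = q[:h], q[h:]
    AC = poly_mult_karatsuba(A, C)
    BD = poly_mult_karatsuba(B, D)
    cross = poly_mult_karatsuba([x + y for x, y in zip(A, B)],
                                [x + y for x, y in zip(C, D)])
    AD_plus_BC = [c - x - y for c, x, y in zip(cross, AC, BD)]
    # Assemble x^n * AC + x^{n/2} * (AD + BC) + BD.
    out = [0] * (2 * n - 1)
    for i, v in enumerate(BD):
        out[i] += v
    for i, v in enumerate(AD_plus_BC):
        out[i + h] += v
    for i, v in enumerate(AC):
        out[i + n] += v
    return out

Eg. poly_mult_karatsuba([7, 5, 10, 1], [1, 2, 0, 0]) returns [7, 19, 20, 21, 2, 0, 0], the coefficients of (7 + 5x + 10x^2 + x^3)(1 + 2x).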
--

None of this is FFT, but note that we've now definitively shown that n^2 is not the right answer. And no one is seriously going to think that n^1.59 is actually the right answer. So basically we want to go from the
T(n) <= 3T(n/2) + O(n)
provided by Solutions 2 and 3 above, to
T(n) <= 2T(n/2) + O(n),
which gives the familiar T(n) = O(n log n) via the Master theorem.

--

Representing a poly:

Note that a non-zero degree-n polynomial has at most n roots. Hence any degree-n polynomial is determined by its values P(x_i) on any n+1 distinct points x_0, x_1, .., x_n, and can be represented by the list of values P(x_i) for i = 0..n rather than by a list of n+1 coefficients.

Why have different representations of the same thing? In general, because some operations might be easy in one representation even if they are hard in the other. In particular, multiplying two polynomials is linear time in the new representation! Just pick 2n+1 points (enough to determine the degree-2n product), and set (P*Q)(x_i) = P(x_i) * Q(x_i).

Plan of attack for polynomial multiplication:
1. Translate P, Q to the points representation.
2. Multiply there in O(n) time.
3. Translate back.

Recall that a fast algorithm here will give a fast algorithm for k-SUM. We will only talk about step 1 today.

Naive solution: O(n^2). There are O(n) different points, and evaluating the polynomial at each one takes O(n) time.

BUT: we can choose the x_i's cleverly. Eg. say we wanted to compute P(1) and P(-1). Recall our formulation P(x) = x * A(x^2) + B(x^2). Computing P(1) takes however long it takes. But computing P(-1) is then essentially free, since A and B get evaluated at the same point (1^2 = (-1)^2 = 1) for both P(1) and P(-1). So if we split the x_i's into n/2 plus/minus pairs, then on the outer level of the algorithm at least, we would get our desired T(n) = 2T(n/2) + O(n).

How do we get past the top level?

Solution: The reason the plus/minus trick works is that we found a number with two square roots r_1 and r_2, and then we evaluate P at r_1 and r_2. So to recurse, all we need is for r_1 and r_2 to each have two square roots of their own. Well, they do! In the complex plane. And everything we said above still holds over the complex numbers; eg. polynomials still have at most d roots, by the fundamental theorem of algebra.

Taking the square-rooting to its logical conclusion (and recalling that n is a power of 2), we set x_i = w^i, where w is a primitive complex n-th root of unity.

Claim: We can evaluate P(w^0), .., P(w^{n-1}) in O(n log n) time.

Proof: Write P(x) = x * A(x^2) + B(x^2). The points we need to feed to A and B are the squares (w^i)^2 = w^{2i}, and since w^n = 1 these are exactly the n/2 distinct (n/2)-th roots of unity, so each recursive call is the same problem at half the size.

Recursively evaluate A(w^0), A(w^2), .., A(w^{n-2})   [one recursive call]
Recursively evaluate B(w^0), B(w^2), .., B(w^{n-2})   [one recursive call]
Then for each x_i = w^i, compute P(w^i) = w^i * A(w^{2i}) + B(w^{2i})   [O(n) work total]

So T(n) = 2T(n/2) + O(n), as desired. (There is a short code sketch of this recursion at the end of these notes.)

--

Step 2 is easy. Next time: Step 3!
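Finally, here is a small Python sketch of the recursive evaluation from the claim above (not from lecture; the function name is mine). The coefficient list is low-order first, and its length n is assumed to be a power of 2.

import cmath

def evaluate_at_roots_of_unity(coeffs):
    # Evaluate P at w^0, .., w^{n-1}, where w = e^{2*pi*i/n}.
    n = len(coeffs)
    if n == 1:
        return [coeffs[0]]
    # P(x) = x * A(x^2) + B(x^2): A gets the odd-index coefficients, B the even ones.
    A = coeffs[1::2]
    B = coeffs[0::2]
    # The squares of the n-th roots of unity are exactly the (n/2)-th roots of
    # unity, so each recursive call is the same problem at half the size.
    A_vals = evaluate_at_roots_of_unity(A)
    B_vals = evaluate_at_roots_of_unity(B)
    out = []
    for i in range(n):
        w_i = cmath.exp(2j * cmath.pi * i / n)
        out.append(w_i * A_vals[i % (n // 2)] + B_vals[i % (n // 2)])
    return out

Eg. evaluate_at_roots_of_unity([7, 5, 10, 1]) returns [P(1), P(i), P(-1), P(-i)] (up to floating-point error) for P(x) = 7 + 5x + 10x^2 + x^3.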