Class notes from 4/16/14
ANNOUNCEMENTS (schedule)
Week 4
paper tips
presentation tips
Charikar02 (paper response by Monday), Sec 1
Week 5-8
you
Week 9-10
any topics by request
tie things together, how things fit into modern algorithmic research
== PAPER ==
Faster Dimension Reduction
Ailon/Chazelle, CACM '10
PROBLEM STUDIED
(Faster) dimension reduction
Input: data in high dimension.
-- d = number of dimensions
-- n = number of points
Output: data in lower dimensions, ie. in R^k
So we have a function f: R^d -> R^k.
Requirements: Distance preservation, ie.
|f(x)-f(y)| \approx |x-y|
Under what norm? Let's say L_2 for now, but it doesn't matter much.
Could think of this literally, as in points in space R^n
Or more powerfully, can embed other problems into high dimensional
space, and then use geometric intuition to solve them
-- eg. documents
-- eg. images
Applications:
given a query, find a similar document (nearest neighbor)
clustering
duplicate suppression
Paper only considers a particular class of functions, namely those that are linear.
so: f is linear, f(x) = Ax, where A is a k-by-d matrix (short and wide).
why linear?
fast
distance preservation reduces to length preservation.
ie. f(x)-f(y) = f(x-y) for all x and y. So the requirement
|f(x)-f(y)| \approx |x-y| reduces to
|Ax| \approx |x| for all x.
So: Say k < d, want |Ax| \approx |x| for all x.
But can't do this! Any two points in the null space of A will map to 0, no matter how far away they are.
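The null space obstruction is easy to see concretely. A minimal sketch (my own toy example, not from the paper), with k=1, d=2:

```python
def apply(A, x):
    """Multiply a k-by-d matrix (list of rows) by a length-d vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# A fixed 1-by-2 matrix: its null space contains (0, 1), so any two
# points differing only in the second coordinate collapse together.
A = [[1.0, 0.0]]
x, y = [0.0, 5.0], [0.0, -5.0]   # distance 10 apart in R^2
print(apply(A, x), apply(A, y))  # both map to [0.0]: distance becomes 0
```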
So new goal. Flip the quantifiers. Identify a family M of matrices,
such that for all n point subsets of R^d, all distances are
(approximately) preserved with high probability.
Recall the story with hashing, it is the same. Ie. for any fixed hash
function the adversary can trivially mess you up. So we commit to a
distribution of hash functions, and then let the adversary choose the
points.
Previous answer (Johnson-Lindenstrauss): each entry of A is iid from N(0,1) (ie. standard gaussian), with A scaled by 1/sqrt(k) so lengths come out right in expectation.
We take k = O(lg n / eps^2), where we want |x|(1-eps) < |Ax| < |x|(1+eps).
Why does this work?
Fix z \in R^d. Pick a \in R^d with a_i from N(0,1).
Then <a,z> = \sum a_i z_i
= sum of independent gaussians (each a_i z_i is N(0, z_i^2))
= a gaussian with variance \sum z_i^2
= gaussian with variance |z|^2.
So <a,z>^2 is an unbiased estimator of |z|^2; the k rows of A are just
repeated trials to reduce the variance. The lg(n) is because there are
~n^2 different distances you have to union bound over to get the high
probability.
All this was well-known before this paper. So what's new? Getting down the running time.
Old solution (ie. J-L): O(kd) per point, or O(kdn) total.
What can we do to make it faster?
Fast matrix multiplication? (good idea, but we'll actually do even better)
Make A not have full rank? (non-trivial lower bounds show A must have full rank)
Make A sparse? (good idea, but eg. even a column of zeros in A would map a non-zero vector (the corresponding basis element) to 0).
Answer: A = (highly sparse matrix)*(highly structured dense matrix)*(random +/- 1 diagonal matrix).
The structured dense matrix uses a divide and conquer idea, similar to the fast fourier transform.
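A toy version of that three-factor product (my own simplification: real FJLT uses a sparse matrix with random Gaussian entries for the first factor, and the constants matter, so treat this purely as a sketch of the shape A = P*H*D):

```python
import math
import random

random.seed(1)

def hadamard(x):
    """Unnormalized Walsh-Hadamard transform, FFT-style divide and
    conquer: O(d log d) for d a power of two."""
    if len(x) == 1:
        return x[:]
    h = len(x) // 2
    u, v = hadamard(x[:h]), hadamard(x[h:])
    return [u[i] + v[i] for i in range(h)] + [u[i] - v[i] for i in range(h)]

def fjlt_like(x, k):
    """Sketch of A = P*H*D: random signs (D), a structured dense mix (H),
    then crude coordinate sampling standing in for the sparse matrix P."""
    d = len(x)
    D = [random.choice([-1.0, 1.0]) for _ in range(d)]
    y = hadamard([D[i] * x[i] for i in range(d)])
    sample = random.sample(range(d), k)
    # After H*D, each coordinate has E[y_i^2] = |x|^2, so k sampled
    # coordinates scaled by 1/sqrt(k) preserve length in expectation.
    return [y[i] / math.sqrt(k) for i in sample]

norm = lambda v: math.sqrt(sum(t * t for t in v))
x = [random.gauss(0.0, 1.0) for _ in range(1024)]
out = fjlt_like(x, 256)
print(norm(out) / norm(x))  # close to 1
```

The random sign diagonal D followed by H spreads any fixed vector's energy roughly evenly across coordinates, which is what makes the otherwise unsafe "just sample a few coordinates" step work.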