Class notes from 4/16/14

ANNOUNCEMENTS (schedule)
Week 4: paper tips, presentation tips, Charikar02 (paper response by Monday), Sec 1
Weeks 5-8: you
Weeks 9-10: any topics by request; tie things together, how they fit into modern algorithmic research

== PAPER ==
Faster Dimension Reduction, Ailon/Chazelle, CACM '10

PROBLEM STUDIED
(Faster) dimension reduction.
Input: data in high dimension.
-- d = number of dimensions
-- n = number of points
Output: data in lower dimension, i.e. in R^k.
So we have a function f: R^d -> R^k.

Requirement: distance preservation, i.e. |f(x)-f(y)| \approx |x-y|.
Under what norm? Let's say L_2 for now, but it doesn't matter much.

Could think of this literally, as points in the space R^d.
Or, more powerfully, we can embed other problems into high-dimensional space and then use geometric intuition to solve them:
-- e.g. documents
-- e.g. images

Applications:
-- given a query, find a similar document (nearest neighbor)
-- clustering
-- duplicate suppression

The paper only considers a particular class of functions, namely linear ones:
f is linear, f(x) = Ax, where A is a k-by-d matrix (short and wide).

Why linear?
-- fast
-- distance preservation reduces to length preservation: f(x)-f(y) = f(x-y) for all x and y, so the requirement |f(x)-f(y)| \approx |x-y| reduces to |Ax| \approx |x| for all x.

So: say k < d, and we want |Ax| \approx |x| for all x. But we can't do this! Any point in the null space of A maps to 0, no matter how far from the origin it is.

So, new goal: flip the quantifiers. Identify a family M of matrices such that for every n-point subset of R^d, all distances are (approximately) preserved with high probability.

Recall the story with hashing; it is the same. For any fixed hash function, the adversary can trivially mess you up. So we commit to a distribution over hash functions, and then let the adversary choose the points.

Previous answer (Johnson-Lindenstrauss): each entry of A is iid from N(0,1) (i.e. standard Gaussian).
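The classical Johnson-Lindenstrauss construction above can be sketched in a few lines. This is a minimal illustration, not the paper's fast version; the constant 4 in k and the 1/sqrt(k) scaling are standard choices assumed here, not taken from the notes.

```python
import numpy as np

def jl_project(points, eps, seed=0):
    """Project n points in R^d down to R^k with a dense Gaussian matrix.

    Each entry of A is iid N(0,1); scaling by 1/sqrt(k) makes
    E[|Ax|^2] = |x|^2 (the unbiased-estimator property from the notes).
    """
    n, d = points.shape
    k = int(np.ceil(4 * np.log(n) / eps**2))   # k = O(log n / eps^2); constant is illustrative
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, d)) / np.sqrt(k)
    return points @ A.T                         # O(kd) work per point

# Usage: check that all pairwise distances survive, roughly within (1 +/- eps).
pts = np.random.default_rng(1).standard_normal((50, 1000))
proj = jl_project(pts, eps=0.5)
orig = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
new = np.linalg.norm(proj[:, None] - proj[None, :], axis=2)
mask = orig > 0
ratios = new[mask] / orig[mask]
print(ratios.min(), ratios.max())   # distortion of the worst pair
```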
We take k = O(lg n / eps^2), where we want (1-eps)|x| < |Ax| < (1+eps)|x|.

Why does this work? Fix z \in R^d. Pick a \in R^d with each a_i drawn from N(0,1). Then <a,z> = \sum_i a_i z_i = a sum of Gaussians = a Gaussian with variance \sum_i z_i^2 = |z|^2. So <a,z>^2 is an unbiased estimator of |z|^2; the k rows are just repeated trials to reduce the variance. The lg(n) is because there are ~n^2 distances you have to union bound over to get the high probability.

All this was well known before this paper. So what's new? Getting down the running time.

Old solution (i.e. J-L): O(kd) per point, or O(kdn) total. What can we do to make it faster?
-- Fast matrix multiplication? (good idea, but we'll actually do even better)
-- Make A not have full rank? (non-trivial lower bounds show A must have full rank)
-- Make A sparse? (good idea, but e.g. even one column of zeros in A would map a non-zero vector (the corresponding basis element) to 0)

Answer: A = (highly sparse matrix) * (highly structured dense matrix) * (random +/- 1 diagonal matrix). The structured dense matrix uses a divide-and-conquer idea, similar to the fast Fourier transform.
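The three-factor recipe can be sketched as follows. Caveat: this uses the Walsh-Hadamard transform as the structured dense matrix and plain coordinate sampling as the sparse factor (the "subsampled randomized Hadamard" simplification); the paper's actual sparse factor P has random sparse Gaussian entries, which this sketch does not reproduce.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform: the structured dense matrix,
    applied by divide and conquer in O(d log d) instead of O(d^2)."""
    x = x.copy()
    d = len(x)          # d must be a power of 2
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def fast_jl(x, k, seed=0):
    """Apply Ax where A = P H D:
    D = random +/-1 diagonal, H = normalized Hadamard (via fwht),
    P = sample k coordinates (simplified stand-in for the paper's sparse matrix)."""
    d = len(x)
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=d)       # D
    y = fwht(signs * x) / np.sqrt(d)              # H D x; H/sqrt(d) is orthogonal, so |y| = |x|
    idx = rng.choice(d, size=k, replace=False)    # P: keep k random coordinates
    return y[idx] * np.sqrt(d / k)                # rescale so E[|Ax|^2] = |x|^2

x = np.ones(1024)
y = fast_jl(x, k=64)
print(np.linalg.norm(x), np.linalg.norm(y))  # lengths roughly agree
```

The role of D and H is to "smear" any input vector so that its mass is spread roughly evenly across coordinates; that is what makes the cheap sparse sampling step safe even for inputs (like a standard basis vector) that a sparse matrix alone would destroy.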