Papers for CS167 (Readings in Algorithms)

Please send a ranked list of 3-4 papers to the course staff by Friday, April 8. The papers are loosely grouped into the following topics:

Classic problems
Really classic problems
Machine learning
New models

Classic problems

Aggregating Inconsistent Information: Ranking and Clustering
Ailon/Charikar/Newman, JACM '08
- Summary: Given k voters who each submit a ranked list of n candidates, we want to create a global ranking that is as consistent as possible with the k lists, i.e., one that minimizes the total number of pairwise disagreements with them (the sum of Kendall tau distances). This is NP-hard even when k=4, but we present a simple algorithm that gives an 11/7-approximation for this objective. The same techniques apply to ordering teams at the end of a round-robin tournament, and to several other related problems.
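For intuition, here is a minimal Python sketch of the pivoting idea (often called KwikSort) applied to rank aggregation. It builds the majority tournament of the votes and then quicksorts around random pivots, using majority preference as the comparison. It is an illustration under our own assumptions, not the paper's full algorithm. The 11/7 guarantee depends on the paper's analysis and its combination of methods, and this sketch claims no approximation ratio. All function names are hypothetical, and Kendall tau distance is taken to be the number of candidate pairs two rankings order differently.

```python
import random

def kwiksort(items, beats, rng):
    """Pivot-based ordering: items that beat the pivot go left, the rest go right."""
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)
    left = [x for x in items if x != pivot and beats(x, pivot)]
    right = [x for x in items if x != pivot and not beats(x, pivot)]
    return kwiksort(left, beats, rng) + [pivot] + kwiksort(right, beats, rng)

def aggregate(rankings, seed=None):
    """Combine k rankings of the same candidates via their majority tournament."""
    rng = random.Random(seed)
    positions = [{c: i for i, c in enumerate(r)} for r in rankings]

    def beats(a, b):
        # a beats b if a strict majority of voters rank a above b (ties favor b)
        wins = sum(pos[a] < pos[b] for pos in positions)
        return 2 * wins > len(rankings)

    return kwiksort(list(rankings[0]), beats, rng)

def kendall_tau(r1, r2):
    """Number of candidate pairs that r1 and r2 order differently."""
    pos = {c: i for i, c in enumerate(r2)}
    return sum(1 for i in range(len(r1)) for j in range(i + 1, len(r1))
               if pos[r1[i]] > pos[r1[j]])
```

When the majority preferences are transitive, every pivot choice yields the same ranking. With a Condorcet cycle (a beats b, b beats c, c beats a), the output depends on which pivots are drawn, and that dependence is where the randomized analysis comes in.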
Sparse Approximation via Generating Point Sets
Blum/Har-Peled/Raichel, Preprint '15
- Summary: We find a subset T of a set of points P that ε-approximates the convex hull of P. Furthermore, each point in P can be approximated by a convex combination of a small number of points in T. Of course, setting T to P would solve the problem; we find a T of size comparable to the smallest possible such T.
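To see why a point can be approximated by a convex combination of only a few points, the Python sketch below approximates a query point q inside the hull of a point set with a standard Frank-Wolfe style greedy. Each step moves toward the single input point that most reduces the distance to q, so after t steps at most t + 1 points carry weight. This is a generic technique chosen for illustration. It is not the paper's algorithm for constructing T, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def sparse_convex_approx(points, q, iters):
    """Greedily approximate q by a convex combination of few input points.

    Returns (weights, x): weights maps point index -> convex weight, and x is the
    approximation sum(weights[i] * points[i]). At most iters + 1 points get weight.
    """
    start = min(range(len(points)), key=lambda i: sq_dist(points[i], q))
    weights = {start: 1.0}
    x = list(points[start])
    for _ in range(iters):
        g = [xi - qi for xi, qi in zip(x, q)]  # gradient of ||x - q||^2 / 2
        s = min(range(len(points)), key=lambda i: dot(g, points[i]))  # best vertex
        d = [pi - xi for pi, xi in zip(points[s], x)]  # direction toward that vertex
        dd = dot(d, d)
        if dd == 0:
            break
        gamma = max(0.0, min(1.0, -dot(g, d) / dd))  # exact line search on [0, 1]
        if gamma == 0.0:
            break  # no improving vertex: x is already the closest hull point to q
        x = [xi + gamma * di for xi, di in zip(x, d)]
        # Rescale old weights and shift mass gamma onto the new vertex.
        weights = {i: (1 - gamma) * w for i, w in weights.items() if gamma < 1.0}
        weights[s] = weights.get(s, 0.0) + gamma
    return weights, x
```

The error shrinks as more points are allowed (for Frank-Wolfe the squared distance decays like O(1/t)). That accuracy-versus-sparsity trade-off is what "a small number of points" refers to.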
Simple, Fast and Deterministic Gossip and Rumor Spreading
Haeupler, JACM '15
- Summary: In rumor spreading, each node has a message (rumor) that must reach every other node in an unknown network, and in each round a node may contact only one neighbor. All previous algorithms were inherently randomized; we give a deterministic algorithm that is simpler, more robust, and faster than the best randomized one.
Multi-probe Consistent Hashing
Appleton/O'Reilly, Preprint '15
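- Summary: Consistent hashing meets multiple-choice hashing: we propose and test an algorithm for assigning keys to machines that is robust to machines arriving and disappearing, balances load well, and needs little replication. Each machine occupies a single point on the hash ring; instead, each key is hashed several times and assigned through whichever probe lands closest to a machine.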
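Here is a minimal Python sketch of the multi-probe idea, built on our own assumptions. Each machine is hashed to a single point on a ring of 64-bit values (SHA-256 truncated to 8 bytes, an arbitrary choice). Each key is hashed several times, and it goes to the machine that follows whichever probe lands closest to it. The default of 21 probes and all names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64

def h64(s):
    """64-bit hash taken from SHA-256 (any well-mixed hash would do)."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class MultiProbeRing:
    def __init__(self, machines, probes=21):
        # One point per machine on the ring; no virtual-node replicas.
        self.points = sorted((h64(m), m) for m in machines)
        self.hashes = [p for p, _ in self.points]
        self.probes = probes

    def lookup(self, key):
        best_dist, best_machine = None, None
        for i in range(self.probes):
            p = h64(f"{key}#{i}")  # the i-th probe for this key
            j = bisect.bisect_left(self.hashes, p) % len(self.hashes)  # clockwise successor
            dist = (self.hashes[j] - p) % RING_SIZE
            if best_dist is None or dist < best_dist:
                best_dist, best_machine = dist, self.points[j][1]
        return best_machine
```

Removing a machine moves only the keys it owned. For every other key, the winning probe still reaches the same machine at the same distance, and probes that used to land on the removed machine can only end up farther from their new successor.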
k-means++: The Advantages of Careful Seeding
Arthur/Vassilvitskii, SODA '07
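- Summary: k-means is a popular clustering algorithm. It consists of an initialization, where we choose k random cluster centers, followed by a deterministic local search procedure. We propose a simple modification to the initialization step (choose each new center with probability proportional to its squared distance from the nearest center already chosen) that makes the expected cost O(log k)-competitive with the optimal clustering and also improves experimental outcomes.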
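The modified initialization (D² sampling) fits in a few lines of Python. The sketch assumes points are tuples of floats, and the function name is hypothetical. The returned centers would then seed the usual local search (Lloyd) iterations, which are not shown.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seeds(points, k, rng=None):
    """Choose up to k initial centers with k-means++ (D^2) sampling."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniformly at random
    d2 = [sq_dist(p, centers[0]) for p in points]  # squared distance to nearest center
    while len(centers) < k and sum(d2) > 0:
        # Next center: a point sampled with probability proportional to d2.
        c = rng.choices(points, weights=d2, k=1)[0]
        centers.append(c)
        d2 = [min(old, sq_dist(p, c)) for p, old in zip(points, d2)]
    return centers
```

Points that coincide with an existing center have weight 0, so they are never picked again. If fewer than k distinct locations exist, the loop stops early.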
Select with Groups of 3 or 4 Takes Linear Time
Chen/Dumitrescu, Preprint '14
- Summary: The traditional median-of-medians algorithm uses groups of 5 for its first pass, and it has been widely believed that a similar algorithm could not run in linear time with groups of fewer than 5. We show a median-of-medians algorithm that works with groups of 3 or 4 and uses fewer comparisons than the groups-of-5 algorithm.
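One way to make groups of 3 work is to repeat the grouping step before recursing. The Python sketch below is our reconstruction in the spirit of the paper's "repeated step" variant, and the details should be treated as assumptions. It takes medians of triples, then medians of triples of those medians, and recurses on that n/9-sized set to find the pivot. At least about 2n/9 elements lie on each side of the pivot, which gives T(n) ≤ T(n/9) + T(7n/9) + O(n). That recurrence is linear because 1/9 + 7/9 < 1.

```python
def median_of_3(a, b, c):
    return sorted((a, b, c))[1]

def group_medians(xs):
    """Medians of consecutive groups of 3 (a trailing partial group is skipped)."""
    return [median_of_3(*xs[i:i + 3]) for i in range(0, len(xs) - 2, 3)]

def select(xs, i):
    """Return the i-th smallest element of xs (0-indexed)."""
    if len(xs) <= 20:
        return sorted(xs)[i]
    # Repeat the grouping step: medians of triples, then medians of those medians.
    candidates = group_medians(group_medians(xs))
    pivot = select(candidates, len(candidates) // 2)
    lo = [x for x in xs if x < pivot]
    n_eq = sum(1 for x in xs if x == pivot)
    if i < len(lo):
        return select(lo, i)
    if i < len(lo) + n_eq:
        return pivot
    return select([x for x in xs if x > pivot], i - len(lo) - n_eq)
```

The three-way partition (less than, equal to, greater than the pivot) keeps the recursion correct when there are duplicate values.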

Really classic problems

Perfect Matchings in O(n log n) Time in Regular Bipartite Graphs
Goel/Kapralov/Khanna, STOC '10
- Summary: Title says it all. Uses randomization to beat the lower bound for deterministic algorithms.
A Permanent Approach to the Traveling Salesman Problem
Vishnoi, FOCS '12
- Summary: We continue to chip away at general TSP by showing that approximation is easy on high-degree instances of (regular) graph-TSP: the approximation ratio achieved tends to 1 as the degree grows.
A Back-to-Basics Empirical Study of Priority Queues
Larkin/Sen/Tarjan, ALENEX '14
- Summary: Practical advice on designing heap (priority queue) implementations. We show that wall-clock time is highly correlated with the number of L1 cache misses, and that high-level design decisions can have a significant impact on cache behavior.
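The paper's specific recommendations are not reproduced here. As an illustration of the kind of high-level design decision involved, here is a minimal Python sketch of an implicit d-ary min-heap, with d = 4 as an arbitrary default. A larger d makes the tree shallower, and each node's children occupy adjacent array slots, which tends to help cache behavior in a compiled language. Python itself hides these effects, so the sketch shows structure only. The class name is hypothetical.

```python
class DaryHeap:
    """Implicit min-heap in a flat list; node i's children sit at d*i+1 .. d*i+d."""

    def __init__(self, d=4):
        self.d = d
        self.a = []

    def __len__(self):
        return len(self.a)

    def push(self, x):
        a = self.a
        a.append(x)
        i = len(a) - 1
        while i > 0:  # sift up
            parent = (i - 1) // self.d
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        a = self.a
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i, n = 0, len(a)
            while True:  # sift down
                first = self.d * i + 1
                if first >= n:
                    break
                # Smallest child; the d children are adjacent in the array.
                c = min(range(first, min(first + self.d, n)), key=a.__getitem__)
                if a[i] <= a[c]:
                    break
                a[i], a[c] = a[c], a[i]
                i = c
        return top
```

The trade-off is that pop compares up to d children per level, while push and pop traverse only about log_d n levels.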
Online Steiner Tree with Deletions
Gupta/Kumar, SODA '14
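- Summary: We use a primal-dual framework and a global charging argument to maintain a constant-competitive Steiner tree as nodes are removed from a set, changing only a constant number of edges per deletion. We also give an algorithm for the fully dynamic model, where nodes are both added and removed, that makes an amortized constant number of edge changes per request.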
Linear Probing with Constant Independence
Pagh/Pagh/Ružić, JCo '09
- Summary: With a hash function drawn from a pairwise independent family, linear probing can have expected logarithmic cost per operation on worst-case inputs. However, we show that 5-wise independence is enough to guarantee expected O(1) cost per operation.
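A standard way to get 5-wise independence is a random polynomial of degree 4 over a prime field. The Python sketch below pairs such a hash with a linear probing table. Our assumptions are the Mersenne prime 2^61 - 1, integer keys below it, a load-factor cap of 1/2, and all names. Reducing the polynomial's value mod the table size makes the family only approximately 5-wise independent, and deletions are omitted.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be integers in [0, P)

class PolyHash:
    """Random degree-4 polynomial mod P, then reduced mod m (5-wise independent over Z_P)."""

    def __init__(self, m, rng):
        self.m = m
        self.coeffs = [rng.randrange(P) for _ in range(5)]

    def __call__(self, x):
        acc = 0
        for a in self.coeffs:  # Horner's rule
            acc = (acc * x + a) % P
        return acc % self.m

class LinearProbingTable:
    def __init__(self, capacity=8, seed=None):
        self.rng = random.Random(seed)
        self._reset(capacity)

    def _reset(self, capacity):
        self.slots = [None] * capacity
        self.h = PolyHash(capacity, self.rng)  # fresh hash function per table size
        self.size = 0

    def _place(self, key, value):
        i = self.h(key)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)  # probe the next slot
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (key, value)

    def insert(self, key, value):
        if 2 * (self.size + 1) > len(self.slots):  # keep the load factor at most 1/2
            old = [s for s in self.slots if s is not None]
            self._reset(2 * len(self.slots))
            for k, v in old:
                self._place(k, v)
        self._place(key, value)

    def find(self, key):
        i = self.h(key)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None
```

The table is always at most half full, so every probe sequence reaches an empty slot and both insertion and lookup terminate.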
Adaptive Search over Sorted Sets
Bonasera/Ferrara/Fiumara/Pagano/Provetti, JDA '15
- Summary: Binary search over a sorted list of arbitrary data takes O(log n) time, but if the data are uniformly distributed we can do better: interpolation search takes O(log log n) expected time. Unfortunately, algorithms such as interpolation search that exploit uniformity can take Θ(n) time if the data turn out not to be uniform. We give a search algorithm that is at most an additive constant slower than interpolation search, but which still has worst-case O(log n) running time.
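The sketch below is not the paper's algorithm. It is a simpler Python hybrid, included for intuition, that alternates an interpolation probe with a bisection probe. The bisection step halves the range every round, so the worst case stays O(log n). However, this interleaving does not achieve the paper's additive-constant guarantee relative to interpolation search. It assumes a sorted list of integers, and the function name is hypothetical.

```python
def hybrid_search(a, x):
    """Return an index i with a[i] == x in the sorted integer list a, or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= x <= a[hi]:
        # Interpolation probe: guess where x would sit if values were evenly spread.
        if a[hi] == a[lo]:
            mid = lo
        else:
            mid = lo + (x - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[mid] == x:
            return mid
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
        if lo > hi:
            break
        # Bisection probe: halves the remaining range, capping the rounds at O(log n).
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

On evenly spaced data, the interpolation probe usually finds the target immediately. On skewed data, such as cubes, the bisection probes prevent the Θ(n) behavior of pure interpolation search.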

Machine learning

Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection
Das/Kempe, ICML '11
- Summary: Subset selection is the following problem: given n observed random variables and a variable z to be predicted, select a subset of k of them whose best linear combination approximates z as closely as possible. This paper introduces the notion of submodularity ratio to explain why greedy algorithms perform well on this task, and proves that greedy forward selection achieves an R² of at least (1 - e^(-γ)) times the optimum, where γ is the submodularity ratio.
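Greedy forward selection can be sketched in plain Python as follows. The names are hypothetical, and variables are assumed to be given as equal-length lists of samples. Each step adds the variable that most reduces the squared residual of the least-squares fit, computed by orthogonalizing each candidate against the variables already chosen. A small relative tolerance (our choice) stops early once z is already explained.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward_regression(cols, z, k):
    """Greedily choose up to k of the candidate variables cols to predict z."""
    basis, chosen = [], []  # orthonormal basis for the span of the chosen variables
    resid = list(z)         # part of z not yet explained by that span
    tol = 1e-12 * dot(z, z)
    for _ in range(k):
        best, best_gain, best_u = None, tol, None
        for j, x in enumerate(cols):
            if j in chosen:
                continue
            u = list(x)
            for q in basis:  # remove the component already spanned (Gram-Schmidt)
                c = dot(q, u)
                u = [ui - c * qi for ui, qi in zip(u, q)]
            norm2 = dot(u, u)
            if norm2 < 1e-12:
                continue  # x is (numerically) a combination of chosen variables
            gain = dot(u, resid) ** 2 / norm2  # drop in squared residual, i.e. R^2 gain
            if gain > best_gain:
                best, best_gain, best_u = j, gain, (u, norm2)
        if best is None:
            break
        u, norm2 = best_u
        q = [ui / norm2 ** 0.5 for ui in u]
        basis.append(q)
        c = dot(q, resid)
        resid = [ri - c * qi for ri, qi in zip(resid, q)]
        chosen.append(best)
    return chosen
```

If the candidate variables are mutually orthogonal, each greedy step is exactly optimal. Correlation among the variables is what can make greedy fall short, and the submodularity ratio measures how far.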
Randomized Composable Core-sets for Distributed Submodular Maximization
Mirrokni/Zadimoghaddam, STOC '15
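- Summary: We describe an algorithm for approximate submodular maximization when the data are too large to fit on one machine. We analyze an existing two-round technique whose practical effectiveness has already been demonstrated in several machine learning applications: randomly partition the data across machines, run the greedy algorithm on each part, then run greedy again on the union of the selected items. We show that, with a random partition, this yields a constant-factor approximation.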
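Below is a minimal Python sketch of a two-round distributed greedy. It randomly partitions the items across machines, runs greedy on each part, and then runs greedy on the union of the local solutions. Machines are simulated with lists, and maximum coverage (choose k sets covering as many elements as possible) serves as the submodular function. Both choices, and all names, are our assumptions for illustration.

```python
import random

def greedy_max_coverage(sets, candidates, k):
    """Pick up to k candidate indices greedily by marginal coverage."""
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for i in candidates:
            if i in chosen:
                continue
            gain = len(sets[i] - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # nothing adds coverage
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen

def coverage(sets, chosen):
    return len(set().union(*(sets[i] for i in chosen)))

def distributed_greedy(sets, k, machines, seed=None):
    rng = random.Random(seed)
    parts = [[] for _ in range(machines)]
    for i in range(len(sets)):
        parts[rng.randrange(machines)].append(i)  # random partition of the items
    local = [greedy_max_coverage(sets, part, k) for part in parts]  # round 1, per machine
    union = sorted(set().union(*local))  # the small "core-set" sent to one machine
    return greedy_max_coverage(sets, union, k)  # round 2
```

Only the k items selected on each machine travel over the network, so the second round sees at most k times the number of machines items rather than the full data set.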

New models

Team Performance with Test Scores
Kleinberg/Raghu, EC '15
- Summary: A group with diversity can often outperform a group of high-achieving but like-minded individuals. We model the problem of selecting and measuring the performance of a potential team, when one is only able to administer individual tests.
Time-Inconsistent Planning: A Computational Problem in Behavioral Economics
Kleinberg/Oren, EC '14
- Summary: People often behave inconsistently across time, such as by procrastinating, abandoning half-completed projects, or working more efficiently on projects when they have a deadline. We propose a model of tasks, goals, and dependencies between tasks which unifies these behaviors, and which suggests ways in which tasks can be designed to improve the chance that a goal is reached.
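Here is a minimal Python sketch of one way to make this concrete, following a graph formulation of the model as we understand it; treat the details as assumptions. Tasks form a directed acyclic graph from a start s to a goal t, with a cost on each edge. A present-biased agent with bias b > 1 at node v takes the edge (v, w) that minimizes b·c(v, w) + d(w), where d(w) is the true cheapest remaining cost from w to t. Graph contents and function names are hypothetical, and t is assumed reachable from s.

```python
import heapq
import math

def dist_to_target(graph, target):
    """Dijkstra on the reversed graph: true cheapest remaining cost d(v) to target."""
    rev = {}
    for u, edges in graph.items():
        for w, c in edges:
            rev.setdefault(w, []).append((u, c))
    dist = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, math.inf):
            continue  # stale heap entry
        for u, c in rev.get(v, []):
            nd = d + c
            if nd < dist.get(u, math.inf):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def present_biased_path(graph, source, target, b):
    """Path followed by an agent who inflates the cost of the next step by b."""
    dist = dist_to_target(graph, target)
    path, v = [source], source
    while v != target:
        # Perceived cost: the immediate edge is weighted by b, the rest at face value.
        w, _ = min(graph[v], key=lambda e: b * e[1] + dist.get(e[0], math.inf))
        path.append(w)
        v = w
    return path
```

In the example below, the cheap-now route via a costs 11 in total, while the optimal route via b costs 8. An unbiased agent (b = 1) takes b, but an agent with b = 3 perceives 3·1 + 10 = 13 < 3·4 + 4 = 16 and takes a.

```python
graph = {"s": [("a", 1), ("b", 4)], "a": [("t", 10)], "b": [("t", 4)], "t": []}
present_biased_path(graph, "s", "t", 3)  # ['s', 'a', 't']
```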
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
Backstrom/Dwork/Kleinberg, WWW '07
- Summary: This paper describes active and passive attacks that de-anonymize an anonymously presented social network. In the active attack, the attacker needs to create only O(sqrt(log n)) fake accounts to compromise the privacy of any targeted node; in the passive attack, a small coalition of colluding users identifies their own anonymized IDs, from which they can de-anonymize other friends who are not in the coalition.
Computational Complexity and Information Asymmetry in Financial Products
Arora/Barak/Brunnermeier/Ge, ICS '10
- Summary: A commonly cited benefit of financial derivatives is that they protect buyers from dishonest sellers, and hence lower the cost that dishonest sellers impose on the market. We show that a seller can tamper with a commonly used derivative such that, under a computational hardness assumption (the conjectured hardness of the planted dense subgraph problem), a buyer cannot efficiently distinguish between the tampered and untampered versions.