Papers for CS167 (Readings in Algorithms)
Please send a ranked list of 3-4 papers to the course staff by Friday, April 8.
The papers are loosely grouped into the following topics:
Classic problems
-
Aggregating Inconsistent Information: Ranking and Clustering
- Summary: Given k voters who each submit a ranked list of n candidates, we want to create a global ranking that is as consistent as possible with the k lists. This is NP-hard even when k=4, but we present a simple algorithm that gives an 11/7-approximation under the relevant metric. The same techniques apply to ordering teams at the end of a round-robin tournament, and several other related problems.
-
Sparse Approximation via Generating Point Sets
- Summary: We find a subset T of a set of points P that ε-approximates the convex hull of P. Furthermore, each point in P can be approximated by a convex combination of a small number of points in T. Of course, setting T to P would solve the problem; we find a T of size comparable to the smallest possible such T.
-
Simple, Fast and Deterministic Gossip and Rumor Spreading
- Summary: In rumor spreading, each node needs to communicate a message to every other node in an unknown network. Past algorithms have been inherently randomized; we give a deterministic algorithm that is simpler, more robust, and faster than any of the randomized attempts.
-
Multi-probe Consistent Hashing
- Summary: Consistent hashing meets cuckoo hashing: we propose and test an algorithm for hashing keys to machines that is robust to machines arriving and disappearing, load balances nicely, and requires relatively little replication.
-
k-means++: The Advantages of Careful Seeding
- Summary: k-means is a popular clustering algorithm. It consists of an initialization, where we choose k random cluster centers, followed by a deterministic local search procedure. We propose a simple modification to the initialization step that improves both its theoretical guarantees and its experimental outcomes.
-
Select with Groups of 3 or 4 Takes Linear Time
- Summary: The traditional median-of-medians algorithm uses groups of 5 for its first pass, and it has been widely believed that one could not run a similar algorithm (in linear time) with groups of fewer than 5. We show a median-of-medians algorithm that works with groups of 3 or 4, and that uses fewer comparisons than the groups-of-5 algorithm.
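For orientation, the traditional groups-of-5 selection algorithm that the last summary takes as its baseline can be sketched roughly as below. This is the textbook version, not the groups-of-3-or-4 variant from the paper; the function name `select` and the 0-indexed rank `k` are our own conventions.

```python
def select(items, k):
    """Return the k-th smallest element of items (0-indexed).

    Textbook median-of-medians with groups of 5; a sketch for
    illustration, not the paper's groups-of-3-or-4 algorithm.
    """
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        # Small inputs are handled directly by sorting.
        if len(items) <= 5:
            return sorted(items)[k]
        # First pass: take the median of each group of 5
        # (the last group may be smaller).
        medians = []
        for i in range(0, len(items), 5):
            group = sorted(items[i:i + 5])
            medians.append(group[len(group) // 2])
        # Recursively pick the median of the group medians as the pivot.
        pivot = select(medians, len(medians) // 2)
        # Three-way partition around the pivot.
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs
```

For example, `select(values, len(values) // 2)` returns the upper median of `values`.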
Really classic problems
-
Perfect Matchings in O(n log n) Time in Regular Bipartite Graphs
- Summary: Title says it all. Uses randomization to beat the lower bound for deterministic algorithms.
-
A Permanent Approach to the Traveling Salesman Problem
- Summary: We continue to chip away at general TSP by showing that approximation is easy on (regular) graph-TSP instances with high degree.
-
A Back-to-Basics Empirical Study of Priority Queues
- Summary: Advice on how to design a practical heap implementation. We show that wall-clock time is highly correlated with the number of L1 cache misses, and that high-level design decisions can have a significant impact on cache behavior.
-
Online Steiner Tree with Deletions
- Summary: We use a primal-dual framework and a global charging argument to maintain a constant-competitive Steiner tree as nodes are removed from a set. We also give an algorithm for the fully dynamic model, where nodes are both added and removed.
-
Linear Probing with Constant Independence
- Summary: Linear probing using a pairwise independent hash family can have logarithmic expected cost per operation on worst-case data. However, we show that 5-wise independence is enough to ensure O(1) expected cost per operation.
-
Adaptive Search over Sorted Sets
- Summary: Binary search over a sorted list of arbitrary data takes O(log n) time, but if our data is uniform we can do better. Unfortunately, algorithms such as interpolation search, which take advantage of the data being uniform, can take Θ(n) time if the data turns out not to be uniform. We give a search algorithm that is at most an additive constant slower than interpolation search, but which still has worst-case O(log n) running time.
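As a point of reference for the last summary, plain interpolation search can be sketched as below. This is the standard baseline, not the adaptive algorithm from the paper, and it keeps the Θ(n) worst case on non-uniform data. The function name `interpolation_search` is ours, and the sketch assumes a sorted list of integers.

```python
def interpolation_search(values, target):
    """Return an index i with values[i] == target, or -1 if absent.

    Assumes values is a sorted list of integers. Plain interpolation
    search; a baseline sketch, not the paper's adaptive algorithm.
    """
    lo, hi = 0, len(values) - 1
    # Invariant: if target is present, it lies in values[lo..hi].
    while lo <= hi and values[lo] <= target <= values[hi]:
        if values[lo] == values[hi]:
            # All remaining keys are equal, and the loop condition
            # forces target to equal them.
            return lo
        # Guess the position by linear interpolation between the
        # endpoint keys, instead of always probing the midpoint.
        offset = (target - values[lo]) * (hi - lo) // (values[hi] - values[lo])
        pos = lo + offset
        if values[pos] == target:
            return pos
        if values[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On evenly spaced keys the first guess is usually exact or very close; on heavily skewed keys (for example, powers of two) the guesses can advance only one slot at a time, which is where the linear worst case comes from.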
Machine learning
New models
-
Team Performance with Test Scores
- Summary: A group with diversity can often outperform a group of high-achieving but like-minded individuals. We model the problem of selecting and measuring the performance of a potential team, when one is only able to administer individual tests.
-
Time-Inconsistent Planning: A Computational Problem in Behavioral Economics
- Summary: People often behave inconsistently across time, such as by procrastinating, abandoning half-completed projects, or working more efficiently on projects when they have a deadline. We propose a model of tasks, goals, and dependencies between tasks which unifies these behaviors, and which suggests ways in which tasks can be designed to improve the chance that a goal is reached.
-
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
- Summary: This paper describes active and passive attacks to de-anonymize an anonymized social network. In the active attack, the attacker needs to create only O(sqrt(log n)) fake accounts to compromise the privacy of any targeted node. In the passive attack, a small coalition of friends figures out their own anonymized IDs, from which they can de-anonymize other friends not in the coalition.
-
Computational Complexity and Information Asymmetry in Financial Products
- Summary: A commonly cited benefit of financial derivatives is that they protect buyers from dishonest sellers, and hence lower the cost dishonest sellers impose on the market. We show that a commonly used derivative can be tampered with such that (under cryptographic assumptions) a buyer cannot distinguish between the tampered and untampered versions.