streaming

Streaming Submodular Maximization under Matroid Constraints

ICALP 2022

Moran Feldman, Paul Liu, Ashkan Norouzi-Fard, Ola Svensson, Jan Vondrák, and Rico Zenklusen

Recent progress in (semi-)streaming algorithms for monotone submodular function maximization has led to tight results for a simple cardinality constraint. However, current techniques fail to give a similar understanding for natural generalizations, including matroid constraints. Our paper improves on the previously best approximation guarantees of $1/4$ and $1/2$ for single-pass and multi-pass streaming algorithms, respectively. In fact, our multi-pass streaming algorithm is tight in that any algorithm with a better guarantee than $1/2$ must make several passes through the stream and any algorithm that beats our guarantee of $1-1/e$ must make linearly many passes (as well as an exponential number of value oracle queries).
[pdf]
Cardinality constrained submodular maximization for random streams

NeurIPS 2021

Paul Liu, Aviad Rubinstein, Jan Vondrák, Junyao Zhao

We consider the problem of maximizing submodular functions in single-pass streaming and secretaries-with-shortlists models, both with random arrival order. For cardinality constrained monotone functions, Agrawal, Shadravan, and Stein gave a single-pass $(1-1/e-\varepsilon)$-approximation algorithm using only linear memory, but their exponential dependence on $\varepsilon$ makes it impractical even for $\varepsilon=0.1$. We simplify both the algorithm and the analysis, obtaining an exponential improvement in the $\varepsilon$-dependence (in particular, $O(k/\varepsilon)$ memory). Extending these techniques, we also give a simple $(1/e-\varepsilon)$-approximation for non-monotone functions in $O(k/\varepsilon)$ memory. For the monotone case, we also give a corresponding unconditional hardness barrier of $1-1/e+\varepsilon$ for single-pass algorithms in randomly ordered streams, even assuming unlimited computation.
Finally, we show that the algorithms are simple to implement and work well on real world datasets.
[pdf] [code] [poster] [slides]
Diversity on the Go! Streaming Determinantal Point Processes under a Maximum Induced Cardinality Objective

WWW 2021

Paul Liu, Akshay Soni, Eun Yong Kang, Yajun Wang, Mehul Parsana

Over the past decade, Determinantal Point Processes (DPPs) have proven to be a mathematically elegant framework for modeling diversity. Given a set of items $N$, DPPs define a probability distribution over subsets of $N$, with sets of larger diversity having greater probability. Recently, DPPs have achieved success in the domain of recommendation systems, as a method to enforce diversity of recommendations in addition to relevance.
In large-scale recommendation applications however, the input typically comes in the form of a stream too large to fit into main memory. However, the natural greedy algorithm for DPP-based recommendations is memory intensive, and cannot be used in a streaming setting.
In this work, we give the first streaming algorithm for optimizing DPPs under the Maximum Induced Cardinality (MIC) objective of Gillenwater et al. As noted by Gillenwater et al., the MIC objective is better suited towards recommendation systems than the classically used maximum a posteriori (MAP) DPP objective. In the insertion-only streaming model, our algorithm runs in $\tilde{O}(k^2)$ time per update and uses $\tilde{O}(k)$ memory, where $k$ is the number of diverse items to be selected. In the sliding window streaming model, our algorithm runs in $\tilde{O}(\sqrt{n}k^2)$ time per update and $\tilde{O}(\sqrt{n}k)$ memory where $n$ is the size of the sliding window. The approximation guarantees are simple, and depend on the largest and the $k$-th largest eigenvalues of the kernel matrix used to model diversity. We show that in practice, the algorithm often achieves close to optimal results, and meets the memory and latency requirements of production systems. Furthermore, the algorithm works well even in a non-streaming setting, and runs in a fraction of time compared to the classic greedy algorithm.
[pdf] [code]
Improved Algorithms for Edge Colouring in the W-Streaming Model

SOSA@SODA 2021

Paul Liu and Moses Charikar

In the W-streaming model, an algorithm is given $O(n \text{polylog} n)$ space and must process a large graph of up to $O(n^2)$ edges. In this short note we give two algorithms for edge colouring under the W-streaming model. For edge colouring in W-streaming, a colour for every edge must be determined by the time all the edges are streamed. Our first algorithm uses $\Delta+o(\Delta)$ colours in $O(n \log^2 n)$ space when the edges arrive according to a uniformly random permutation. The second algorithm uses $(1+o(1))\Delta^2/s$ colours in $\tilde{O}(ns)$ space when edges arrival adversarially.
[pdf] [slides] [video]
Approximation Schemes for Covering and Packing in the Streaming Model

CCCG 2018

Christopher Liaw, Paul Liu, Robert Reiss

Within the computational geometry (CG) community, the shifting strategy is a popular algorithmic paradigm for approximation algorithms. In this work we revisit the shifting strategy in the streaming model. Using the streaming-shifting strategy, we give low-memory approximation algorithms for classical CG problems such as unit disc cover. When combined with the shifting coreset technique of Fonseca et al., we are able to approximate streaming variants of geometric dominating set, independent set, etc.
[pdf]