Sidestepping Hardness in Statistical Problems

Abstract

Many fundamental problems in statistics and machine learning are computationally hard in the worst case. However, this worst-case hardness does not necessarily preclude efficient algorithms for these problems in practice. This thesis explores several techniques for “sidestepping” computational hardness barriers in statistical problems, developing efficient algorithms that work under natural assumptions or for practically relevant problem instances.

The thesis consists of several parts examining different approaches to circumventing hardness:

  1. Sample Amplification: We study when it is possible to generate additional synthetic samples that are statistically indistinguishable from real data, even when learning the underlying distribution is information-theoretically impossible.

  2. Log-Concave Maximum Likelihood: We develop polynomial-time algorithms for high-dimensional log-concave maximum likelihood estimation by leveraging locally exponential families.

  3. Submodular Function Minimization: We provide near-optimal approximation algorithms for discrete and continuous submodular function minimization with improved oracle complexity.

  4. Strategic Classification: We examine learning from strategic agents who can manipulate their features, developing causal frameworks for robust prediction.

Throughout, we demonstrate that computational hardness can often be sidestepped through careful problem formulation, natural distributional assumptions, or novel algorithmic techniques. The results provide both theoretical insights and practical algorithms for fundamental statistical problems.