CS369M: Algorithms for Massive Data Set Analysis

Email: mmahoneyWXYZ AT ZYXW.cs.stanford.edu

Office hours: Most weeks, M 3:00pm to 4:30pm in Math building Rm. 383A (third floor). Alternatively, by appointment.

Teaching Assistant: Alex Shkolnik

Email: ads2WXYZ AT ZYXW.stanford.edu

Office hours: Tue & Thu 3:00pm to 5:00pm, Location: Gates B28

Class time and Location:

MW 11:00-12:15, Building Terman, Room 156. (First meeting is Monday, September 21, 2009.)

Course description: Algorithmic and statistical methods for large-scale data analysis: matrix and graph algorithms; strengths and weaknesses of theoretical techniques for practical scientific and internet data analysis; overlap with related problems in statistics, optimization, numerical analysis, and machine learning. Representative topics include: Matrix problems (numerical and statistical perspectives; algorithmic approaches, including Johnson-Lindenstrauss lemma and randomized projection and sampling algorithms; novel matrix factorizations); Graph problems (graph partitioning algorithms, including spectral methods, flow-based methods, and recent geometric methods; local graph algorithms and approximate eigenvector computation); and Applications to machine learning and statistical data analysis (motivating applications; algorithmic basis of the RKHS method; geometric data analysis, regularization, and statistical inference; boosting and its relationship to conjugate gradient methods, duality, convexity, online learning, and approximation algorithms). Basics of implementing these ideas in medium and large-scale applications.

Prerequisites: Basic understanding of Algorithms (e.g., CS161), Linear Algebra (e.g., Math51), and Probability Theory (e.g., CS109), or equivalent.

Course requirements: Most likely, three homeworks (ca. 15-20% each), scribing two lectures (ca. 5%), and a major project (ca. 50%, comprising an initial proposal, an intermediate report, and a final report).

Syllabus: pdf

Primary references: Although there are many good books on related topics, there is not a single reference covering the topics we will cover. Thus, we will be reading primary sources from the literature. Relevant papers for each lecture are listed in the Lectures section. A more detailed list of relevant references is also provided below.

Homeworks:

Major project: pdf

Lectures:

Scribe Templates pdf tex

Important Note: These scribed notes have been put up in real time to aid dissemination and complement class notes, but they have not been fully edited---Caveat emptor!

Mon 09/21/09: Course Information, Introduction, and Overview

Lecture Notes: pdf

References: See the detailed list of references below, especially references under "Algorithmic and statistical perspectives on data problems."

Wed 09/23/09: Introduction to Randomized Algorithms for Matrices, and the Johnson-Lindenstrauss lemma

Lecture Notes: pdf tex scribed by Gourab Mukherjee & Ben Newhouse

Main References:
(*) Dasgupta and Gupta, "An elementary proof of a theorem of Johnson and Lindenstrauss"
(*) Achlioptas, "Database-friendly random projections: Johnson-Lindenstrauss with binary coins"

Mon 09/28/09: JL lemma, cont., random projection for low-rank approximation, and pass efficient models

Lecture Notes: pdf tex scribed by Meghana Vishvanath & Erik Goldman

Main References:
(*) Papadimitriou, Raghavan, Tamaki, and Vempala, "Latent semantic indexing: a probabilistic analysis"

Wed 09/30/09: Approximating matrix multiplication, and Sampling columns/rows for low-rank approximation

Lecture Notes: pdf tex scribed by Richa Bhayani & Daniel Chen

Main References:
(*) Drineas, Kannan, and Mahoney, "Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication"
(*) Drineas, Kannan, and Mahoney, "Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix"

Mon 10/05/09: Approximating norms of random matrices, and Sampling elements for low-rank approximation

Lecture Notes: pdf tex scribed by Jacob Bien & Noah Youngs

Main References:
(*) Alon, Krivelevich and Vu, "On the Concentration of Eigenvalues of Random Symmetric Matrices"
(*) Achlioptas and McSherry, "Fast Computation of Low-Rank Matrix Approximations"

Wed 10/07/09: Approximating L2 regression, and relative-error low-rank matrix approximation

Lecture Notes: pdf tex scribed by Mengqiu Wang & Kshipra Bhawalkar

Main References:
(*) Drineas, Mahoney, Muthukrishnan, and Sarlos, "Faster Least Squares Approximation"
(*) Drineas, Mahoney, and Muthukrishnan, "Relative-Error CUR Matrix Decompositions"

Mon 10/12/09: Reproducing Kernel Hilbert Spaces and Kernel-based Learning Methods (1 of 2)

Lecture Notes: pdf tex scribed by David Fong & Ya Xu

Main References:
(*) Muller, Mika, Ratsch, Tsuda, and Scholkopf, "An Introduction to Kernel-Based Learning Algorithms"
(*) Daume, "From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less"

Wed 10/14/09: Reproducing Kernel Hilbert Spaces and Kernel-based Learning Methods (2 of 2)

Lecture Notes: pdf tex scribed by Mark Wagner & Weidong Shao

Main References:
(*) Cortes and Vapnik, "Support-Vector Networks"
(*) Scholkopf, Smola, and Muller, "Nonlinear component analysis as a kernel eigenvalue problem"
(*) Scholkopf, Herbrich, Smola, and Williamson, "A Generalized Representer Theorem"

Mon 10/19/09: Spectral and Kernel-based Methods for Nonlinear Dimensionality Reduction (1 of 2)

Lecture Notes: pdf tex scribed by Gourab Mukherjee & Deyan Simeonov

Main References:
(*) Saul, Weinberger, Ham, Sha, and Lee, "Spectral methods for dimensionality reduction"
(*) Belkin and Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation"

Wed 10/21/09: Spectral and Kernel-based Methods for Nonlinear Dimensionality Reduction (2 of 2)

Lecture Notes: pdf tex scribed by Yunting Sun & Meghana Vishvanath

Main References:
(*) Ham, Lee, Mika, and Scholkopf, "A kernel view of the dimensionality reduction of manifolds"
(*) Bengio et al., "Learning Eigenfunctions Links Spectral Embedding and Kernel PCA"
(*) Belabbas and Wolfe, "On landmark selection and sampling in high-dimensional data analysis"

Mon 10/26/09: Expander Graphs in algorithms theory and in data applications (1 of 2)

Lecture Notes: pdf tex scribed by Richa Bhayani & Erik Goldman

Main References:
(*) Sections 2, 3, 4, and 13 of Hoory, Linial, and Wigderson, "Expander graphs and their applications"

Also worth skimming:
(*) Linial, London, Rabinovich, "The geometry of graphs and some of its algorithmic applications"
(*) Gkantsidis, Mihail, and Saberi, "Conductance and congestion in power law graphs"
(*) Leskovec, Lang, Dasgupta, and Mahoney, "Statistical Properties of Community Structure in Large Social and Information Networks"

Wed 10/28/09: Expander Graphs in algorithms theory and in data applications (2 of 2)

Lecture Notes: pdf tex scribed by Bahman Bahmani & Daniel Chen

References:
Same readings as last time.

Mon 11/02/09: Class rescheduled to Fri.

Wed 11/04/09: Introduction to Graph Partitioning, including improvement/multiresolution, spectral, and flow-based methods

Lecture Notes: pdf tex scribed by Noah Youngs & Weidong Shao

Main References:
(*) Schaeffer, "Graph Clustering"

Fri 11/06/09: Global and Local Spectral Methods for clustering and partitioning graphs and data

Lecture Notes: pdf tex scribed by David Fong & Rajendra Shinde

Main References:
(*) Guattery and Miller, "On the Quality of Spectral Separators"
(*) Chung, "Four proofs of Cheeger inequality and graph partition algorithms"

Also worth skimming:
(*) Shi and Malik, "Normalized Cuts and Image Segmentation"
(*) von Luxburg, "A Tutorial on Spectral Clustering"
(*) Section 7 of Spielman (and Teng), "Fast Randomized Algorithms for Partitioning, Sparsification, and Solving Linear Systems"
(*) Andersen and Lang, "Communities from seed sets"

Mon 11/09/09: Flow-based Methods for clustering and partitioning graphs and data

Lecture Notes: pdf tex scribed by Jacob Bien & Ya Xu

Main References:
(*) Shmoys, "Cut problems and their application to divide-and-conquer"
(*) Andersen and Lang, "An algorithm for improving graph partitions"

Also worth skimming:
(*) Leighton and Rao, "Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms"
(*) Linial, London, Rabinovich, "The geometry of graphs and some of its algorithmic applications"

Wed 11/11/09: Partitioning Algorithms that Combine Spectral and Flow

Lecture Notes: pdf tex scribed by Kshipra Bhawalkar & Deyan Simeonov

Main References:
(*) Arora, Rao, and Vazirani, CACM article, "Geometry, flows, and graph-partitioning algorithms"

Also worth skimming:
(*) Khandekar, Rao, and Vazirani, "Graph partitioning using single commodity flows"
(*) Orecchia, Schulman, Vazirani, and Vishnoi, "On Partitioning Graphs via Single Commodity Flows"

Mon 11/16/09: Implementations, and Relationship with Multiplicative Update Algorithms, Boosting, and Ensemble Methods

Lecture Notes: pdf tex scribed by Mark Wagner & Yunting Sun

Main References:
(*) Arora, Hazan, and Kale, "The multiplicative weights update method: a meta algorithm and applications"
(*) Freund and Schapire, "Game Theory, On-line Prediction and Boosting"

Also worth skimming:
(*) Lang, Mahoney, and Orecchia, "Empirical Evaluation of Graph Partitioning Using Spectral Embeddings and Flow"
(*) Dietterich, "Ensemble Methods in Machine Learning"
(*) Freund and Schapire, "Adaptive game playing using multiplicative weights"

Mon 11/30/09: Data-motivated matrix factorizations (1 of 2)

Lecture Notes: pdf tex scribed by Meghana Vishvanath & Rajendra Shinde

Main References:
(*) d'Aspremont, El Ghaoui, Jordan, and Lanckriet, "A Direct Formulation for Sparse PCA Using Semidefinite Programming"
(*) Mahoney and Drineas, "CUR Matrix Decompositions for Improved Data Analysis"

Wed 12/02/09: Data-motivated matrix factorizations (2 of 2)

Lecture Notes: pdf tex scribed by Bahman Bahmani & Weidong Shao

Main References:
(*) Fazel, Hindi, and Boyd, "A Rank Minimization Heuristic with Application to Minimum Order System Approximation"
(*) Srebro, Rennie, and Jaakkola, "Maximum Margin Matrix Factorizations"

Also worth skimming:
(*) Kulis, Sustik, and Dhillon, "Low-Rank Kernel Learning with Bregman Matrix Divergences"
(*) Bell and Koren, "Lessons from the Netflix Prize Challenge"

Detailed list of references:

Papers marked (*) should be read; they are either useful background or will provide the basis for the lectures. Other papers are provided for additional background; they may help you scribe the lectures and give you initial pointers for your project.

You should be able to find all of these online, given the title and authors. If there are any you can't find relatively easily, please let the instructor or TA know and we will provide a copy or pointer.

A note on reading papers: Research papers (especially conference versions in computer science) have a tendency to be terse, hard-to-read, and error-prone. To make matters worse, papers in different areas often refer to similar concepts by very different names. A good way to start is to peel apart the paper like an onion: read the introduction, and maybe read the introduction of related papers subsequent to it, to get a general idea; then get a sense of the main technical results (e.g., the main theoretical or empirical claims); and then go into the details of what is proved and how it is proved (or of how the empirical claims are validated). It may be hard to understand every technical detail of certain papers you're reading, but with effort you should develop a good understanding of the main technical contributions, the gist of how they are proved or evaluated, and a detailed understanding of at least one or two results.

Introduction, background, and overview:
- Algorithmic and statistical perspectives on data problems:
  - Fayyad, Piatetsky-Shapiro, and Smyth, "From data mining to knowledge discovery in databases"
  - Smyth, "Data mining: data analysis on a grand scale?"
  - Donoho, "Aide-Memoire. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality"
  - Breiman, "Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)"
  - Lambert, "What use is statistics for massive data?"
  - Poggio and Smale, "The Mathematics of Learning: Dealing with Data"
Randomized algorithms for matrix problems:
- Background and overview for eigenstuff:
  - Deerwester, Dumais, Furnas, Landauer, and Harshman, "Indexing by latent semantic analysis"
  - Kleinberg, "Authoritative sources in a hyperlinked environment"
  - Page, Brin, Motwani, and Winograd, "The PageRank Citation Ranking: Bringing Order to the Web"
  - (*) Berry, Drmac, and Jessup, "Matrices, vector spaces, and information retrieval"
  - Wall, Rechtsteiner, and Rocha, "Singular value decomposition and principal component analysis"
  - (*) Langville and Meyer, "A Survey of Eigenvector Methods of Web Information Retrieval"
- "Slow" and "Fast" Johnson-Lindenstrauss lemma:
  - Johnson and Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space"
  - Frankl and Maehara, "The Johnson-Lindenstrauss Lemma and the sphericity of some graphs"
  - Indyk and Motwani, "Approximate nearest neighbors: towards removing the curse of dimensionality"
  - (*) Dasgupta and Gupta, "An elementary proof of a theorem of Johnson and Lindenstrauss"
  - (*) Achlioptas, "Database-friendly random projections: Johnson-Lindenstrauss with binary coins"
  - Ailon and Chazelle, "Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform"
  - Matousek, "On variants of the Johnson-Lindenstrauss lemma"
- Random projection and fast low-rank matrix approximation:
  - (*) Papadimitriou, Raghavan, Tamaki, and Vempala, "Latent semantic indexing: a probabilistic analysis"
  - Dasgupta, "Experiments with Random Projection"
  - Bingham and Mannila, "Random projection in dimensionality reduction: applications to image and text data"
  - Fradkin and Madigan, "Experiments with random projections for machine learning"
  - Liberty, Woolfe, Martinsson, Rokhlin, and Tygert, "Randomized algorithms for the low-rank approximation of matrices"
  - Rokhlin, Szlam, and Tygert, "A randomized algorithm for principal component analysis"
- Matrix multiplication and norm estimation:
  - Furedi and Komlos, "The eigenvalues of random symmetric matrices"
  - (*) Alon, Krivelevich, and Vu, "On the concentration of eigenvalues of random symmetric matrices"
  - (*) Drineas, Kannan, and Mahoney, "Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication"
  - Rudelson and Vershynin, "Sampling from large matrices: an approach through geometric functional analysis"
- Random sampling of columns and elements from a matrix for low-rank approximation:
  - Frieze, Kannan, and Vempala, "Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations"
  - Drineas, Frieze, Kannan, Vempala, and Vinay, "Clustering large graphs via the singular value decomposition"
  - (*) Achlioptas and McSherry, "Fast Computation of Low-Rank Matrix Approximations"
  - (*) Drineas, Kannan, and Mahoney, "Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix"
- Sampling algorithms for L2 Regression:
  - Drineas, Mahoney, and Muthukrishnan, "Sampling Algorithms for $\ell_2$ Regression and Applications"
  - (*) Drineas, Mahoney, Muthukrishnan, and Sarlos, "Faster Least Squares Approximation"
  - Rokhlin and Tygert, "A fast randomized algorithm for overdetermined linear least-squares regression"
  - Avron, Maymounkov, and Toledo, "Blendenpik: Supercharging LAPACK's least-squares solver"
- Relative error low-rank matrix approximation:
  - (*) Drineas, Mahoney, and Muthukrishnan, "Relative-Error CUR Matrix Decompositions"
  - Deshpande and Vempala, "Adaptive sampling and fast low-rank matrix approximation"
  - Sarlos, "Improved Approximation Algorithms for Large Matrices via Random Projections"
  - Har-Peled, "Low Rank Matrix Approximation in Linear Time"
Algorithmic approaches to graph partitioning problems:
- Background and overview for graph partitioning:
  - Pothen, "Graph partitioning algorithms with applications to scientific computing"
  - Karypis and Kumar, "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs"
  - (*) Shi and Malik, "Normalized Cuts and Image Segmentation"
  - (*) Schaeffer, "Graph Clustering"
- Flow-based partitioning methods:
  - Leighton and Rao, "An Approximate Max-Flow Min-Cut Theorem for Uniform Multicommodity Flow Problems with Applications to Approximation Algorithms"
  - (*) Leighton and Rao, "Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms"
  - (*) Shmoys, "Cut problems and their application to divide-and-conquer"
  - (*) Andersen and Lang, "An algorithm for improving graph partitions"
- Spectral-based partitioning methods:
  - (*) Guattery and Miller, "On the Quality of Spectral Separators"
  - Guattery and Miller, "Graph Embeddings and Laplacian Eigenvalues"
  - Spielman and Teng, "Spectral partitioning works: Planar graphs and finite element meshes"
  - Lang, "Fixing two weaknesses of the Spectral Method"
  - (*) von Luxburg, "A Tutorial on Spectral Clustering"
- Combining spectral and flow-based methods:
  - (*) Arora, Rao, and Vazirani, "Geometry, flows, and graph-partitioning algorithms"
  - Arora, Hazan, and Kale, "$O(\sqrt{\log n})$ Approximation to SPARSEST CUT in $\tilde{O}(n^2)$ Time"
  - (*) Arora and Kale, "A combinatorial, primal-dual approach to semidefinite programs"
  - (*) Khandekar, Rao, and Vazirani, "Graph partitioning using single commodity flows"
  - Orecchia, Schulman, Vazirani, and Vishnoi, "On Partitioning Graphs via Single Commodity Flows"
  - (*) Lang, Mahoney, and Orecchia, "Empirical Evaluation of Graph Partitioning Using Spectral Embeddings and Flow"
- Local graph partitioning methods:
  - (*) Spielman (and Teng), "Fast Randomized Algorithms for Partitioning, Sparsification, and Solving Linear Systems"
  - Andersen, Chung, and Lang, "Local Graph Partitioning using PageRank Vectors"
  - Chung, "The heat kernel as the pagerank of a graph"
  - (*) Chung, "Four proofs of Cheeger inequality and graph partition algorithms"
  - (*) Andersen and Lang, "Communities from seed sets"
  - Andersen, "A Local Algorithm for Finding Dense Subgraphs"
- Embeddings and geometric structure related to graphs:
  - (*) Linial, London, Rabinovich, "The geometry of graphs and some of its algorithmic applications"
  - Aumann and Rabani, "An $O(\log k)$ Approximate Min-Cut Max-Flow Theorem and Approximation Algorithm"
  - Goemans and Williamson, "Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming"
  - Indyk, "Algorithmic Applications of Low-Distortion Geometric Embeddings"
  - Indyk and Matousek, "Low Distortion Embeddings of Finite Metric Spaces"
Connections to data analysis and machine learning applications:
- Algorithmic basics of kernels and machine learning:
  - Aronszajn, "Theory of reproducing kernels"
  - Scholkopf, "Statistical Learning and Kernel Methods"
  - (*) Muller, Mika, Ratsch, Tsuda, and Scholkopf, "An Introduction to Kernel-Based Learning Algorithms"
  - (*) Daume, "From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less"
  - (*) Cortes and Vapnik, "Support-Vector Networks"
  - (*) Scholkopf, Smola, and Muller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem"
  - (*) Scholkopf, Herbrich, Smola, and Williamson, "A Generalized Representer Theorem"
- Basics of manifold-based machine learning:
  - (*) Saul, Weinberger, Ham, Sha, and Lee, "Spectral methods for dimensionality reduction"
  - Roweis and Saul, "Nonlinear dimensionality reduction by locally linear embedding"
  - Tenenbaum, de Silva, and Langford, "A global geometric framework for nonlinear dimensionality reduction"
  - Kondor and Lafferty, "Diffusion kernels on graphs and other discrete structures"
  - (*) Belkin and Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation"
  - Coifman, Lafon, Lee, Maggioni, Nadler, Warner and Zucker, "Geometric diffusions as a tool for harmonic analysis and structure definition in data: Diffusion maps"
- Connections to kernels and eigenfunction computation:
  - (*) Ham, Lee, Mika, and Scholkopf, "A kernel view of the dimensionality reduction of manifolds"
  - Bengio, Paiement, Vincent, Delalleau, Le Roux, and Ouimet, "Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering"
  - Bengio, Vincent, and Paiement, "Spectral clustering and kernel PCA are learning eigenfunctions"
  - Williams and Seeger, "Using the Nystrom Method to Speed Up Kernel Machines"
  - Platt, "FastMap, MetricMap, and Landmark MDS are all Nystrom Algorithms"
  - (*) Bengio et al., "Learning Eigenfunctions Links Spectral Embedding and Kernel PCA"
- Applications of Nystrom-based methods:
  - Belabbas and Wolfe, "Spectral methods in machine learning and new strategies for very large datasets"
  - (*) Belabbas and Wolfe, "On landmark selection and sampling in high-dimensional data analysis"
  - Zhang, Tsang, and Kwok, "Improved Nystrom low-rank approximation and error analysis"
  - Talwalkar, Kumar, and Rowley, "Large-scale manifold learning"
  - Kumar, Mohri, and Talwalkar, "On sampling-based approximate spectral decomposition"
  - Fowlkes, Belongie, Chung, and Malik, "Spectral Grouping Using the Nystrom Method"
- Applications of low-rank matrix approximation:
  - Smola and Scholkopf, "Sparse Greedy Matrix Approximation for Machine Learning"
  - Fine and Scheinberg, "Efficient SVM Training Using Low-Rank Kernel Representations"
  - Belabbas and Wolfe, "Fast Low-Rank Approximation for Covariance Matrices"
  - Bach and Jordan, "Predictive low-rank decomposition for kernel methods"
  - Abernethy, Bach, Evgeniou, and Vert, "Low-rank matrix factorization with attributes"
  - Koren, "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model"
- Connections to spectral clustering in machine learning:
  - Weiss, "Segmentation using eigenvectors: a unifying view"
  - Meila and Shi, "A Random Walks View of Spectral Segmentation"
  - Ng, Jordan, and Weiss, "On Spectral Clustering: Analysis and an algorithm"
  - Ding, He, Zha, Gu, and Simon, "A Min-max Cut Algorithm for Graph Partitioning and Data Clustering"
  - Kannan, Vempala, and Vetta, "On clusterings: Good, bad and spectral"
  - von Luxburg, Belkin, and Bousquet, "Consistency of spectral clustering"
- Expanders for algorithms and real networks:
  - (*) Hoory, Linial, and Wigderson, "Expander graphs and their applications"
  - Nielsen, "Introduction to expander graphs"
  - Mihail and Papadimitriou, "On the Eigenvalue Power Law"
  - Chung, Lu, and Vu, "The spectra of random graphs with given expected degrees"
  - (*) Gkantsidis, Mihail, and Saberi, "Conductance and congestion in power law graphs"
  - (*) Leskovec, Lang, Dasgupta, and Mahoney, "Statistical Properties of Community Structure in Large Social and Information Networks"
Novel data-motivated matrix factorizations:
- Sparse PCA:
  - Zou, Hastie, and Tibshirani, "Sparse principal component analysis"
  - (*) d'Aspremont, El Ghaoui, Jordan, and Lanckriet, "A Direct Formulation for Sparse PCA Using Semidefinite Programming"
  - d'Aspremont, Bach, and El Ghaoui, "Optimal Solutions for Sparse Principal Component Analysis"
- Maximum margin methods:
  - (*) Srebro, Rennie, and Jaakkola, "Maximum Margin Matrix Factorizations"
  - Rennie and Srebro, "Fast maximum margin matrix factorization for collaborative prediction"
  - DeCoste, "Collaborative prediction using ensembles of Maximum Margin Matrix Factorizations"
- Matrix rank minimization:
  - (*) Fazel, Hindi, and Boyd, "A Rank Minimization Heuristic with Application to Minimum Order System Approximation"
  - Recht, Fazel, and Parrilo, "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization"
  - Chandrasekaran, Sanghavi, Parrilo, and Willsky, "Rank-Sparsity Incoherence for Matrix Decomposition"
- Bregman divergence methods:
  - Dhillon and Tropp, "Matrix Nearness Problems with Bregman Divergences"
  - Banerjee, Merugu, Dhillon, Ghosh, "Clustering with Bregman Divergences"
  - (*) Kulis, Sustik, and Dhillon, "Low-Rank Kernel Learning with Bregman Matrix Divergences"
  - Tsuda, Ratsch, and Warmuth, "Matrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection"
- CUR and related decompositions:
  - (*) Mahoney and Drineas, "CUR Matrix Decompositions for Improved Data Analysis"
  - Stewart, "Four algorithms for the efficient computation of truncated QR approximations to a sparse matrix"
  - Berry, Pulatova, and Stewart, "Computing Sparse Reduced-Rank Approximations to Sparse Matrices"
  - Goreinov, Tyrtyshnikov, and Zamarashkin, "A Theory of Pseudoskeleton Approximations"
  - Goreinov and Tyrtyshnikov, "The Maximum-Volume Concept in Approximation by Low-Rank Matrices"
Relationship to numerical, statistical, large-scale computational issues:
- Multiplicative weights update method and online learning:
- Boosting and ensemble methods:
- Regularization methods:
- Some more statistical issues:
- Some more numerical issues:
  - Cline and Dhillon, "Computation of the Singular Value Decomposition"
  - Saad and van der Vorst, "Iterative solution of linear systems in the 20th century"
  - Golub and van der Vorst, "Eigenvalue computation in the 20th century"
  - Watkins, "QR-like algorithms for eigenvalue problems"
  - Baboulin, Dongarra, and Tomov, "Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures"
  - Agullo, et al., "Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects"
  - Grigori, Demmel, and Xiang, "Communication avoiding Gaussian elimination"
  - Ballard, Demmel, Holtz, and Schwartz, "Minimizing Communication in Linear Algebra"
  - Toledo, "A survey of out-of-core algorithms in numerical linear algebra"
- Some large-scale computational and implementation issues:
  - Bell, Koren, and Volinsky, "Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems"
  - (*) Bell and Koren, "Lessons from the Netflix Prize Challenge"
  - Dean and Ghemawat, "MapReduce: simplified data processing on large clusters"
  - Chu, Kim, Lin, Yu, Bradski, Ng, and Olukotun, "Map-Reduce for Machine Learning on Multicore"
  - Ranger, Raghuraman, Penmetsa, Bradski, and Kozyrakis, "Evaluating MapReduce for Multi-core and Multiprocessor Systems"
  - Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker, "A comparison of approaches to large-scale data analysis"
  - Becla and Wang, "Lessons Learned from Managing a Petabyte"
  - Bryant, "Data-Intensive Supercomputing: The case for DISC"
  - Song, Chen, Bai, Lin, and Chang, "Parallel Spectral Clustering"
  - Chang, Zhu, Wang, Bai, Li, Qiu, and Cui, "PSVM: Parallelizing Support Vector Machines on Distributed Computers"
  - Lumsdaine, Gregor, Berry, and Hendrickson, "Challenges in Parallel Graph Processing"
  - Bader, "Petascale Computing for Large-Scale Graph Problems"
  - Jacobs, "The pathologies of big data"