|
Michael Mahoney's new home page
|
Basic info
I am in the math department at Stanford University.
You can reach me via electronic mail:
mmahoney
is the username, and
cs dot stanford dot edu
follows the at symbol.
(Of course, my gmail address is still valid.)
Also, some other (somewhat outdated) information can be found at my
old web page
at Yale computer science.
|
Recent items of interest:
We will be running MMDS again!
MMDS 2012
will take place on the campus of Stanford University on July 10-13, 2012.
Details will be available soon!
My monograph on "Randomized Algorithms for Matrices and Data" is available
in NOW's "Foundations and Trends in Machine Learning" series, and it is also
available on the arXiv here.
The overview chapter on
"Algorithmic and Statistical Perspectives on Large-Scale Data Analysis" is
finally on the arXiv
here; the book in which it will
appear is in press; and a video of the associated talk is
here.
Click
here
for information (including the slides and video!) on the Tutorial on
"Geometric Tools for Identifying Structure in Large Social and Information
Networks," given originally at ICML10 and KDD10 and subsequently at many
other places. (The slides are also linked to below.)
Recent teaching:
Fall 2009:
CS369M: Algorithms for Massive Data Set Analysis
We have made public many of the networks we have used in our "Community
Structure" papers.
Click
here
for the networks and for related information;
older networks are still
here.
Our PNAS paper on "CUR Matrix Decompositions for Improved Data Analysis"
has appeared as
Proc. Natl. Acad. Sci. USA, 106, 697-702 (2009).
Please email me if you can't access a copy from the PNAS web site.
Research interests
-
Algorithmic and statistical aspects of modern large-scale data analysis.
-
Design and analysis of algorithms for matrix, graph, and regression problems.
-
Applications to analytics on large social and information networks.
-
Applications to DNA microarray and single nucleotide polymorphism data.
-
Randomized algorithms for large linear algebra problems.
Much of my current research focuses on
geometric network analysis, i.e., using algorithms with a geometric
or statistical flavor to analyze the structure and dynamics of large
informatics graphs;
developing approximate computation and regularization methods for large
informatics graphs; and
applications to community detection, clustering, and information dynamics in
large social and information networks.
I am also continuing work on matrix sampling algorithms,
applications to DNA single nucleotide polymorphism (SNP) data,
and large-scale statistical data analysis more generally.
In the last few years, I developed and analyzed algorithms for large matrix,
graph, and regression problems, and I applied these and related tools to the
statistical data analysis of extremely large scientific and Internet data
sets.
For example, I worked on
large-scale web analytics, machine learning, and query log analysis;
applications of graph partitioning algorithms to clustering and
community identification; and
applications of randomized algorithms to hyperspectral medical image data,
DNA microarray data, and DNA SNP data.
In the past, I have also worked on developing and analyzing Monte Carlo
algorithms for performing useful computations on extremely
large matrices,
e.g., the additive-error and relative-error CUR matrix decompositions.
Recently, this work led to improved algorithms for two classical linear
algebra problems: a faster algorithm for the overconstrained least squares
approximation problem, and a better algorithm for the problem of choosing
the best set of exactly k columns from a matrix.
Past research has also included work in computational statistical mechanics
on the
development and analysis of the
TIP5P
model of liquid water, as well as work in both computational and experimental
biophysics on proteins and protein-nucleic acid interactions.
MMDS Workshops
We started the
MMDS Workshops on
"Algorithms for Modern Massive Data Sets"
to address
algorithmic and statistical
challenges in modern large-scale statistical data analysis.
-
We will be running MMDS again!
MMDS 2012
will take place on the campus of Stanford University on July 10-13, 2012.
Details will be available soon!
-
MMDS 2010
took place on the campus of Stanford University on June 15-18, 2010.
MMDS 2010 addressed computation in large-scale scientific and internet data
applications more generally.
See
the MMDS web page
for details, including articles and all the speaker overheads!
-
MMDS 2008
took place on June 25-28, 2008.
MMDS 2008 grew out of our expectation for what
the algorithmic and statistical
foundations of
large-scale
data analysis
should look like a generation from now.
Click
here
for an article that appeared in SIGKDD Explorations and
SIAM News about the meeting.
-
MMDS 2006
took place on June 21-24, 2006.
MMDS 2006 was originally motivated by the complementary perspectives
brought by numerical linear algebra and theoretical computer science to
matrix algorithms in large-scale data applications.
Click
here
for an article in SIAM News about the meeting.
These MMDS meetings generated intense interdisciplinary interest and were a
big success --- so keep an eye out for future MMDSs!
Publications
-
On the Hyperbolicity of Small-World Networks and Tree-Like Graphs,
-
W. Chen, W. Fang, G. Hu, M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1201.1717 (2012)
(arXiv),
-
Stochastic Dimensionality Reduction for K-means Clustering,
-
C. Boutsidis, A. Zouzias, M. W. Mahoney, and P. Drineas,
-
Technical Report, Preprint: arXiv:1110.2897 (2011)
(arXiv),
-
Regularized Laplacian Estimation and Fast Eigenvector Approximation,
-
P. O. Perry and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1110.1757 (2011)
(arXiv),
-
Proceedings of the 2011 NIPS Conference.
-
LSRN: A Parallel Iterative Solver for Strongly Over- or Under-Determined Systems,
-
X. Meng, M. A. Saunders, and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1109.5981 (2011)
(arXiv).
-
Fast approximation of matrix coherence and statistical leverage,
-
P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff,
-
Technical Report, Preprint: arXiv:1109.3843 (2011)
(arXiv).
-
Localization on low-order eigenvectors of data matrices,
-
M. Cucuringu and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1109.1355 (2011)
(arXiv).
-
Efficient Genomewide Selection of PCA-Correlated tSNPs for Genotype Imputation,
-
A. Javed, P. Drineas, M. W. Mahoney, and P. Paschou,
-
Annals of Human Genetics, 75, 707-722 (2011)
(
pdf).
-
Randomized Algorithms for Matrices and Data,
-
M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1104.5557 (2011)
(arXiv),
-
NOW Publishers,
Foundations and Trends in Machine Learning,
Volume 3, Issue 2, 2011
(now),
-
Computation in Large-Scale Scientific and Internet Data Applications is a Focus of MMDS 2010,
-
M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1012.4231 (2010)
(arXiv),
-
Appeared in
SIGKDD Explorations,
SIGACT News,
ASA-SCGN Newsletter,
and IMS Bulletin.
-
CUR from a Sparse Optimization Viewpoint,
-
J. Bien, Y. Xu, and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1011.0413 (2010)
(arXiv),
-
Proceedings of the 2010 NIPS Conference
(ps,
pdf).
-
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis,
-
M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1010.1609 (2010)
(arXiv),
-
To appear in:
Combinatorial Scientific Computing,
U. Naumann and O. Schenk, eds.,
2012.
-
Implementing regularization implicitly via approximate eigenvector computation,
-
M. W. Mahoney and L. Orecchia,
-
Technical Report, Preprint: arXiv:1010.0703 (2010)
(arXiv),
-
Proceedings of the 28th ICML Conference, 121-128 (2011)
(
pdf).
-
Approximating Higher-Order Distances Using Random Projections,
-
P. Li, M. W. Mahoney, and Y. She,
-
Proceedings of the 26th UAI Conference, 312-321 (2010)
(ps,
pdf).
-
Effective Resistances, Statistical Leverage, and Applications to Linear Equation Solving,
-
P. Drineas and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1005.3097 (2010)
(arXiv).
-
Empirical Comparison of Algorithms for Network Community Detection,
-
J. Leskovec, K. J. Lang, and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:1004.3539 (2010)
(arXiv),
-
Proc. 19-th International WWW, 631-640 (2010)
(ps,
pdf).
-
A Local Spectral Method for Graphs: with Applications to Improving Graph
Partitions and Exploring Data Graphs Locally,
-
M. W. Mahoney, L. Orecchia, and N. K. Vishnoi,
-
Technical Report, Preprint: arXiv:0912.0681 (2009)
(arXiv).
-
Unsupervised Feature Selection for the k-means Clustering Problem,
-
C. Boutsidis, M. W. Mahoney, and P. Drineas,
-
Proceedings of the 2009 NIPS Conference
(ps,
pdf).
-
Learning with Spectral Kernels and Heavy-Tailed Data,
-
M. W. Mahoney and H. Narayanan,
-
Technical Report, Preprint: arXiv:0906.4539 (2009)
(arXiv).
-
Empirical Evaluation of Graph Partitioning Using Spectral Embeddings and Flow,
-
K. J. Lang, M. W. Mahoney, and L. Orecchia,
-
Proc. 8-th International SEA, 197-208 (2009)
(ps,
pdf).
-
CUR Matrix Decompositions for Improved Data Analysis,
-
M. W. Mahoney and P. Drineas,
-
Proc. Natl. Acad. Sci. USA, 106, 697-702 (2009)
(ps,
pdf).
-
An Improved Approximation Algorithm for the Column Subset Selection Problem,
-
C. Boutsidis, M. W. Mahoney, and P. Drineas,
-
Technical Report, Preprint: arXiv:0812.4293 (2008)
(arXiv),
-
Proc. 20-th Annual SODA, 968-977 (2009)
(ps,
pdf).
-
Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis are the Focus of MMDS 2008
-
M. W. Mahoney, L.-H. Lim, and G. E. Carlsson
-
Technical Report, Preprint: arXiv:0812.3702 (2008)
(arXiv),
-
Appeared in
SIGKDD Explorations
(ps,
pdf),
SIAM News
(ps,
pdf),
and
ASA-SCGN Newsletter
(ps,
pdf),
and abridged versions appeared in IMS Bulletin
(ps,
pdf)
and AmStat News.
-
Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters,
-
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:0810.1355 (2008)
(arXiv),
-
Internet Mathematics, 6(1), 29-123 (2009).
-
Unsupervised Feature Selection for Principal Components Analysis,
-
C. Boutsidis, M. W. Mahoney, and P. Drineas,
-
Proc. 14-th Annual SIGKDD, 61-69 (2008)
(ps,
pdf).
-
Statistical Properties of Community Structure in Large Social and Information Networks,
-
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
-
Proc. 17-th International WWW, 695-704 (2008)
(ps,
pdf).
-
Faster Least Squares Approximation,
-
P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlos,
-
Technical Report, Preprint: arXiv:0710.1435 (2007)
(arXiv),
-
Numerische Mathematik, 117, 219-249 (2011).
-
PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations,
-
P. Paschou, E. Ziv, E. G. Burchard, S. Choudhry, W. Rodriguez-Cintron, M. W. Mahoney, and P. Drineas,
-
PLoS Genetics, 3, 1672-1686 (2007)
(ps,
pdf).
-
Relative-Error CUR Matrix Decompositions,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Technical Report, Preprint: arXiv:0708.3696 (2007)
(arXiv),
-
SIAM J. Matrix Analysis and Applications, 30, 844-881 (2008)
(ps,
pdf).
-
Feature Selection Methods for Text Classification,
-
A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney,
-
Proc. 13-th Annual SIGKDD, 230-239 (2007)
(ps,
pdf).
-
Sampling Algorithms and Coresets for Lp Regression,
-
A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:0707.1714 (2007)
(arXiv),
-
Proc. 19-th Annual SODA, 932-941 (2008)
(ps,
pdf),
-
SIAM J. Computing, 38, 2060-2078 (2009)
(ps,
pdf).
-
Web Information Retrieval and Linear Algebra Algorithms,
-
A. Frommer, M. W. Mahoney, and D. B. Szyld (Eds.),
-
Proc. of Dagstuhl Seminar 07071, (2007)
(web).
-
Intra- and interpopulation genotype reconstruction from tagging SNPs,
-
P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas,
-
Genome Research, 17(1), 96-107 (2007)
(ps,
pdf).
-
Bridging the Gap Between Numerical Linear Algebra, Theoretical Computer Science, and Data Applications,
-
G. H. Golub, M. W. Mahoney, P. Drineas, and L.-H. Lim,
-
SIAM News 39:8 October 2006
(ps,
pdf).
-
Randomized Algorithms for Matrices and Massive Data Sets,
-
P. Drineas and M. W. Mahoney,
-
Proc. 32-nd Annual VLDB, 1269 (2006)
(ps,
pdf).
-
Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Proc. 14-th Annual ESA, 304-314 (2006)
(ps,
pdf).
-
Subspace Sampling and Relative-Error Matrix Approximation: Column-Based Methods,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Proc. 10-th Annual RANDOM, 316-326 (2006)
(ps,
pdf).
-
Tensor-CUR Decompositions For Tensor-Based Data,
-
M. W. Mahoney, M. Maggioni, and P. Drineas,
-
Proc. 12-th Annual SIGKDD, 327-336 (2006)
(ps,
pdf),
-
SIAM J. Matrix Analysis and Applications, 30, 957-987 (2008)
(ps,
pdf).
-
Polynomial Time Algorithm for Column-Row-Based Relative-Error Low-Rank Matrix Approximation,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Technical Report, DIMACS TR 2006-04 March 2006
(ps,
pdf).
-
Sampling Algorithms for L2 Regression and Applications,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Proc. 17-th Annual SODA, 1127-1136 (2006)
(ps,
pdf).
-
A Randomized Algorithm for a Tensor-Based Generalization of the Singular Value Decomposition,
-
P. Drineas and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1327, June 2005
(ps,
pdf),
-
Linear Algebra and its Applications, 420, 553-571 (2007)
(ps,
pdf).
-
On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning,
-
P. Drineas and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1319, April 2005
(ps,
pdf),
-
Proc. 18-th Annual COLT, 323-337 (2005)
(ps,
pdf),
-
J. Machine Learning Research, 6, 2153-2175 (2005)
(ps,
pdf).
-
Sampling Sub-problems of Heterogeneous Max-Cut Problems and Approximation Algorithms,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1283, April 2004
(ps,
pdf),
-
Proc. 22-nd Annual STACS, 57-68 (2005)
(ps,
pdf),
-
Random Structures and Algorithms, 32:3, 307-333 (2008)
(ps,
pdf).
-
Fast Monte Carlo Algorithms for Matrices III: Computing an Efficient Approximate Decomposition of a Matrix,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1271, February 2004
(ps,
pdf),
-
SIAM J. Computing, 36, 184-206 (2006)
(ps,
pdf).
-
Fast Monte Carlo Algorithms for Matrices II: Computing Low-Rank Approximations to a Matrix,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1270, February 2004
(ps,
pdf),
-
SIAM J. Computing, 36, 158-183 (2006)
(ps,
pdf).
-
Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1269, February 2004
(ps,
pdf),
-
SIAM J. Computing, 36, 132-157 (2006)
(ps,
pdf).
-
Rapid Mixing of Several Markov Chains for a Hard-Core Model,
-
R. Kannan, M. W. Mahoney, and R. Montenegro,
-
Proc. 14-th Annual ISAAC, 663-675 (2003)
(ps,
pdf).
-
Quantum, Intramolecular Flexibility, and Polarizability Effects on the Reproduction of the Density Anomaly of Liquid Water by Simple Potential Functions,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 115, 10758-10768 (2001)
(ps,
pdf).
-
Rapid Estimation of Electronic Degrees of Freedom in Monte Carlo Calculations for Polarizable Models of Liquid Water,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 114, 9337-9349 (2001)
(ps,
pdf).
-
Diffusion Constant of the TIP5P Model of Liquid Water,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 114, 363-366 (2001)
(ps,
pdf).
-
A Five-Site Model for Liquid Water and the Reproduction of the Density Anomaly by Rigid, Nonpolarizable Potential Functions,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 112, 8910-8922 (2000)
(ps,
pdf).
-
Repression and Activation of Promoter-Bound RNA Polymerase Activity by Gal Repressor,
-
H. E. Choy, R. R. Hanger, T. Aki, M. Mahoney, K. Murakami, A. Ishihama, and S. Adhya,
-
J. Mol. Biol. 272: 293-300, 1997
(ps,
pdf).
-
Discrete Representations of the Protein C-alpha Chain,
-
X. F. de la Cruz, M. W. Mahoney, and B. K. Lee,
-
Fold. & Des. 2: 223-234, 1997
(ps,
pdf).
Talks and presentations
Several recent talks:
Extracting insight from large networks: implications of small-scale and large-scale structure
(1 hour version)
(pdf,
ppt)
Approximate computation and implicit regularization in large-scale data analysis
(short vrsn)
(pdf,
ppt)
Fast Approximation of Matrix Coherence and Statistical Leverage
(pdf,
ppt)
Linear Algebra and Machine Learning of Large Informatics Graphs
(pdf,
ppt)
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
(pdf,
ppt)
Community structure in large social and information networks
(newer)
(pdf,
ppt)
Statistical leverage and improved matrix algorithms
(newer and long)
(pdf,
ppt)
Several tutorial presentations:
Geometric Tools for Identifying Structure in Large Social and Information Networks
(1.5 hr version at SAMSI Opening Workshop 2010)
(pdf,
ppt)
Geometric Tools for Identifying Structure in Large Social and Information Networks
(3 hr version at ICML 2010 and KDD 2010)
(pdf,
ppt)
(The pdf file in four pieces:
here,
here,
here,
and
here.)
Randomized Algorithms for Matrices and Massive Data Sets
(at SIAM-SDM06 2006 and VLDB 2006)
(ppt)
Randomized Algorithms for Matrices and Massive Data Sets
(at ACM-SIGKDD 2005)
(ppt)
Several other older talks:
Looking for clusters in your data ... in theory and in practice
(pdf,
ppt)
Implementing regularization implicitly via approximate eigenvector computation
(pdf,
ppt)
Geometric Network Analysis Tools
(talk from MMDS 2010)
(pdf,
ppt)
Approximation Algorithms as Experimental Probes of Informatics Graphs
(pdf,
ppt)
Community structure in large social and information networks
(talk from MMDS 2008)
(pdf,
ppt)
Community structure in large social and information networks
(older)
(pdf,
ppt)
Statistical leverage and improved matrix algorithms
(older and short)
(pdf,
ppt)
Sampling algorithms and core-sets for Lp regression and applications
(pdf,
ppt)
CUR Matrix Decompositions for Improved Data Analysis (talk from MMDS 2006)
(pdf,
ppt)
A Relative-Error CUR Decomposition for Matrices and Its Data Applications
(pdf,
ppt)
Sampling Algorithms for L2 Regression and Applications
(talk from SODA 2006)
(pdf,
ppt)
Approximating a Gram Matrix for Improved Kernel-Based Learning
(talk from COLT 2005)
(ps,
pdf)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis
(newer)
(
pdf,
ppt)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis
(older)
(
pdf)
CUR Matrix Decomposition with Applications to Algorithm Design and Massive Data Set Analysis
(pdf)
Fast Monte Carlo Algorithms for Massive Data Sets and Approximating Max-Cut
(ps,
pdf)
The TIP5P Water talk:
The Computational Statistical Mechanics of Simple Models of Liquid Water
(
pdf)
Videos of several talks:
"Approximate computation and implicit regularization in large-scale data analysis,"
or click
here
(at the Workshop on Beyond Worst-Case Analysis, September, 2011).
"Linear Algebra and Machine Learning of Large Informatics Graphs"
(at the NIPS Workshops, December, 2010).
"Geometric Tools for Identifying Structure in Large Social and Information Networks"
(90 minute version, tutorial at SAMSI Opening workshop, August 2010).
"Geometric Tools for Graph Mining of Large Social and Information Networks"
(3 hour version, tutorial at KDD 2010, July 2010).
"Community Structure in Large Social and Information Networks"
(at the Newton Institute in Cambridge, June 2010).
"Algorithmic and Statistical Perspectives on Large-Scale Data Analysis"
(at the SF Bay Area DM-SIG Meeting, February 2010).
"Community Structure in Large Social and Information Networks,"
in avi
or mpg (note: slow to download),
(at the 2009 IIT Kanpur Processing Massive Data Sets Workshop, December 2009).
"Statistical Leverage and Improved Matrix Algorithms"
(at the 2009 ICML Workshops, June 2009).
"Sampling Algorithms and Coresets for Lp Regression and Applications,"
in mpg (note: slow to download),
(at the 2006 IIT Kanpur Data Streams Workshop, December 2006).
"Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis"
(at the 2005 IPAM Summer School, July 2005).
My old web page
at Yale computer science has additional (somewhat outdated) information;
but you should click on it (or Google search for it) if you are interested in
extremely large homes.
|