|
Michael Mahoney's new home page
Basic info
I am currently
in the math department
at Stanford University.
You can reach me via electronic mail:
mmahoney
is the username, and
cs dot stanford dot edu
follows the
at
symbol.
(Of course, my gmail address is still valid.)
Also, some other (somewhat outdated) information can be found at my
old web page
at Yale computer science.
Recent items of interest:
We are going to run MMDS again!
MMDS 2010
will take place on the campus of Stanford University on June 15-18, 2010.
Details coming soon!
Recent teaching:
Fall 2009:
CS369M: Algorithms for Massive Data Set Analysis
We have made public many of the networks we have used in our "Community
Structure" papers.
Click
here
for the networks and for related information.
The older networks are still
here.
Our PNAS paper on "CUR Matrix Decompositions for Improved Data Analysis"
has appeared as
Proc. Natl. Acad. Sci. USA, 106, 697-702 (2009).
Please email me if you can't access a copy from the PNAS web site.
Research interests
-
Algorithmic and statistical aspects of modern large-scale data analysis.
-
Design and analysis of algorithms for matrix, graph, and regression problems.
-
Statistical data analysis in large-scale scientific and Internet applications.
-
Applications to the analysis of large social and information networks.
-
Applications to DNA microarray and single nucleotide polymorphism data.
Much of my current research focuses on
geometric network analysis, i.e., using algorithms with a geometric
or statistical flavor to analyze the structure and dynamics of large
informatics graphs;
developing approximate computation and regularization methods for large
informatics graphs; and
applications to community detection, clustering, and information dynamics in
large social and information networks.
I am also continuing work on matrix sampling algorithms,
applications to DNA single nucleotide polymorphism (SNP) data,
and large-scale statistical data analysis more generally.
In the last few years, I developed and analyzed algorithms for large matrix,
graph, and regression problems, and I applied these and related tools to the
statistical data analysis of extremely large scientific and Internet data
sets.
For example, I worked on
large-scale web analytics, machine learning, and query log analysis;
applications of graph partitioning algorithms to clustering and
community identification; and
applications of randomized algorithms to hyperspectral medical image data,
DNA microarray data, and DNA SNP data.
In the past, I have also worked on developing and analyzing Monte Carlo
algorithms for performing useful computations on extremely
large matrices,
e.g., the additive-error and relative-error CUR matrix decompositions.
Recently, this work led to improved algorithms for two classical linear
algebra problems: a faster algorithm for the overconstrained least squares
approximation problem, and a better algorithm for the problem of choosing
the best set of exactly k columns from a matrix.
Past research has also included work in computational statistical mechanics
on the
development and analysis of the
TIP5P
model of liquid water, as well as work in both computational and experimental
biophysics on
proteins and protein-nucleic acid interactions.
MMDS Workshops
We started the
MMDS Workshops on
"Algorithms for Modern Massive Data Sets"
to address
algorithmic and statistical
challenges in modern large-scale statistical data analysis.
-
We are going to run MMDS again!
MMDS 2010
will take place on the campus of Stanford University on June 15-18, 2010.
Details coming soon!
-
MMDS 2008
took place on June 25-28, 2008.
MMDS 2008 grew out of our expectation for what
the algorithmic and statistical
foundations of
large-scale
data analysis
should look like a generation from now.
Click
here
for an article that appeared in SIGKDD Explorations and
SIAM News about the meeting.
-
MMDS 2006
took place on June 21-24, 2006.
MMDS 2006 was originally motivated by the complementary perspectives
brought by numerical linear algebra and theoretical computer science to
matrix algorithms in large-scale data applications.
Click
here
for an article in SIAM News about the meeting.
These MMDS meetings generated intense interdisciplinary interest and were a
big success --- so keep an eye out for future MMDSs!
Publications
-
Empirical Comparison of Algorithms for Network Community Detection,
-
J. Leskovec, K. J. Lang, and M. W. Mahoney,
-
Proc. 19-th International WWW, 000-000 (2010)
-
A Spectral Algorithm for Improving Graph Partitions,
-
M. W. Mahoney, L. Orecchia, and N. K. Vishnoi,
-
Technical Report, Preprint: arXiv:0912.0681 (2009)
(arXiv).
-
Unsupervised Feature Selection for the k-means Clustering Problem,
-
C. Boutsidis, M. W. Mahoney, and P. Drineas,
-
Proceedings of the 2009 NIPS Conference.
-
Learning with Spectral Kernels and Heavy-Tailed Data,
-
M. W. Mahoney and H. Narayanan,
-
Technical Report, Preprint: arXiv:0906.4539 (2009)
(arXiv).
-
Empirical Evaluation of Graph Partitioning Using Spectral Embeddings and Flow,
-
K. J. Lang, M. W. Mahoney, and L. Orecchia,
-
Proc. 8-th International SEA, 197-208 (2009).
-
CUR Matrix Decompositions for Improved Data Analysis,
-
M. W. Mahoney and P. Drineas,
-
Proc. Natl. Acad. Sci. USA, 106, 697-702 (2009).
-
An Improved Approximation Algorithm for the Column Subset Selection Problem,
-
C. Boutsidis, M. W. Mahoney, and P. Drineas,
-
Technical Report, Preprint: arXiv:0812.4293 (2008)
(arXiv).
-
Proc. 20-th Annual SODA, 968-977 (2009)
(ps,
pdf).
-
Journal version submitted for publication.
-
Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis are the Focus of MMDS 2008
-
M. W. Mahoney, L.-H. Lim, and G. E. Carlsson
-
Technical Report, Preprint: arXiv:0812.3702 (2008)
(arXiv),
-
Appeared in
SIGKDD Explorations
(ps,
pdf),
SIAM News
(ps,
pdf),
and
ASA-SCGN Newsletter
(ps,
pdf),
and abridged versions appeared in IMS Bulletin
(ps,
pdf)
and AmStat News.
-
Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters,
-
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:0810.1355 (2008)
(arXiv).
-
Unsupervised Feature Selection for Principal Components Analysis,
-
C. Boutsidis, M. W. Mahoney, and P. Drineas,
-
Proc. 14-th Annual SIGKDD, 61-69 (2008)
(ps,
pdf).
-
Statistical Properties of Community Structure in Large Social and Information Networks,
-
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
-
Proc. 17-th International WWW, 695-704 (2008)
(ps,
pdf).
-
Faster Least Squares Approximation,
-
P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlos,
-
Technical Report, Preprint: arXiv:0710.1435 (2007)
(arXiv),
-
Journal version submitted for publication.
-
PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations,
-
P. Paschou, E. Ziv, E. G. Burchard, S. Choudhry, W. Rodriguez-Cintron, M. W. Mahoney, and P. Drineas,
-
PLoS Genetics, 3, 1672-1686 (2007)
(ps,
pdf).
-
Relative-Error CUR Matrix Decompositions,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Technical Report, Preprint: arXiv:0708.3696 (2007)
(arXiv),
-
SIAM J. Matrix Analysis and Applications, 30, 844-881 (2008)
(ps,
pdf).
-
Feature Selection Methods for Text Classification,
-
A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney,
-
Proc. 13-th Annual SIGKDD, 230-239 (2007)
(ps,
pdf).
-
Sampling Algorithms and Coresets for Lp Regression,
-
A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. W. Mahoney,
-
Technical Report, Preprint: arXiv:0707.1714 (2007)
(arXiv),
-
Proc. 19-th Annual SODA, 932-941 (2008)
(ps,
pdf),
-
SIAM J. Computing, 38, 2060-2078 (2009)
(ps,
pdf).
-
Web Information Retrieval and Linear Algebra Algorithms,
-
A. Frommer, M. W. Mahoney, and D. B. Szyld (Eds.),
-
Proc. of Dagstuhl Seminar 07071, (2007)
(web).
-
Intra- and interpopulation genotype reconstruction from tagging SNPs,
-
P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas,
-
Genome Research, 17(1), 96-107 (2007)
(ps,
pdf).
-
Bridging the Gap Between Numerical Linear Algebra, Theoretical Computer Science, and Data Applications,
-
G. H. Golub, M. W. Mahoney, P. Drineas, and L.-H. Lim,
-
SIAM News 39:8 October 2006
(ps,
pdf).
-
Randomized Algorithms for Matrices and Massive Data Sets,
-
P. Drineas and M. W. Mahoney,
-
Proc. 32-nd Annual VLDB, 1269 (2006)
(ps,
pdf).
-
Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Proc. 14-th Annual ESA, 304-314 (2006)
(ps,
pdf).
-
Subspace Sampling and Relative-Error Matrix Approximation: Column-Based Methods,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Proc. 10-th Annual RANDOM, 316-326 (2006)
(ps,
pdf).
-
Tensor-CUR Decompositions For Tensor-Based Data,
-
M. W. Mahoney, M. Maggioni, and P. Drineas,
-
Proc. 12-th Annual SIGKDD, 327-336 (2006)
(ps,
pdf),
-
SIAM J. Matrix Analysis and Applications, 30, 957-987 (2008)
(ps,
pdf).
-
Polynomial Time Algorithm for Column-Row-Based Relative-Error Low-Rank Matrix Approximation,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Technical Report, DIMACS TR 2006-04 March 2006
(ps,
pdf).
-
Sampling Algorithms for L2 Regression and Applications,
-
P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
-
Proc. 17-th Annual SODA, 1127-1136 (2006)
(ps,
pdf).
-
A Randomized Algorithm for a Tensor-Based Generalization of the Singular Value Decomposition,
-
P. Drineas and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1327, June 2005
(ps,
pdf),
-
Linear Algebra and its Applications, 420, 553-571 (2007)
(ps,
pdf).
-
On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning,
-
P. Drineas and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1319, April 2005
(ps,
pdf),
-
Proc. 18-th Annual COLT, 323-337 (2005)
(ps,
pdf),
-
J. Machine Learning Research, 6, 2153-2175 (2005)
(ps,
pdf).
-
Sampling Sub-problems of Heterogeneous Max-Cut Problems and Approximation Algorithms,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1283, April 2004
(ps,
pdf),
-
Proc. 22-nd Annual STACS, 57-68 (2005)
(ps,
pdf),
-
Random Structures and Algorithms, 32:3, 307-333 (2008)
(ps,
pdf).
-
Fast Monte Carlo Algorithms for Matrices III: Computing an Efficient Approximate Decomposition of a Matrix,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1271, February 2004
(ps,
pdf),
-
SIAM J. Computing, 36, 184-206 (2006)
(ps,
pdf).
-
Fast Monte Carlo Algorithms for Matrices II: Computing Low-Rank Approximations to a Matrix,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1270, February 2004
(ps,
pdf),
-
SIAM J. Computing, 36, 158-183 (2006)
(ps,
pdf).
-
Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication,
-
P. Drineas, R. Kannan, and M. W. Mahoney,
-
Technical Report, YALEU/DCS/TR-1269, February 2004
(ps,
pdf),
-
SIAM J. Computing, 36, 132-157 (2006)
(ps,
pdf).
-
Rapid Mixing of Several Markov Chains for a Hard-Core Model,
-
R. Kannan, M. W. Mahoney, and R. Montenegro,
-
Proc. 14-th Annual ISAAC, 663-675 (2003)
(ps,
pdf).
-
Quantum, Intramolecular Flexibility, and Polarizability Effects on the Reproduction of the Density Anomaly of Liquid Water by Simple Potential Functions,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 115, 10758-10768 (2001)
(ps,
pdf).
-
Rapid Estimation of Electronic Degrees of Freedom in Monte Carlo Calculations for Polarizable Models of Liquid Water,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 114, 9337-9349 (2001)
(ps,
pdf).
-
Diffusion Constant of the TIP5P Model of Liquid Water,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 114, 363-366 (2001)
(ps,
pdf).
-
A Five-Site Model for Liquid Water and the Reproduction of the Density Anomaly by Rigid, Nonpolarizable Potential Functions,
-
M. W. Mahoney and W. L. Jorgensen,
-
J. Chem. Phys., 112, 8910-8922 (2000)
(ps,
pdf).
-
Repression and Activation of Promoter-Bound RNA Polymerase Activity by Gal Repressor,
-
H. E. Choy, R. R. Hanger, T. Aki, M. Mahoney, K. Murakami, A. Ishihama, and S. Adhya,
-
J. Mol. Biol. 272: 293-300, 1997
(ps,
pdf).
-
Discrete Representations of the Protein C-alpha Chain,
-
X. F. de la Cruz, M. W. Mahoney, and B. K. Lee,
-
Fold. & Des. 2: 223-234, 1997
(ps,
pdf).
Talks and presentations
Overheads for several recent talks:
Community structure in large social and information networks
(newer)
(pdf,
ppt)
Statistical leverage and improved matrix algorithms
(newer and long)
(pdf,
ppt)
Overheads for several tutorial presentations:
Randomized Algorithms for Matrices and Massive Data Sets
(at SIAM-SDM06 2006 and VLDB 2006)
(ppt)
Randomized Algorithms for Matrices and Massive Data Sets
(at ACM-SIGKDD 2005)
(ppt)
Overheads for several other older talks:
Approximation Algorithms as Experimental Probes of Informatics Graphs
(pdf,
ppt)
Community structure in large social and information networks
(talk from MMDS 2008)
(pdf,
ppt)
Community structure in large social and information networks
(older)
(pdf,
ppt)
Statistical leverage and improved matrix algorithms
(older and short)
(pdf,
ppt)
Sampling algorithms and core-sets for Lp regression and applications
(pdf,
ppt)
CUR Matrix Decompositions for Improved Data Analysis (talk from MMDS 2006)
(pdf,
ppt)
A Relative-Error CUR Decomposition for Matrices and Its Data Applications
(pdf,
ppt)
Sampling Algorithms for L2 Regression and Applications
(talk from SODA 2006)
(pdf,
ppt)
Approximating a Gram Matrix for Improved Kernel-Based Learning
(talk from COLT 2005)
(ps,
pdf)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis
(newer)
(
pdf,
ppt)
Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis
(older)
(
pdf)
The CUR Matrix Decomposition with Applications to Algorithm Design and Massive Data Set Analysis
(pdf)
Fast Monte Carlo Algorithms for Massive Data Sets and Approximating Max-Cut
(ps,
pdf)
Overheads for the TIP5P water talk:
The Computational Statistical Mechanics of Simple Models of Liquid Water
(
pdf)
A few videos of talks:
"Fast Monte Carlo Algorithms for Matrix Operations and Massive Data Set Analysis"
at 2005 IPAM Summer School.
"Statistical Leverage and Improved Matrix Algorithms" at 2009 ICML Workshops.
My
old web page
at Yale computer science has additional (somewhat outdated) information;
but you should click on it (or Google search for it) if you are interested in
extremely large homes.
|