I am a post-doctoral researcher at Stanford University with Sylvia Plevritis (Center for Computational Systems Biology/Radiology) and Jonathan Pritchard (Genetics), having worked with Daphne Koller prior to her move to Coursera. I did my PhD with Zoubin Ghahramani in the Machine Learning group of the Cambridge University Engineering Department. I was the Roger Needham Scholar at Wolfson College, funded by Microsoft Research. My undergraduate degree comprised two years of Physics before switching to Engineering to complete an MEng with Zoubin. I took the MSc Bioinformatics and Systems Biology at Imperial College in 2007/8.
My research involves both the development of novel machine learning methods and their application to data analysis problems in biology. I am collaborating with Ventana-Roche on automated breast cancer prognosis from digitised histological and immunohistochemical slides.
During my PhD I was a part-time developer of Infer.NET. I wrote a blog post about some of the features we added in Infer.NET 2.4; see here.
I was involved with running the Cambridge University Statistics Clinic. At Stanford I help out with SMACC: Statistical, Mathematical, and Computational Consulting.
You can download my CV here. Here's a video of me and my friend Johan falling off cliffs on skis in Flaine, France.
Working papers/Under submission
- Po-Yuan Tung, John D Blischak, Chiaowen Hsiao, David A Knowles, Jonathan E Burnett, Jonathan K Pritchard, Yoav Gilad
Batch effects and the effective design of single-cell gene expression studies bioRxiv
- Yang I Li, David A Knowles, Jonathan K Pritchard (2016)
LeafCutter: Annotation-free quantification of RNA splicing bioRxiv github
- David A. Knowles (2015)
Stochastic gradient variational Bayes for gamma approximating distributions. Preprint on arXiv code
- David A Knowles, Joe R Davis, Anil Raj, Xiaowei Zhu, James B Potash, Myrna M Weissman, Jianxin Shi, Douglas F. Levinson, Sara Mostafavi, Stephen B Montgomery, Alexis Battle (2015)
Allele-specific expression reveals interactions between genetic variation and environment. Preprint on bioRxiv github
- Tim Salimans, David A. Knowles (2014)
On Using Control Variates with Stochastic Approximation for Variational Bayes and its Connection to Stochastic Linear Regression
http://arxiv.org/abs/1401.1022
- Konstantina Palla*, David A. Knowles* and Zoubin Ghahramani (2013)
A dependent partition-valued process for multitask clustering and time evolving network modelling http://arxiv.org/abs/1303.3265
Publications
- Yang I. Li, Bryce van de Geijn, Anil Raj, David A. Knowles, Allegra A. Petti, David Golan, Yoav Gilad, Jonathan K. Pritchard (2016)
RNA splicing is a primary link between genetic variation and disease.
Science url
- Kimberly R. Kukurba, Princy Parsana, Kevin S. Smith, Zachary Zappala, David A. Knowles, Marie-Julie Fave, Xin Li, Xiaowei Zhu, James B. Potash, Myrna M. Weissman, Jianxin Shi, Anshul Kundaje, Douglas F. Levinson, Philip Awadalla, Sara Mostafavi, Alexis Battle, Stephen B. Montgomery (2016)
Impact of the X chromosome and sex on regulatory variation.
Genome Research url bioRxiv
- Joe R. Davis, Laure Fresard, David A. Knowles, Mauro Pala, Carlos D. Bustamante, Alexis Battle, Stephen B. Montgomery (2016)
An Efficient Multiple-Testing Adjustment for eQTL Studies that Accounts for Linkage Disequilibrium between Variants
American Journal of Human Genetics url code
- Amar Shah, David A. Knowles, Zoubin Ghahramani
An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process
ICML 2015 pdf
- Kien Nguyen, Joerg Bredno, David A. Knowles (2015)
Using contextual information to classify nuclei in histology images
International Symposium on Biomedical Imaging (ISBI) 2015 url pdf
- Xin Li, Alexis Battle, Konrad J. Karczewski, Zach Zappala, David A. Knowles, Kevin S. Smith, Kim R. Kukurba, Eric Wu, Noah Simon, Stephen B. Montgomery (2014)
Transcriptome Sequencing of a Large Human Family Identifies the Impact of Rare Noncoding Variants
American Journal of Human Genetics (Featured Article) url
- Konstantina Palla, David A. Knowles and Zoubin Ghahramani
A reversible infinite HMM using normalised random measures
ICML 2014. url
- Creighton Heaukulani, David A. Knowles and Zoubin Ghahramani
Beta Diffusion Trees pdf
ICML 2014.
- Kukurba, K.R., Zhang, R., Li, X., Smith, K.S., Knowles, D.A., Tan, M.H., Piskol, R., Lek, M., Snyder, M., MacArthur, D.G., Li, J.B., Montgomery, S.B. (2014)
Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues
PLoS Genetics. url
- David A. Knowles and Zoubin Ghahramani (2014)
Pitman-Yor Diffusion Trees for Bayesian hierarchical clustering preprint
IEEE TPAMI Special Issue on Bayesian Nonparametrics
- Konstantina Palla, David A. Knowles and Zoubin Ghahramani (2014)
Relational learning and network modelling using infinite latent attribute models
IEEE TPAMI Special Issue on Bayesian Nonparametrics doi pdf
- Tim Salimans, David A. Knowles (2013)
Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression url
Bayesian Analysis [Winner of the Lindley Prize!]
- Daniel Glass, Ana Vinuela, Mathew N Davies, Adaikalavan Ramasamy, Leopold Parts, David A. Knowles, Andrew A Brown, Asa K Hedman, Kerrin S Small, Alfonso Buil, Elin Grundberg, Alexandra C Nica, Paola Di Meglio, Frank O Nestle, Mina Ryten, The UK Brain Expression consortium, Muther Consortium, Richard Durbin, Mark I McCarthy, Panagiotis Deloukas, Emmanouil T Dermitzakis, Mike E Weale, Veronique Bataille and Tim D Spector (2013).
Gene expression changes with age in skin, adipose tissue, blood and brain. url
Genome Biology
- Novi Quadrianto, Viktoriia Sharmanska, David A. Knowles, Zoubin Ghahramani (2013).
The Supervised IBP: Neighbourhood Preserving Infinite Latent Feature Models. pdf
UAI 2013
- Elin Grundberg, Kerrin S Small, Åsa K Hedman, Alexandra C Nica, Alfonso Buil, Sarah Keildson, Jordana T Bell, Tsun-Po Yang, Eshwar Meduri, Amy Barrett, James Nisbett, Magdalena Sekowska, Alicja Wilk, So-Youn Shin, Daniel Glass, Mary Travers, Josine L Min, Sue Ring, Karen Ho, Gudmar Thorleifsson, Augustine Kong, Unnur Thorsteindottir, Chrysanthi Ainali, Antigone S Dimas, Neelam Hassanali, Catherine Ingle, David A. Knowles, Maria Krestyaninova, Christopher E Lowe, Paola Di Meglio, Stephen B Montgomery, Leopold Parts, Simon Potter, Gabriela Surdulescu, Loukia Tsaprouni, Sophia Tsoka, Veronique Bataille, Richard Durbin, Frank O Nestle, Stephen O'Rahilly, Nicole Soranzo, Cecilia M Lindgren, Krina T Zondervan, Kourosh R Ahmadi, Eric E Schadt, Kari Stefansson, George Davey Smith, Mark I McCarthy, Panos Deloukas, Emmanouil T Dermitzakis, Tim D Spector & The Multiple Tissue Human Expression Resource (MuTHER) Consortium (2012)
Mapping cis- and trans-regulatory effects across multiple tissues in twins url
Nature Genetics
- Konstantina Palla*, David A. Knowles* and Zoubin Ghahramani (2012)
A nonparametric variable clustering model pdf code
NIPS 2012
- Konstantina Palla, David A. Knowles and Zoubin Ghahramani (2012)
An Infinite Latent Attribute Model for Network Data pdf
ICML 2012
- Andrew Wilson, David A. Knowles and Zoubin Ghahramani (2012)
Gaussian Process Regression Networks pdf code
ICML 2012
- David A. Knowles and Tom Minka (2011)
Non-conjugate Variational Message Passing for Multinomial and Binary Regression pdf
NIPS 2011
Supplementary material: pdf
- Mehregan Movassagh, Mun-Kit Choy, David A. Knowles, Lina Cordeddu, Syed Haider, Thomas Down, Lee Siggens, Ana Vujic, Ilenia Simeoni, Chris Penkett, Martin Goddard, Pietro Lio, Martin Bennett, Roger Foo (2011)
Distinct epigenomic features in human cardiomyopathy url
Circulation, American Heart Association
- David A. Knowles and Zoubin Ghahramani (2011)
Pitman-Yor Diffusion Trees pdf slides
Supplementary material: Sampling from the PYDT
27th Conference on Uncertainty in Artificial Intelligence (UAI)
- David A. Knowles, Jurgen Van Gael, and Zoubin Ghahramani (2011)
Message Passing Algorithms for the Dirichlet Diffusion Tree pdf talk
ICML 2011
Extended version: pdf
- Cornelia Schone, Anne Venner, David A. Knowles, Mahesh M Karnani, Denis Burdakov (2011)
Dichotomous cellular properties of mouse orexin/hypocretin neurons url
The Journal of Physiology
- David A. Knowles and Zoubin Ghahramani (2011)
Nonparametric Bayesian Sparse Factor Models with application to Gene Expression modelling pdf url code
Annals of Applied Statistics
- Daniel Glass, Leopold Parts, David A. Knowles, Abraham Aviv, and Tim D. Spector (2010)
No Correlation Between Childhood Maltreatment and Telomere Length url
In Biol Psychiatry. 2010 September 15; 68(6): Pages 21-22
- Finale Doshi*, David A. Knowles*, Shakir Mohamed* and Zoubin Ghahramani (2009)
Large Scale Non-parametric Inference: Data Parallelisation in the Indian Buffet Process. pdf
In NIPS 2009, 7-12 December 2009, Vancouver, BC, Canada.
- Knowles, D. A. and Ghahramani, Z. (2007)
Infinite Sparse Factor Analysis and Infinite Independent Components Analysis. pdf
In 7th International Conference on Independent Component Analysis and Signal Separation (ICA 2007).
Lecture Notes in Computer Science Series (LNCS). Springer.
*joint first author.
Workshop papers/conference abstracts
- Yang Li, Bryce van de Geijn, Allegra Petti, Anil Raj, David A. Knowles, John Blischak, Yoav Gilad, Jonathan Pritchard (2015).
The effects of human genetic variation on the gene regulatory cascade.
American Society of Human Genetics 65th Annual Meeting
- David A. Knowles, Stanley Ho, Kien Nguyen, Don Morris, Anthony Magliocco, Anindya Sarkar, Daphne Koller, Sylvia Plevritis, Srinivas Chukka, Michael Barnes (2015).
Machine Learning-based Prognostication of Breast Cancer Recurrence using Tissue Slide Features.
Pathology Visions Winner: Best Poster in Image Analysis!
- David A. Knowles, Joe R. Davis, Stephen B. Montgomery, Alexis Battle (2015).
Detecting gene-by-environment interactions using allele specific expression.
The Biology of Genomes Meeting (CSHL)
- David A. Knowles, Stanley Ho, Kien Nguyen, Don Morris, Anthony Magliocco, Anindya Sarkar, Daphne Koller, Srinivas Chukka, Michael Barnes (2014)
Machine learning-based prognostication of breast cancer recurrence using tissue slide features from H&E and immunohistochemically stained slides.
San Antonio Breast Cancer Symposium
- Emily K. Tsang, Xin Li, Vanessa Anaya, Konrad J. Karczewski, David A. Knowles, Kevin S. Smith, Stephen B. Montgomery (2014).
Dissecting the genetic regulation of exosome RNA cargo in a large family.
American Society of Human Genetics 64th Annual Meeting
- J.R. Davis, D.A. Knowles, S.B. Montgomery, A. Battle (2014)
Rare variation and the genomic context of allele-specific expression.
American Society of Human Genetics 64th Annual Meeting
- David A. Knowles, Alexis Battle, Daphne Koller (2013)
Discovering latent cancer characteristics predictive of drug sensitivity.
RECOMB/ISCB Conference on Regulatory & Systems Genomics (selected for oral presentation)
- Alexis Battle*, David A. Knowles*, Sara Mostafavi, Xiaowei Zhu, James B. Potash, Myrna M. Weissman, Courtney McCormick, Christian D. Haudenschild, Kenneth B. Beckman, Jianxin Shi, Rui Mei, Alexander E. Urban, Douglas F. Levinson, Daphne Koller, Stephen B. Montgomery (2013)
The relationship between common environmental and genetic effects on human gene splicing and expression.
American Society of Human Genetics (ASHG) Annual Meeting
- David A. Knowles, Leopold Parts, Daniel Glass and John M. Winn
Inferring a measure of physiological age from multiple ageing related phenotypes. paper video
To appear at the NIPS workshop: From Statistical Genetics to Predictive Models in Personalized Medicine (NIPS PM 2011)
- David A. Knowles, Leopold Parts, Daniel Glass and John M. Winn (2010)
Modeling skin and ageing phenotypes using latent variable models in Infer.NET. paper poster
Poster presented at: Predictive Models in Personalized Medicine Workshop, NIPS 2010, 6-11 December 2010, Vancouver, BC, Canada.
- Knowles, D. and Holmes, S. (2009)
Statistical tools for ultra-deep pyrosequencing of fast evolving viruses. pdf video slides
Presented at: Computational Biology Workshop, NIPS 2009, 7-12 December 2009, Vancouver, BC, Canada.
Reports/Theses
- Bayesian non-parametric models and inference for sparse and hierarchical latent structure (2012) pdf
PhD Thesis, University of Cambridge
Supervisor: Zoubin Ghahramani
- Serial and Parallel Inference in Sparse Nonparametric Latent Factor Models applied to Gene Expression Modeling (2009) pdf
PhD First Year Report, Department of Engineering, University of Cambridge
Supervisor: Zoubin Ghahramani
- Statistical tools for ultra-deep pyrosequencing of fast evolving viruses (2008) pdf
MSc Bioinformatics and Systems Biology, Imperial College London, Individual Project
Supervisor: Professor Susan Holmes, Stanford University
- SBML-ABC: a package for data simulation, parameter inference and model selection, Group Report (2008) pdf
MSc Bioinformatics and Systems Biology, Imperial College London, Group Project
Supervisor: Professor Michael Stumpf
- Infinite Independent Components Analysis (2007) pdf
MEng Information Engineering, Cambridge University, 4th year project
Supervisor: Professor Zoubin Ghahramani
- Real Time Continuous Curvature Path Planner for an Autonomous Vehicle in an Urban Environment (2006) pdf
Summer Undergraduate Research Fellowship, Caltech. I was a member of Team Caltech, an entry into the DARPA Urban Challenge
Supervisor: Professor Richard Murray
Presentations
- Detecting gene-by-environment interactions using allele specific expression. The Biology of Genomes 2015 (image credit @AlexCagan)
- Properties of Bayesian nonparametric models and priors over trees. Guest lecture as part of Matt Hoffman's STAT300 class, summer 2013.
- Diffusion trees as priors. This was a talk I gave about the Dirichlet diffusion tree and Pitman Yor diffusion tree at Collegio Carlo Alberto.
- Inferring an individual's "physiological" age from multiple ageing-related phenotypes
I gave a talk at the Cambridge Statistics Initiative Special One-Day Meeting, which you can watch here. I also presented this work at the NIPS 2011 Personalised Medicine workshop: paper video
- Variational methods for nonparametric Bayesian models
I gave a brief presentation at Microsoft Research summarising some attempts to use variational inference in nonparametric, particularly Dirichlet Process based, models. The slides are here.
Code
The C#/Infer.NET code for Gaussian Process Regression Networks is on github.
The Matlab code for nonparametric sparse factor analysis is available here.
The MCMC sampler for the Dirichlet Process Variable Clustering model is available on Google code at code.google.com/p/dpvc/
Please note that this is research code and as such is provided with no warranty and limited to no support.
Misc
Lagrangian duality
My last journal club was on convex optimisation. I think I finally got my head round Lagrangian duality, and hopefully came up with a reasonably intuitive explanation. My focus is on intuition rather than rigor, and is based almost entirely on Boyd and Vandenberghe's tome. I thought I should write this up while it's still fresh, so here you go: Lagrangian Duality for Dummies
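For orientation, the central statement can be sketched in a few lines. The notation below follows the usual textbook convention and is an assumption for illustration, not an excerpt from the write-up: $f_0$ is the objective, $f_i$ are the inequality constraints, $\lambda$ the multipliers, and $p^\star$, $d^\star$ the primal and dual optimal values.

```latex
% Primal problem (assumed standard form)
\begin{aligned}
p^\star \;=\; \min_{x}\; & f_0(x) \\
\text{s.t.}\; & f_i(x) \le 0, \quad i = 1,\dots,m .
\end{aligned}

% Lagrangian and dual function
L(x,\lambda) \;=\; f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x),
\qquad
g(\lambda) \;=\; \inf_{x} L(x,\lambda).

% Weak duality: for any feasible x and any \lambda \ge 0,
% each term \lambda_i f_i(x) is non-positive, so
g(\lambda) \;\le\; L(x,\lambda) \;\le\; f_0(x)
\quad\Longrightarrow\quad
d^\star \;=\; \sup_{\lambda \ge 0} g(\lambda) \;\le\; p^\star .
```

The intuition is that every choice of $\lambda \ge 0$ gives a lower bound on the primal optimum, and the dual problem searches for the tightest such bound.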
Binomial p-values
Following my work on 454 pyrosequencing error rates with Professor Holmes, I was asked about how to calculate a p-value for comparing two draws from a Binomial distribution to test the hypothesis that the number of substitutions seen in the sample is significantly greater than the number of substitutions seen in the control. There is actually no need to use the Poisson approximation, and the Binomial distribution very naturally takes care of varying coverage. I explain my approach here.
Emacs for Dummies
A few notes on using Emacs for Unix non-gurus.
Contact details
E-mail: My surname followed by 84 at gmail.com
Physical location: Office 133, Gates Building
Snail mail: Computer Science Department, 353 Serra Mall, Stanford University, Stanford, CA 94305-9025, USA
You can view my availability here.