I am a post-doctoral researcher at Stanford University with Sylvia Plevritis (Center for Computational Systems Biology/Radiology) and Jonathan Pritchard (Genetics), having previously worked with Daphne Koller before her move to Coursera. I did my PhD with Zoubin Ghahramani in the Machine Learning group of the Cambridge University Engineering Department. I was the Roger Needham Scholar at Wolfson College, funded by Microsoft Research. My undergraduate degree comprised two years of Physics, after which I switched to Engineering to complete an MEng with Zoubin. I took the MSc Bioinformatics and Systems Biology at Imperial College in 2007/8.
My research involves both the development of novel machine learning methods and their application to data analysis problems in biology. I am collaborating with Ventana-Roche on automated breast cancer prognosis from digitised histological and immunohistochemical slides. Here's a video of me and my friend Johan falling off cliffs on skis in Flaine, France.
Working papers/Under submission
- Po-Yuan Tung, John D Blischak, Chiaowen Hsiao, David A Knowles, Jonathan E Burnett, Jonathan K Pritchard, Yoav Gilad
Batch effects and the effective design of single-cell gene expression studies bioRxiv
- Yang I Li, David A Knowles, Jonathan K Pritchard (2016)
LeafCutter: Annotation-free quantification of RNA splicing bioRxiv github
- David A. Knowles (2015)
Stochastic gradient variational Bayes for gamma approximating distributions. Preprint on arXiv code
- David A Knowles, Joe R Davis, Anil Raj, Xiaowei Zhu, James B Potash, Myrna M Weissman, Jianxin Shi, Douglas F. Levinson, Sara Mostafavi, Stephen B Montgomery, Alexis Battle (2015)
Allele-specific expression reveals interactions between genetic variation and environment. Preprint on bioRxiv github
- Tim Salimans, David A. Knowles (2014)
On Using Control Variates with Stochastic Approximation for Variational Bayes and its Connection to Stochastic Linear Regression
- Konstantina Palla*, David A. Knowles* and Zoubin Ghahramani (2013)
A dependent partition-valued process for multitask clustering and time evolving network modelling http://arxiv.org/abs/1303.3265
Publications
- Yang I. Li, Bryce van de Geijn, Anil Raj, David A. Knowles, Allegra A. Petti, David Golan, Yoav Gilad, Jonathan K. Pritchard (2016)
RNA splicing is a primary link between genetic variation and disease.
- Kimberly R. Kukurba, Princy Parsana, Kevin S. Smith, Zachary Zappala, David A. Knowles, Marie-Julie Fave, Xin Li, Xiaowei Zhu, James B. Potash, Myrna M. Weissman, Jianxin Shi, Anshul Kundaje, Douglas F. Levinson, Philip Awadalla, Sara Mostafavi, Alexis Battle, Stephen B. Montgomery (2016)
Impact of the X chromosome and sex on regulatory variation.
Genome Research url bioRxiv
- Joe R. Davis, Laure Fresard, David A. Knowles, Mauro Pala, Carlos D. Bustamante, Alexis Battle, Stephen B. Montgomery (2016)
An Efficient Multiple-Testing Adjustment for eQTL Studies that Accounts for Linkage Disequilibrium between Variants
American Journal of Human Genetics url code
- Amar Shah, David A. Knowles, Zoubin Ghahramani
An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process
ICML 2015 pdf
- Kien Nguyen, Joerg Bredno, David A. Knowles (2015)
Using contextual information to classify nuclei in histology images
International Symposium on Biomedical Imaging (ISBI) 2015 url pdf
- Xin Li, Alexis Battle, Konrad J. Karczewski, Zach Zappala, David A. Knowles, Kevin S. Smith, Kim R. Kukurba, Eric Wu, Noah Simon, Stephen B. Montgomery (2014)
Transcriptome Sequencing of a Large Human Family Identifies the Impact of Rare Noncoding Variants
American Journal of Human Genetics (Featured Article) url
- Konstantina Palla, David A. Knowles and Zoubin Ghahramani
A reversible infinite HMM using normalised random measures
ICML 2014. url
- Creighton Heaukulani, David A. Knowles and Zoubin Ghahramani
Beta Diffusion Trees pdf
- Kukurba, K.R., Zhang, R., Li, X., Smith, K.S., Knowles, D.A., Tan, M.H., Piskol, R., Lek, M., Snyder, M., MacArthur, D.G., Li, J.B., Montgomery, S.B. (2014)
Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues
PLoS Genetics. url
- David A. Knowles and Zoubin Ghahramani (2014)
Pitman-Yor Diffusion Trees for Bayesian hierarchical clustering preprint IEEE TPAMI Special Issue on Bayesian Nonparametrics
- Konstantina Palla, David A. Knowles and Zoubin Ghahramani (2014)
Relational learning and network modelling using infinite latent attribute models
IEEE TPAMI Special Issue on Bayesian Nonparametrics doi pdf
- Tim Salimans, David A. Knowles (2013)
Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression url
Bayesian Analysis [Winner of the Lindley Prize!]
- Daniel Glass, Ana Vinuela, Mathew N Davies, Adaikalavan Ramasamy, Leopold Parts, David A. Knowles, Andrew A Brown, Asa K Hedman, Kerrin S Small, Alfonso Buil, Elin Grundberg, Alexandra C Nica, Paola Di Meglio, Frank O Nestle, Mina Ryten, The UK Brain Expression consortium, MuTHER Consortium, Richard Durbin, Mark I McCarthy, Panagiotis Deloukas, Emmanouil T Dermitzakis, Mike E Weale, Veronique Bataille and Tim D Spector (2013).
Gene expression changes with age in skin, adipose tissue, blood and brain. url
- Novi Quadrianto, Viktoriia Sharmanska, David A. Knowles, Zoubin Ghahramani (2013).
The Supervised IBP: Neighbourhood Preserving Infinite Latent Feature Models. pdf
- Elin Grundberg, Kerrin S Small, Åsa K Hedman, Alexandra C Nica, Alfonso Buil, Sarah Keildson, Jordana T Bell, Tsun-Po Yang, Eshwar Meduri, Amy Barrett, James Nisbett, Magdalena Sekowska, Alicja Wilk, So-Youn Shin, Daniel Glass, Mary Travers, Josine L Min, Sue Ring, Karen Ho, Gudmar Thorleifsson, Augustine Kong, Unnur Thorsteinsdottir, Chrysanthi Ainali, Antigone S Dimas, Neelam Hassanali, Catherine Ingle, David A. Knowles, Maria Krestyaninova, Christopher E Lowe, Paola Di Meglio, Stephen B Montgomery, Leopold Parts, Simon Potter, Gabriela Surdulescu, Loukia Tsaprouni, Sophia Tsoka, Veronique Bataille, Richard Durbin, Frank O Nestle, Stephen O'Rahilly, Nicole Soranzo, Cecilia M Lindgren, Krina T Zondervan, Kourosh R Ahmadi, Eric E Schadt, Kari Stefansson, George Davey Smith, Mark I McCarthy, Panos Deloukas, Emmanouil T Dermitzakis, Tim D Spector & The Multiple Tissue Human Expression Resource (MuTHER) Consortium (2012)
Mapping cis- and trans-regulatory effects across multiple tissues in twins url
- Konstantina Palla*, David A. Knowles* and Zoubin Ghahramani (2012)
A nonparametric variable clustering model pdf code
- Konstantina Palla, David A. Knowles and Zoubin Ghahramani (2012)
An Infinite Latent Attribute Model for Network Data pdf
- Andrew Wilson, David A. Knowles and Zoubin Ghahramani (2012)
Gaussian Process Regression Networks pdf code
- David A. Knowles and Tom Minka (2011)
Non-conjugate Variational Message Passing for Multinomial and Binary Regression pdf
Supplementary material: pdf
- Mehregan Movassagh, Mun-Kit Choy, David A. Knowles, Lina Cordeddu, Syed Haider, Thomas Down, Lee Siggens, Ana Vujic, Ilenia Simeoni, Chris Penkett, Martin Goddard, Pietro Lio, Martin Bennett, Roger Foo (2011)
Distinct epigenomic features in human cardiomyopathy url
Circulation, American Heart Association
- David A. Knowles and Zoubin Ghahramani (2011)
Pitman-Yor Diffusion Trees pdf slides
Supplementary material: Sampling from the PYDT
27th Conference on Uncertainty in Artificial Intelligence (UAI)
- David A. Knowles, Jurgen Van Gael, and Zoubin Ghahramani (2011)
Message Passing Algorithms for the Dirichlet Diffusion Tree pdf talk
Extended version: pdf
- Cornelia Schone, Anne Venner, David A. Knowles, Mahesh M Karnani, Denis Burdakov (2011)
Dichotomous cellular properties of mouse orexin/hypocretin neurons url
The Journal of Physiology
- David A. Knowles and Zoubin Ghahramani (2011)
Nonparametric Bayesian Sparse Factor Models with application to Gene Expression modelling pdf url code
Annals of Applied Statistics
- Daniel Glass, Leopold Parts, David A. Knowles, Abraham Aviv, and Tim D. Spector (2010)
No Correlation Between Childhood Maltreatment and Telomere Length url
In Biol Psychiatry. 2010 September 15; 68(6): Pages 21-22
- Finale Doshi*, David A. Knowles*, Shakir Mohamed* and Zoubin Ghahramani (2009)
Large Scale Non-parametric Inference: Data Parallelisation in the Indian Buffet Process. pdf
In NIPS 2009, 7-12 December 2009, Vancouver, BC, Canada.
- Knowles, D. A. and Ghahramani, Z. (2007)
Infinite Sparse Factor Analysis and Infinite Independent Components Analysis. pdf
In 7th International Conference on Independent Component Analysis and Signal Separation (ICA 2007).
Lecture Notes in Computer Science Series (LNCS). Springer.
*joint first author.
Workshop papers/conference abstracts
- Yang Li, Bryce van de Geijn, Allegra Petti, Anil Raj, David A. Knowles, John Blischak, Yoav Gilad, Jonathan Pritchard (2015).
The effects of human genetic variation on the gene regulatory cascade.
American Society of Human Genetics 65th Annual Meeting
- David A. Knowles, Stanley Ho, Kien Nguyen, Don Morris, Anthony Magliocco, Anindya Sarkar, Daphne Koller, Sylvia Plevritis, Srinivas Chukka, Michael Barnes (2015).
Machine Learning-based Prognostication of Breast Cancer Recurrence using Tissue Slide Features.
Pathology Visions Winner: Best Poster in Image Analysis!
- David A. Knowles, Joe R. Davis, Stephen B. Montgomery, Alexis Battle (2015).
Detecting gene-by-environment interactions using allele specific expression.
The Biology of Genomes Meeting (CSHL)
- David A. Knowles, Stanley Ho, Kien Nguyen, Don Morris, Anthony Magliocco, Anindya Sarkar, Daphne Koller, Srinivas Chukka, Michael Barnes (2014)
Machine learning-based prognostication of breast cancer recurrence using tissue slide features from H&E and immunohistochemically stained slides.
San Antonio Breast Cancer Symposium
- Emily K. Tsang, Xin Li, Vanessa Anaya, Konrad J. Karczewski, David A. Knowles, Kevin S. Smith, Stephen B. Montgomery (2014).
Dissecting the genetic regulation of exosome RNA cargo in a large family.
American Society of Human Genetics 64th Annual Meeting
- J.R. Davis, D.A. Knowles, S.B. Montgomery, A. Battle (2014)
Rare variation and the genomic context of allele-specific expression. American Society of Human Genetics 64th Annual Meeting
- David A. Knowles, Alexis Battle, Daphne Koller (2013)
Discovering latent cancer characteristics predictive of drug sensitivity.
RECOMB/ISCB Conference on Regulatory & Systems Genomics (selected for oral presentation)
- Alexis Battle*, David A. Knowles*, Sara Mostafavi, Xiaowei Zhu, James B. Potash, Myrna M. Weissman, Courtney McCormick, Christian D. Haudenschild, Kenneth B. Beckman, Jianxin Shi, Rui Mei, Alexander E. Urban, Douglas F. Levinson, Daphne Koller, Stephen B. Montgomery (2013)
The relationship between common environmental and genetic effects on human gene splicing and expression.
American Society of Human Genetics (ASHG) Annual Meeting
- David A. Knowles, Leopold Parts, Daniel Glass and John M. Winn
Inferring a measure of physiological age from multiple ageing related phenotypes. paper video
To appear at the NIPS workshop: From Statistical Genetics to Predictive Models in Personalized Medicine (NIPS PM 2011)
- David A. Knowles, Leopold Parts, Daniel Glass and John M. Winn (2010)
Modeling skin and ageing phenotypes using latent variable models in Infer.NET. paper poster
Poster presented at: Predictive Models in Personalized Medicine Workshop, NIPS 2010, 6-11 December 2010, Vancouver, BC, Canada.
- Knowles, D. and Holmes, S. (2009)
Statistical tools for ultra-deep pyrosequencing of fast evolving viruses. pdf video slides
Presented at: Computational Biology Workshop, NIPS 2009, 7-12 December 2009, Vancouver, BC, Canada.
Theses and reports
- Bayesian non-parametric models and inference for sparse and hierarchical latent structure (2012) pdf
PhD Thesis, University of Cambridge
Supervisor: Zoubin Ghahramani
- Serial and Parallel Inference in Sparse Nonparametric Latent Factor Models applied to Gene Expression Modeling (2009) pdf
PhD First Year Report, Department of Engineering, University of Cambridge
Supervisor: Zoubin Ghahramani
- Statistical tools for ultra-deep pyrosequencing of fast evolving viruses (2008) pdf
MSc Bioinformatics and Systems Biology, Imperial College London, Individual Project
Supervisor: Professor Susan Holmes, Stanford University
- SBML-ABC: a package for data simulation, parameter inference and model selection, Group Report (2008) pdf
MSc Bioinformatics and Systems Biology, Imperial College London, Group Project
Supervisor: Professor Michael Stumpf
- Infinite Independent Components Analysis (2007) pdf
MEng Information Engineering, Cambridge University, 4th year project
Supervisor: Professor Zoubin Ghahramani
- Real Time Continuous Curvature Path Planner for an Autonomous Vehicle in an Urban Environment (2006) pdf
Summer Undergraduate Research Fellowship, Caltech. I was a member of Team Caltech, an entrant in the DARPA Urban Challenge.
Supervisor: Professor Richard Murray
Talks
- Detecting gene-by-environment interactions using allele specific expression. The Biology of Genomes 2015 (image credit @AlexCagan)
- Properties of Bayesian nonparametric models and priors over trees. Guest lecture as part of Matt Hoffman's STAT300 class, summer 2013.
- Diffusion trees as priors. This was a talk I gave about the Dirichlet diffusion tree and Pitman-Yor diffusion tree at the Collegio Carlo Alberto.
- Inferring an individual's "physiological" age from multiple ageing-related phenotypes
I gave a talk at the Cambridge Statistics Initiative Special One-Day Meeting, which you can watch here. I also presented this work at the NIPS 2011 Personalised Medicine workshop: paper video
- Variational methods for nonparametric Bayesian models
I gave a brief presentation at Microsoft Research summarising some attempts to use variational inference in nonparametric models, particularly those based on the Dirichlet process. The slides are here.
Code
The C#/Infer.NET code for Gaussian Process Regression Networks is on github.
The Matlab code for nonparametric sparse factor analysis is available here.
Please note that this is research code and as such is provided with no warranty and limited to no support.
Lagrangian duality
My last journal club was on convex optimisation. I think I finally got my head round Lagrangian duality, and hopefully came up with a reasonably intuitive explanation. The focus is on intuition rather than rigor, and the material is based almost entirely on Boyd and Vandenberghe's tome. I thought I should write this up while it's still fresh, so here you go:
Binomial p-values
Following my work on 454 pyrosequencing error rates with Professor Holmes, I was asked how to calculate a p-value for comparing two draws from a Binomial distribution, to test the hypothesis that the number of substitutions seen in the sample is significantly greater than the number seen in the control. There is actually no need to use the Poisson approximation, and the Binomial distribution very naturally takes care of varying coverage. I explain my approach here.
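As a rough illustration of this kind of comparison (a minimal sketch, and not necessarily the approach taken in the linked write-up), one standard exact method is the one-sided Fisher's exact test. Under the null hypothesis that sample and control share the same substitution rate, and conditioning on the total number of substitutions, the count in the sample follows a hypergeometric distribution, so differing coverage is handled without any Poisson approximation. The function and argument names below are hypothetical, chosen only for this example.

```python
from math import comb


def one_sided_pvalue(k_sample, n_sample, k_control, n_control):
    """One-sided Fisher's exact test p-value (illustrative sketch).

    k_* are substitution counts and n_* are coverages (numbers of reads).
    Returns P(X >= k_sample), where X is hypergeometric: the number of
    substitutions landing in the sample given the combined total, under
    the null hypothesis of equal substitution rates.
    """
    total_k = k_sample + k_control      # substitutions across both groups
    total_n = n_sample + n_control      # combined coverage
    denom = comb(total_n, total_k)      # ways to place total_k substitutions
    upper = min(total_k, n_sample)      # largest possible count in the sample
    tail = sum(
        comb(n_sample, x) * comb(n_control, total_k - x)
        for x in range(k_sample, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Made-up counts: 30 substitutions in 1000 reads vs 5 in 1200 reads.
    print(one_sided_pvalue(30, 1000, 5, 1200))
```

The sum uses exact integer arithmetic and converts to a float only at the end, which avoids underflow for small tail probabilities. For very deep coverage the binomial coefficients become large, and a library routine working in log space may be preferable.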
Emacs for Dummies
A few notes on using Emacs for Unix non-gurus.
Contact details
E-mail: My surname followed by 84 at gmail.com
Physical location: Office 133, Gates Building
Snail mail: Computer Science Department, 353 Serra Mall, Stanford University, Stanford, CA 94305-9025, USA
You can view my availability here.