David A. Knowles, Ph.D.

  • Postdoctoral Researcher
  • Departments of Genetics and Radiology

Machine learning, genomics and computational biology

I am a postdoctoral researcher at Stanford University with Sylvia Plevritis (Center for Computational Systems Biology/Radiology) and Jonathan Pritchard (Genetics). Before that I worked with Daphne Koller until her move to Coursera. I did my PhD with Zoubin Ghahramani in the Machine Learning group of the Cambridge University Engineering Department. I was the Roger Needham Scholar at Wolfson College, funded by Microsoft Research. My undergraduate degree comprised two years of Physics before I switched to Engineering to complete an MEng with Zoubin. I took the MSc in Bioinformatics and Systems Biology at Imperial College in 2007/8.

My research involves both the development of novel machine learning methods and their application to data analysis problems in biology. I am collaborating with Ventana-Roche on automated breast cancer prognosis from digitised histological and immunohistochemical slides.

During my PhD I was a part-time developer on Infer.NET. I wrote a blog post about some of the features we added in Infer.NET 2.4 (see here).

I helped run the Cambridge University Statistics Clinic. At Stanford I help out with SMACC (Statistical, Mathematical, and Computational Consulting).

You can download my CV here. Here's a video of me and my friend Johan falling off cliffs on skis in Flaine, France.



The C#/Infer.NET code for Gaussian Process Regression Networks is on github.

The Matlab code for nonparametric sparse factor analysis is available here.

The MCMC sampler for the Dirichlet Process Variable Clustering model is available on Google Code at code.google.com/p/dpvc/.

Please note that this is research code and is therefore provided with no warranty and little or no support.


Lagrangian duality

My last journal club was on convex optimisation. I think I finally got my head round Lagrangian duality, and hopefully came up with a reasonably intuitive explanation. The explanation focuses on intuition rather than rigour and draws almost entirely on Boyd and Vandenberghe's tome. I thought I should write it up while it's still fresh, so here you go:

Lagrangian Duality for Dummies
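As a small numerical companion, here is a sketch on a toy problem of my own choosing (it is not taken from the write-up): minimise f0(x) = x^2 subject to f1(x) = 1 - x <= 0, i.e. x >= 1. The Lagrangian is L(x, lambda) = f0(x) + lambda * f1(x), and the dual function g(lambda) = inf_x L(x, lambda) has the closed form lambda - lambda^2/4. Maximising g over a grid of lambda >= 0 gives a lower bound d* on the primal optimum p* (weak duality); because the problem is convex and Slater's condition holds (x = 2 is strictly feasible), the bound is tight here. The grids and function names are illustrative assumptions.

```python
# Toy problem (an illustrative assumption, not taken from the write-up):
#   minimise f0(x) = x^2  subject to  f1(x) = 1 - x <= 0  (i.e. x >= 1)

def f0(x: float) -> float:
    return x * x

def f1(x: float) -> float:
    return 1.0 - x

def lagrangian(x: float, lam: float) -> float:
    # L(x, lambda) = f0(x) + lambda * f1(x), with lambda >= 0
    return f0(x) + lam * f1(x)

def dual(lam: float) -> float:
    # g(lambda) = inf_x L(x, lambda). L is a convex quadratic in x, so the
    # infimum is at the stationary point dL/dx = 2x - lambda = 0, i.e. x = lambda / 2,
    # which gives g(lambda) = lambda - lambda^2 / 4.
    return lagrangian(lam / 2.0, lam)

LAMBDAS = [i / 100 for i in range(501)]           # grid over lambda in [0, 5]
FEASIBLE_X = [1.0 + i / 100 for i in range(501)]  # grid over feasible x in [1, 6]

lam_star = max(LAMBDAS, key=dual)        # dual optimum on the grid
d_star = dual(lam_star)                  # best lower bound d*
p_star = min(f0(x) for x in FEASIBLE_X)  # primal optimum p* (attained at x = 1)

if __name__ == "__main__":
    print(f"lambda* = {lam_star}, d* = {d_star}, p* = {p_star}")
    print(f"duality gap = {p_star - d_star}")
```

Running it prints lambda* = 2.0 and a duality gap of 0.0. Replacing the closed-form infimum in `dual` with a numerical minimiser would let the same check work for problems where g has no closed form.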

Binomial p-values

Following my work on 454 pyrosequencing error rates with Professor Holmes, I was asked how to calculate a p-value comparing two draws from a Binomial distribution, to test the hypothesis that the number of substitutions seen in the sample is significantly greater than the number seen in the control. There is actually no need to use the Poisson approximation: the Binomial distribution naturally takes care of varying coverage. I explain my approach here.
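One simple way to set this up is sketched below under my own assumptions; it is not necessarily the approach in the linked note. Estimate the substitution rate from the control as k_control / n_control, then compute the probability that a Binomial(n_sample, rate) draw at the sample's coverage produces at least k_sample substitutions. The upper tail is summed exactly in log space, so no Poisson approximation is needed, and differing coverage enters through n_sample and n_control. The function names are hypothetical. Treating the control rate as known ignores its own sampling error; Fisher's exact test on the 2x2 table of substitutions versus non-substitutions is a standard alternative that treats sample and control symmetrically.

```python
import math

def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    # log of each term C(n, i) p^i (1-p)^(n-i), for i = k..n
    logs = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    m = max(logs)  # log-sum-exp trick avoids underflow at high coverage
    return min(1.0, math.exp(m) * sum(math.exp(x - m) for x in logs))

def substitution_pvalue(k_sample: int, n_sample: int,
                        k_control: int, n_control: int) -> float:
    """Hypothetical helper: one-sided p-value that the sample shows more
    substitutions than expected at the control's (plug-in) error rate."""
    if n_control <= 0:
        raise ValueError("control coverage must be positive")
    rate = k_control / n_control  # assumption: control rate treated as known
    return binom_upper_tail(k_sample, n_sample, rate)

if __name__ == "__main__":
    print(substitution_pvalue(30, 1000, 10, 1000))  # far below 0.001
    print(substitution_pvalue(10, 1000, 10, 1000))  # roughly 0.5
```

Note that a control with zero observed substitutions gives a plug-in rate of zero, so any sample substitution yields a p-value of 0; adding a pseudocount to the control would be one way to soften this.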

Emacs for Dummies

A few notes on using Emacs for Unix non-gurus.

Contact details

E-mail: My surname followed by 84 at gmail.com