A Quick Tour of Deep Learning History

I read a bunch of papers and articles, and then wrote these Cliffs Notes so that you don't have to. I'll start off with some pictures:
[Figure: Google Trends results for "deep learning".]
[Figure: State of the art in speech recognition. Take a look at that precipitous drop after a decade of stagnation.]
[Figure: State of the art in image recognition. See how the circled bar cut the error rate almost in half?]

All of this is deep learning. Lots of it is hype, but lots of it is also very cool.

The theory behind deep learning has roots in the 1970s and earlier [1]. From the earliest days of pattern recognition, one goal of machine learning researchers has been to automate the process of structuring raw data into useful features. The deep learning architectures designed for this task were artificial neural networks, and the method for training them was discovered independently by several groups during the 1970s and 1980s [5].

In 1980, Kunihiko Fukushima drew inspiration from the visual cortex to invent the Neocognitron, a precursor of the convolutional neural network, and used it to recognize simple patterns such as handwritten digits [2]. In 1989, Yann LeCun and his colleagues used a deep feedforward neural network to read handwritten ZIP codes [3]. That same year, the ALVINN system used a three-layer neural network to train a computer-controlled vehicle to steer correctly for 90 miles on a public interstate highway [4].

Despite this promising start, deep learning was “largely forsaken by the machine learning community and ignored by the computer-vision and speech-recognition communities” in the 1990s and early 2000s [5]. There was one simple reason for this: neural networks were prohibitively slow to train. LeCun's ZIP-code network, for example, took three days to train [3].

Interest in deep learning gradually revived in the 21st century. In 2007, Geoffrey Hinton and others at the Canadian Institute for Advanced Research (CIFAR) developed methods to pre-train deep networks [6]. Combined with the advent of fast, conveniently programmable graphics processing units (GPUs), these advances let researchers train networks 10 to 20 times faster than before [7]. That speedup meant models could be trained in hours or days instead of weeks or months.

The academic and industrial machine learning communities soon took notice, using a variety of architectures to produce record-breaking results on many artificial intelligence tasks. In 2009, deep neural networks set records in large-vocabulary speech recognition [9], and by 2012 this architecture was deployed in many Android phones [5]. At ImageNet 2012, an annual competition in which teams compete to build the best image recognizer, a deep convolutional network called AlexNet produced spectacular results, nearly halving the error rate of its closest competitor [10]. Since then, deep learning has entered a renaissance as industry and research leaders have increasingly adopted its techniques with impressive results [8].

Reading List

  1. Mitchell, Tom M. Machine Learning. New York: McGraw-Hill, 1997. Print.
  2. Fukushima, Kunihiko. “Neocognitron: A Self-Organizing Neural Network Model For A Mechanism Of Pattern Recognition Unaffected By Shift In Position”. Biol. Cybernetics 36.4 (1980): 193-202. Web.
  3. LeCun, Y. et al. “Backpropagation Applied To Handwritten Zip Code Recognition”. Neural Computation 1.4 (1989): 541-551. Web.
  4. Pomerleau, Dean. “ALVINN, An Autonomous Land Vehicle In A Neural Network”. Carnegie Mellon University Artificial Intelligence And Psychology Project (1989): n. pag. Print.
  5. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep Learning”. Nature 521.7553 (2015): 436-444. Web.
  6. Hinton, Geoffrey. “Learning Multiple Layers Of Representation”. Trends in Cognitive Sciences 11.10 (2007): n. pag. Print.
  7. Raina, Rajat, Anand Madhavan, and Andrew Ng. “Large Scale Deep Unsupervised Learning Using Graphics Processors”. Proceedings of the 26th International Conference on Machine Learning (2009): n. pag. Print.
  8. IBM.com. “IBM Launches Industry’s First Cognitive Consulting Practice”. N.p., 2015. Web. 15 Dec. 2015.
  9. Dahl, George, and Dong Yu. “Context-Dependent Pre-Trained Deep Neural Networks For Large-Vocabulary Speech Recognition”. IEEE Transactions on Audio, Speech, and Language Processing 20.1 (2012): n. pag. Print.
  10. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey Hinton. “ImageNet Classification With Deep Convolutional Neural Networks”. Advances in Neural Information Processing Systems (NIPS) (2012): n. pag. Print.