STL-10 dataset


The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, and self-taught learning algorithms. It is inspired by the CIFAR-10 dataset, with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided for learning image models prior to supervised training. The primary challenge is to make use of the unlabeled data (which comes from a similar but not identical distribution to the labeled data) to build a useful prior. We also expect that the higher resolution of this dataset (96x96 pixels) will make it a challenging benchmark for developing more scalable unsupervised learning methods.
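The two-stage idea described above (learn a prior from plentiful unlabeled data, then train on a handful of labels) can be illustrated with a toy sketch. This is not the dataset authors' method and it does not touch STL-10 itself: the data are synthetic 2-D points, k-means is swapped in as a stand-in for "unsupervised feature learning", and every name here (`kmeans`, `fit_labels`, `predict`, `blob`) is a hypothetical helper chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; this stage never sees a label."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def fit_labels(centroids, labeled):
    """Supervised stage: give each centroid the majority label it attracts."""
    votes = [{} for _ in centroids]
    for x, y in labeled:
        j = nearest(x, centroids)
        votes[j][y] = votes[j].get(y, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]

def predict(x, centroids, centroid_labels):
    """Classify x by the label of its nearest centroid."""
    return centroid_labels[nearest(x, centroids)]

# Synthetic stand-in data (an assumption, not STL-10): two 2-D blobs.
rng = random.Random(0)

def blob(cx, cy, n):
    return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(n)]

unlabeled = blob(0.0, 0.0, 100) + blob(5.0, 5.0, 100)   # plentiful, no labels
labeled = [([0.0, 0.0], "a"), ([0.3, -0.1], "a"),        # only four labels
           ([5.0, 5.0], "b"), ([4.8, 5.1], "b")]

centroids = kmeans(unlabeled, k=2)                # stage 1: unsupervised prior
centroid_labels = fit_labels(centroids, labeled)  # stage 2: few labels

print(predict([0.1, -0.2], centroids, centroid_labels))
print(predict([5.2, 4.9], centroids, centroid_labels))
```

In this sketch the structure of the input space is estimated entirely from the unlabeled points, and the four labeled points only name the clusters. A real STL-10 pipeline would replace the toy points with image features and would have to cope with the unlabeled data coming from a somewhat different distribution, which this example does not model.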

Overview

Testing Protocol

We recommend the following standardized testing protocol for reporting results:

Download

Reference

* Please cite the following reference in papers using this dataset:

Adam Coates, Honglak Lee, Andrew Y. Ng. An Analysis of Single-Layer Networks in Unsupervised Feature Learning. AISTATS, 2011.

* Please use http://cs.stanford.edu/~acoates/stl10 as the URL for this site when necessary.

Contact

Send questions to Adam Coates: