Visual Reasoning in the Real World

A New Dataset for Visual Question Answering

Drew Hudson & Christopher Manning

The GQA Dataset

Question Answering on Image Scene Graphs

Semantic Representations

Each image comes with a scene graph of objects and relations. Each question comes with a structured representation of its semantics.
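As a rough illustration of how a scene graph and a structured question could fit together, here is a minimal sketch. The object ids, the operation names (`select`, `relate`, `query`), the `COLORS` vocabulary and the `run_program` helper are all hypothetical choices for this example and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class SceneObject:
    name: str
    attributes: Set[str] = field(default_factory=set)
    # Each relation is (relation_name, target_object_id).
    relations: List[Tuple[str, str]] = field(default_factory=list)

# Hypothetical toy scene graph: a red apple on a wooden table.
scene_graph: Dict[str, SceneObject] = {
    "o1": SceneObject("apple", {"red"}, [("on", "o2")]),
    "o2": SceneObject("table", {"wooden"}),
}

# Hypothetical structured form of "What color is the apple on the table?"
program = [
    ("select", "table"),
    ("relate", "apple", "on"),
    ("query", "color"),
]

# Assumed attribute vocabulary for the "color" query.
COLORS = {"red", "green", "blue", "yellow"}

def run_program(graph, steps):
    """Execute the steps in order against the scene graph."""
    current: List[str] = []
    for step in steps:
        op = step[0]
        if op == "select":
            # Keep the ids of all objects with the given name.
            current = [oid for oid, obj in graph.items() if obj.name == step[1]]
        elif op == "relate":
            # Keep objects with the given name that stand in the given
            # relation to one of the currently selected objects.
            _, name, rel = step
            current = [
                oid for oid, obj in graph.items()
                if obj.name == name
                and any(r == rel and target in current for r, target in obj.relations)
            ]
        elif op == "query" and step[1] == "color":
            # Return a color attribute of the first selected object, if any.
            if not current:
                return None
            colors = sorted(graph[current[0]].attributes & COLORS)
            return colors[0] if colors else None
    return current

answer = run_program(scene_graph, program)
```

Each step narrows the set of candidate objects, so the answer can be traced back to specific nodes and edges in the graph.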

Compositional

22M multi-step questions that require a diverse set of reasoning skills, with both binary and open questions.

Balanced

Biases in the answer distribution are mitigated for each question type to weaken language priors and discourage educated guesses.

Strong Supervision

The structured representations allow for a stronger and more informative error signal during training.

New Metrics

A suite of new metrics to measure not only accuracy, but also the consistency, validity and plausibility of responses.
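This page does not define the metrics, so the sketch below is only one possible reading. It assumes that accuracy is the fraction of exact matches and that validity is the fraction of predictions falling within a set of answers admissible for the question (for example, a color for a color question). The function names and the toy data are hypothetical.

```python
def accuracy(predictions, gold):
    """Fraction of questions whose predicted answer equals the gold answer."""
    return sum(predictions[q] == gold[q] for q in gold) / len(gold)

def validity(predictions, valid_answers):
    """Fraction of predictions that lie in the question's admissible answer set."""
    return sum(predictions[q] in valid_answers[q] for q in valid_answers) / len(valid_answers)

# Hypothetical toy data: one open color question and one binary question.
gold = {"q1": "red", "q2": "yes"}
predictions = {"q1": "green", "q2": "yes"}
valid_answers = {"q1": {"red", "green", "blue"}, "q2": {"yes", "no"}}

acc = accuracy(predictions, gold)
val = validity(predictions, valid_answers)
```

Under these assumptions, a model that answers "green" to a color question is wrong but still valid, which is the kind of distinction plain accuracy cannot capture.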

Thorough Diagnosis

Supports careful diagnosis based on question and answer type, length, number of reasoning steps and difficulty.

Join the 2019 GQA Challenge for Real-World Visual Reasoning

GQA images are from COCO and Flickr. The image scene graphs are based on a
new, cleaner version of Visual Genome. We thank the COCO, Flickr, and Visual
Genome teams for their great work!

Paper coming soon!
@article{hudson2018gqa,
    title={GQA: a new dataset for compositional question answering over real-world images},
    author={Hudson, Drew A and Manning, Christopher D},
    year={2018}
}