Visual Reasoning in the Real World

A New Dataset for Visual Question Answering

Drew Hudson & Christopher Manning

The GQA Dataset

Question Answering on Image Scene Graphs

Semantic Representations

Each image comes with a scene graph of objects and relations. Each question comes with a structured representation of its semantics.
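
To make the two representations concrete, here is a minimal sketch in Python. The `SceneObject` class, the object ids, the step names (`select`, `relate_subject`, `relate_object`, `query`) and the tiny graph are all hypothetical and chosen for illustration; they are not GQA's actual file format or operation vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str
    attributes: set = field(default_factory=set)
    # Maps a relation name to the ids of the objects it points at.
    relations: dict = field(default_factory=dict)


# Toy scene graph (hypothetical ids and layout).
scene = {
    "o1": SceneObject("boat"),
    "o2": SceneObject(
        "woman",
        relations={"to the right of": ["o1"], "holding": ["o3"]},
    ),
    "o3": SceneObject("umbrella", {"purple"}),
}

# "What is the woman to the right of the boat holding?" as reasoning steps.
program = [
    ("select", "boat"),
    ("relate_subject", "woman", "to the right of"),
    ("relate_object", "holding"),
    ("query", "name"),
]


def run(program, scene):
    """Execute the reasoning steps in order over the scene graph."""
    current = []  # ids of the objects currently in focus
    for step in program:
        op = step[0]
        if op == "select":
            current = [i for i, o in scene.items() if o.name == step[1]]
        elif op == "relate_subject":
            # Objects called `name` that stand in `rel` to a focused object.
            _, name, rel = step
            current = [
                i for i, o in scene.items()
                if o.name == name and set(o.relations.get(rel, [])) & set(current)
            ]
        elif op == "relate_object":
            # Follow the relation outward from each focused object.
            current = [t for i in current for t in scene[i].relations.get(step[1], [])]
        elif op == "query":
            return getattr(scene[current[0]], step[1]) if current else None
    return current


answer = run(program, scene)
```

Each step narrows or moves the set of objects in focus, so a multi-step question becomes a short chain of operations over the graph.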

Compositional

22M multi-step questions that require a diverse set of reasoning skills, with both binary and open questions.

Balanced

Biases in the answer distribution are reduced for each question type, which mitigates language priors and discourages educated guesses.
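
One way to see why this matters is to measure how well a "blind" guesser does when it always picks the most frequent answer for a question type. The sketch below is an illustration only: the function name and the toy samples are assumptions, the numbers are not GQA statistics, and it does not reproduce the balancing procedure itself.

```python
from collections import Counter, defaultdict


def prior_baseline_accuracy(samples):
    """Accuracy of always guessing the most frequent answer per question type.

    `samples` is a list of (question_type, answer) pairs.
    """
    by_type = defaultdict(Counter)
    for qtype, answer in samples:
        by_type[qtype][answer] += 1
    # For each type, the guesser is right on every sample with the top answer.
    hits = sum(counts.most_common(1)[0][1] for counts in by_type.values())
    return hits / len(samples)


# Toy data: a skewed and a more even answer distribution for one question type.
skewed = [("color", "white")] * 8 + [("color", "purple")] * 2
even = [("color", "white")] * 5 + [("color", "purple")] * 5

skewed_score = prior_baseline_accuracy(skewed)
even_score = prior_baseline_accuracy(even)
```

The flatter the answer distribution within a question type, the less a model gains from guessing without looking at the image.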

Strong Supervision

The structured representations allow for a stronger and more informative error signal during training.

New Metrics

A suite of new metrics to evaluate not only accuracy, but also the consistency, validity and plausibility of responses.
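
As an illustration of what a consistency score might look like, the sketch below checks whether a model that answers a question correctly also answers related questions correctly. The `entailed` mapping, the function name, and the exact formula are assumptions made for this example, not the official metric definition.

```python
def consistency(predictions, gold, entailed):
    """Average accuracy on related questions, over correctly answered questions.

    predictions, gold: dicts mapping question id to answer.
    entailed: dict mapping a question id to the ids of related questions
              whose answers should agree with it (hypothetical structure).
    """
    scores = []
    for qid, related in entailed.items():
        # Only questions the model got right contribute to the score.
        if predictions.get(qid) != gold[qid] or not related:
            continue
        correct = sum(predictions.get(r) == gold[r] for r in related)
        scores.append(correct / len(related))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up question ids.
gold = {"q1": "umbrella", "q2": "yes", "q3": "purple"}
predictions = {"q1": "umbrella", "q2": "yes", "q3": "blue"}
entailed = {"q1": ["q2", "q3"], "q2": ["q1"]}

score = consistency(predictions, gold, entailed)
```

A model can reach high accuracy while contradicting itself across related questions, which is why a score of this kind is reported alongside accuracy.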

Thorough Diagnosis

Supports careful analysis based on question and answer type, length, number of reasoning steps and difficulty.
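
A breakdown of this kind can be sketched as a simple group-by over per-question results. The record fields used here (`type`, `steps`, `answer`, `prediction`) are hypothetical names for illustration and are not taken from the GQA files.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Group per-question results by `key` and return accuracy per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for r in records:
        bucket = totals[r[key]]
        bucket[0] += r["prediction"] == r["answer"]
        bucket[1] += 1
    return {group: correct / total for group, (correct, total) in totals.items()}


# Toy results for four questions.
records = [
    {"type": "binary", "steps": 2, "answer": "no", "prediction": "no"},
    {"type": "binary", "steps": 3, "answer": "yes", "prediction": "no"},
    {"type": "open", "steps": 3, "answer": "umbrella", "prediction": "umbrella"},
    {"type": "open", "steps": 2, "answer": "purple", "prediction": "purple"},
]

by_type = accuracy_by(records, "type")
by_steps = accuracy_by(records, "steps")
```

Calling the same function with a different key gives the breakdown along another axis, such as the number of reasoning steps.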

Join the 2020 GQA Challenge for Real-World Visual Reasoning

What is the woman to the right of the boat holding? umbrella
Are there any men to the left of the person that is holding the umbrella? no
What color is the object the woman is holding? purple
On which side of the photo is the woman, the right or the left? right
Is the jacket blue? no

Is the tray on top of the table black or light brown? light brown
Are the napkin and the cup the same color? yes
Is the small table both oval and wooden? yes
Is the syrup to the left of the napkin? yes
Is there any fruit to the left of the tray the cup is on top of? yes
Are there any cups to the left of the tray that is on top of the table? no
Could this room be a living room? yes

Which side of the image is the plate on? right
Are there any lamps on the desk to the right of the rug? yes
What type of furniture are the flowers on, a bed or a table? table
Are there any clocks or mirrors? no
Are there any chairs to the right of the lamp on the table? yes
What is the dark piece of furniture to the right of the rug called? cabinet

What color are the skis? black
Do the gloves and helmet have the same color? yes
Are these skis short or long? long
On which side of the photo is the man? right
Are there both women and men in the image? no
What is on the tree to the left of the person? snow

Is there a door or a window that is open? no
Do you see any white numbers or letters? yes
What is the large container made of? cardboard
What animal is in the box? bear
Is there a bag right of the bear? no
Is there a box inside the plastic bag? no
What is the green thing on the right? door
What color is the bear? brown

What is common to the pajamas and the trousers? color
What is the happy woman wearing? pajamas
What is the name of the device behind the man? television
What kind of device is the woman holding, a remote control or a cell phone? remote control
Do the controllers and the socks have the same color? yes
Who is holding the device in the center? woman

Which kind of watercraft is on the beach? boat
Is there any ice? no
Is the boat made of metal or wood? wood
Is it an indoors or outdoors picture? outdoors
Is this a zoo or a beach? beach
Are there either an umbrella or a surfboard in the scene? no
Does the beach near the ocean look sandy? yes

GQA images are from COCO and Flickr. The image scene graphs are based on a
new, cleaner version of Visual Genome. We thank the COCO, Flickr, and Visual
Genome teams for their great work!

Read the Paper!

@article{hudson2018gqa,
    title={GQA: A New Dataset for Real-World Visual Reasoning 
    and Compositional Question Answering},
    author={Hudson, Drew A and Manning, Christopher D},
    journal={Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2019}
}