GQA: Visual Reasoning in the Real World

ver 1.1 / 42.7MB

ver 1.2 / 1.4GB


Scene_graphs.json
Questions.json
Images.zip

Name	Type	Description
scene_graphs.json	dict	A dictionary from each imageId to its scene graph.
  sceneGraph	dict	A dictionary holding information about the scene and image.
    width	int	The image width in pixels.
    height	int	The image height in pixels.
    location	str	Optional. The location of the image, e.g. kitchen, beach.
    weather	str	Optional. The weather in the image, e.g. sunny, cloudy.
    objects	dict	A dictionary from each objectId to its object.
      object	dict	A visual element in the image (node).
        name	str	The name of the object, e.g. person, apple or sky.
        x	int	Horizontal coordinate of the bounding box's top-left corner, in pixels.
        y	int	Vertical coordinate of the bounding box's top-left corner, in pixels.
        w	int	The object bounding box width in pixels.
        h	int	The object bounding box height in pixels.
        attributes	[str]	A list of all the attributes of the object, e.g. blue, small, running.
        relations	[dict]	A list of all outgoing relations (edges) from the object (source).
          relation	dict	A single edge; together with its source object it forms a (source, predicate, target) triple.
            name	str	The relation predicate, e.g. holding, on, left of.
            object	str	The objectId of the relation target.
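Because each object stores its box as a top-left corner plus a width and height, a common first step is converting it to corner coordinates or to coordinates normalized by the image size. The sketch below assumes the far corner is simply (x + w, y + h); the helper names `bbox_corners` and `normalized_bbox` are hypothetical and not part of any GQA tooling.

```python
from typing import Any, Dict, Tuple


def bbox_corners(obj: Dict[str, Any]) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) from a scene-graph object's x, y, w, h fields."""
    # x, y are the top-left corner; w, h are the box width and height in pixels.
    # Assumption: the bottom-right corner is taken as (x + w, y + h).
    return obj["x"], obj["y"], obj["x"] + obj["w"], obj["y"] + obj["h"]


def normalized_bbox(obj: Dict[str, Any], graph: Dict[str, Any]) -> Tuple[float, float, float, float]:
    """Scale the corner coordinates into [0, 1] using the image width and height."""
    x1, y1, x2, y2 = bbox_corners(obj)
    width, height = graph["width"], graph["height"]
    return x1 / width, y1 / height, x2 / width, y2 / height
```

For the chair in the example below (x=220, y=310, w=50, h=80 in a 640x480 image), `bbox_corners` gives (220, 310, 270, 390).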

Scene_graphs.json
{
    "2407890": {
        "width": 640,
        "height": 480,
        "location": "living room",
        "weather": null,
        "objects": {
            "271881": {
                "name": "chair",
                "x": 220,
                "y": 310,
                "w": 50,
                "h": 80,
                "attributes": ["brown", "wooden", "small"],
                "relations": [
                    {
                        "name": "on",
                        "object": "275312"
                    },
                    {
                        "name": "near",
                        "object": "279472"
                    }
                ]
            }
        }
    }
}
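Since every object lists its outgoing relations, the full edge set of a graph can be recovered by walking `objects` and their `relations`. This is a minimal sketch assuming `relations` is a list of `{"name", "object"}` dicts as described in the table; `load_scene_graphs` and `iter_relations` are hypothetical helper names, and targets that are not present in `objects` are reported by their objectId.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def load_scene_graphs(path: str) -> Dict[str, Dict[str, Any]]:
    """Read scene_graphs.json into a dict from imageId to scene graph."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_relations(scene_graphs: Dict[str, Dict[str, Any]]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (imageId, source name, predicate, target name or objectId) for every edge."""
    for image_id, graph in scene_graphs.items():
        objects = graph["objects"]
        for obj in objects.values():
            for rel in obj.get("relations", []):
                target = objects.get(rel["object"])
                # Fall back to the raw objectId if the target is not in this graph.
                target_label = target["name"] if target is not None else rel["object"]
                yield image_id, obj["name"], rel["name"], target_label
```

Applied to the example graph, this yields ("2407890", "chair", "on", "275312") and ("2407890", "chair", "near", "279472"), since neither target object is included in the excerpt.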

Name	Type	Description
questions.json	dict	A dictionary from each questionId (str) to a question.
  question	dict	A dictionary holding information about the question.
    imageId	str	Provided for both train and test. The image the question is based on.
    question	str	Provided for both train and test. The question string.
    answer	str	The answer to the question (short version).
    fullAnswer	str	The full answer to the question (sentence-like, long version).
    isBalanced	boolean	Provided for both train and test. True if the question is in the balanced version of the dataset.
    entailed	[str]	A list of questionIds of entailed questions.
    equivalent	[str]	A list of questionIds of equivalent questions.
    groups	dict	The question group codes. Used for balancing the dataset.
      global	str	The question's global group code (e.g. all color questions).
      local	str	The question's local group code (e.g. all questions about an apple's color).
    types	dict	A dictionary holding information about the question types.
      structural	str	The question's structural type, e.g. query (open), verify (yes/no).
      semantic	str	The type of the question's subject, e.g. 'attribute' for questions about color or material.
      detailed	str	The question's complete type specification, one of 20+ subtypes, e.g. twoSame.
    annotations	dict	Object annotations for the question and its answers.
      question	dict	Visual pointers from question words (key: a word index such as "4" or a slice such as "2:4") to objects (value: objectId).
      answer	dict	Visual pointers from the answer word (key: an index such as "0") to objects (value: objectId).
      fullAnswer	dict	Visual pointers from full-answer words (key: an index or slice such as "0" or "2:4") to objects (value: objectId).
    semantic	[dict]	A list of reasoning steps needed to answer the question.
      semantic[i]	dict	A single reasoning step.
        operation	str	The reasoning operation, e.g. select, filter, relate.
        argument	str	The operation argument(s). Depends on the specific operation; usually references an objectId.
        dependencies	[int]	Optional. Indices of the prior steps the current one depends on.
    semanticStr	str	String form of the question's semantic structure.
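The annotation keys are either a single word index ("4") or a slice ("2:4"). The sketch below resolves them back to words, assuming whitespace tokenization with surrounding punctuation stripped and Python-style half-open slices (end index excluded); the tokenizer actually used to produce the keys may differ. `resolve_annotations` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

PUNCTUATION = "?,.!;:"


def resolve_annotations(text: str, pointers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map each annotation key ("4" or "2:4") to (words, objectId)."""
    # Assumption: keys index whitespace-separated tokens with punctuation stripped.
    words = [w.strip(PUNCTUATION) for w in text.split()]
    resolved = []
    for key, object_id in pointers.items():
        if ":" in key:
            # Slice key such as "2:4", treated as half-open [start, end).
            start, end = (int(part) for part in key.split(":"))
            span = words[start:end]
        else:
            # Single index key such as "4".
            span = [words[int(key)]]
        resolved.append((" ".join(span), object_id))
    return resolved
```

With the question annotations from the example below, `{"4": "271881", "7": "279472"}` resolves to `[("apple", "271881"), ("table", "279472")]` under these assumptions.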

Questions.json
{
    "1238592": {
        "imageId": "2407890",
        "question": "Is there a red apple on the table?",
        "answer": "no",
        "fullAnswer": "No, there is an apple but it is green.",
        "isBalanced": true,
        "groups": {"global": null, "local": "8r-binary-apple"},
        "entailed": ["1352631", "1245832", "842753"],
        "equivalent": ["1245832", "842753"],             
        "types": {
            "structural": "verify",
            "semantic": "relation",
            "detailed": "existAttrRel"
        },
        "annotations": {
            "question": {"4": "271881", "7": "279472"},
            "answer": {},
            "fullAnswer": {"4": "271881"}
        },
        "semantic": [
            {
                "operation": "select",
                "argument": "table (279472)",
                "dependencies": []
            },
            {
                "operation": "relate",
                "argument": "on, subject, apple (271881)",
                "dependencies": [0]
            },
            {
                "operation": "filter",
                "argument": "red",
                "dependencies": [1]
            },
            {
                "operation": "exist",
                "argument": "?",
                "dependencies": [2]
            }
        ],
        "semanticStr": "select: table (279472) -> relate: on, subject, apple (271881) -> filter: red -> exist: ?"
    }
}
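The `semantic` list is a small program: each step names an operation and its argument, and `dependencies` point back to earlier steps. A minimal sketch, assuming the "operation: argument" pieces are joined with " -> " as in the example's `semanticStr`, can rebuild the string form and check that every step depends only on earlier ones. Both function names are hypothetical.

```python
from typing import Any, Dict, List


def semantic_to_str(steps: List[Dict[str, Any]]) -> str:
    """Join reasoning steps into the 'operation: argument -> ...' string form."""
    return " -> ".join(f"{step['operation']}: {step['argument']}" for step in steps)


def check_dependencies(steps: List[Dict[str, Any]]) -> bool:
    """True if every step depends only on earlier steps (a valid execution order)."""
    # dependencies is optional, so a missing key is treated as "no dependencies".
    return all(
        all(0 <= dep < i for dep in step.get("dependencies", []))
        for i, step in enumerate(steps)
    )
```

A step that lists itself or a later step as a dependency makes `check_dependencies` return False, which is a cheap sanity check before executing the steps in order.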

Name	Type	Description
Images	directory	A directory of all images, named {ID}.jpg.
gqa_spatial.h5	hdf5	Spatial features for GQA, extracted with ResNet-101 (HDF5 format).
  features	float	The features: (ImagesNum, 2048, 7, 7).
gqa_spatial_info.json	dict	Mapping from each image ID (key) to information about it.
  idx	int	The image's index in the h5 file.
gqa_objects.h5	hdf5	Object-based features for GQA, extracted with Faster R-CNN (HDF5 format).
  features	float	The features for up to 100 objects: (ImagesNum, 100, 2048).
  bboxes	float	The objects' bounding boxes in pixels: (ImagesNum, 100, 4).
gqa_objects_info.json	dict	Mapping from each image ID (key) to information about it.
  idx	int	The image's index in the h5 file.
  objectsNum	int	The number of objects in the image.
  width	int	The image's width in pixels.
  height	int	The image's height in pixels.
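To fetch the features of one image, look up its `idx` in the matching info file and index the h5 datasets with it. The sketch below takes any object indexable like an open `h5py.File` (so it can be tried on plain lists), and assumes that rows beyond `objectsNum` in the 100-slot object arrays are padding to be dropped. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def load_info(path: str) -> Dict[str, Dict[str, Any]]:
    """Read gqa_spatial_info.json or gqa_objects_info.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def spatial_features(h5: Any, info: Dict[str, Dict[str, Any]], image_id: str) -> Any:
    """Return the (2048, 7, 7) spatial feature map for one image."""
    return h5["features"][info[image_id]["idx"]]


def object_features(h5: Any, info: Dict[str, Dict[str, Any]], image_id: str) -> Tuple[Any, Any]:
    """Return (features, bboxes) for the detected objects of one image."""
    entry = info[image_id]
    idx, num = entry["idx"], entry["objectsNum"]
    # Assumption: only the first objectsNum of the 100 rows hold real objects.
    return h5["features"][idx][:num], h5["bboxes"][idx][:num]
```

With h5py installed, usage would look like `with h5py.File("gqa_objects.h5", "r") as h5: feats, boxes = object_features(h5, load_info("gqa_objects_info.json"), image_id)`; indexing an h5py dataset with a single `idx` reads only that image's row from disk.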