Download
Scene Graphs: ver 1.1 / 42.7MB
Questions: ver 1.12 / 1.4GB
Images
Name Type Description
scene_graphs.json dict A dictionary from each imageId to its Scene Graph.
sceneGraph dict A dictionary holding information about the scene and image.
width int The image width in pixels.
height int The image height in pixels.
location str Optional. The location of the image, e.g. kitchen, beach.
weather str Optional. The weather in the image, e.g. sunny, cloudy.
objects dict A dictionary from objectId to its object.
   object dict A visual element in the image (node).
      name str The name of the object, e.g. person, apple or sky.
      x int Horizontal position of the object bounding box (top left).
      y int Vertical position of the object bounding box (top left).
      w int The object bounding box width in pixels.
      h int The object bounding box height in pixels.
      attributes [str] A list of all the attributes of the object, e.g. blue, small, running.
      relations [dict] A list of all outgoing relations (edges) from the object (source).
         relation dict A triple representing the relation between source and target objects.
            name str The relation predicate, e.g. holding, on, left of.
            object str The objectId of the relation target.
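The x, y, w and h fields describe a box anchored at its top-left corner. Many detection tools expect corner coordinates instead, so here is a minimal sketch of the conversion. `to_corners` and `normalized_box` are hypothetical helpers, not part of GQA, and normalizing to [0, 1] is just one common convention.

```python
from typing import Dict, Tuple


def to_corners(obj: Dict) -> Tuple[int, int, int, int]:
    """Convert an object's (x, y, w, h) box into (x1, y1, x2, y2) corners.

    (x, y) is the top-left corner; w and h are the box size in pixels.
    """
    x, y, w, h = obj["x"], obj["y"], obj["w"], obj["h"]
    return x, y, x + w, y + h


def normalized_box(obj: Dict, width: int, height: int) -> Tuple[float, float, float, float]:
    """Scale the corner box to the [0, 1] range using the image size."""
    x1, y1, x2, y2 = to_corners(obj)
    return x1 / width, y1 / height, x2 / width, y2 / height
```

For the chair in the scene_graphs.json example (x=220, y=310, w=50, h=80 in a 640x480 image), `to_corners` gives (220, 310, 270, 390).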
Example (scene_graphs.json):
{
  "2407890": {
    "width": 640,
    "height": 480,
    "location": "living room",
    "objects": {
      "271881": {
        "name": "chair",
        "x": 220,
        "y": 310,
        "w": 50,
        "h": 80,
        "attributes": ["brown", "wooden", "small"],
        "relations": [
          {"name": "on", "object": "275312"},
          {"name": "near", "object": "279472"}
        ]
      }
    }
  }
}
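Each relation is an outgoing edge from its source object to a target objectId, so a scene graph can be flattened into (subject, predicate, object) triples. This is a minimal sketch that follows the schema, which describes `relations` as a list of dicts; the function names are hypothetical. A target objectId may not appear among the listed objects (as in the example above, where only the chair is shown), so the sketch falls back to the raw objectId in that case.

```python
import json
from typing import Dict, Iterator, Tuple


def relation_triples(scene_graph: Dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (source name, predicate, target name) for every outgoing relation."""
    objects = scene_graph["objects"]
    for obj in objects.values():
        for rel in obj.get("relations", []):
            target = objects.get(rel["object"])
            # Fall back to the raw objectId when the target is not in this graph.
            target_name = target["name"] if target else rel["object"]
            yield obj["name"], rel["name"], target_name


def load_scene_graphs(path: str) -> Dict[str, Dict]:
    """Load the imageId -> scene graph dictionary from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage might look like `graphs = load_scene_graphs("scene_graphs.json")` followed by `list(relation_triples(graphs["2407890"]))`.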
Name Type Description
questions.json dict A dictionary from each questionId (str) to a question.
question dict A dictionary holding information about the question.
imageId str For both train/test. The image the question is based on.
question str For both train/test. The question string.
answer str The answer to the question (short version).
fullAnswer str The full answer to the question (long version, phrased as a sentence).
isBalanced boolean For both train/test. True if the question is part of the balanced version of the dataset.
entailed [str] A list of questionIds of entailed questions.
equivalent [str] A list of questionIds of equivalent questions.
groups dict The question group codes. Used for balancing the dataset.
   global str The question global group code (e.g. all color questions).
   local str The question local group code (e.g. all questions about the color of apples).
types dict A dictionary holding information about the question types.
   structural str The question structural type, e.g. query (open), verify (yes/no).
   semantic str The question subject's type, e.g. 'attribute' for questions about color or material.
   detailed str The question's complete type specification, one of 20+ subtypes, e.g. twoSame.
annotations dict Object annotations for question and answer.
   question dict Visual pointers from question words to objects: keys are word positions (an index such as "4" or a slice such as "2:4"), values are objectIds.
   answer dict Visual pointers from answer words to objects, with the same key/value form (e.g. index "0").
   fullAnswer dict Visual pointers from full-answer words to objects, with the same key/value form (e.g. "0" or "2:4").
semantic [dict] A list of reasoning steps needed to answer the question.
   semantic[i] dict Each reasoning step.
      operation str The reasoning operation. e.g. select, filter, relate.
      argument str The operation argument(s), which depend on the specific operation. Usually refers to an objectId.
      dependencies [int] Optional. Indices of the prior steps the current one depends on.
semanticStr str String form of the question's semantic structure.
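The annotation keys point into the words of the question or answer, either as a single index ("4") or as a slice ("2:4"). A minimal sketch of resolving them follows. The tokenization is an assumption: this section does not specify it, and splitting on whitespace while stripping surrounding punctuation is simply a choice that reproduces the indices in the questions.json example. Treating a slice as half-open (end excluded, like Python slices) is also an assumption. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Assumed tokenization: split on whitespace, strip surrounding punctuation."""
    return [word.strip("?,.!;:") for word in text.split()]


def parse_span(key: str) -> Tuple[int, int]:
    """Turn an annotation key into a half-open word range [start, end)."""
    if ":" in key:
        start, end = key.split(":")
        return int(start), int(end)
    index = int(key)
    return index, index + 1


def resolve_pointers(text: str, pointers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (words, objectId) pairs for every visual pointer in `pointers`."""
    tokens = tokenize(text)
    pairs = []
    for key, object_id in pointers.items():
        start, end = parse_span(key)
        pairs.append((" ".join(tokens[start:end]), object_id))
    return pairs
```

With the example question "Is there a red apple on the table?" and pointers {"4": "271881", "7": "279472"}, this yields ("apple", "271881") and ("table", "279472").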
Example (questions.json):
{
  "1238592": {
    "imageId": "2407890",
    "question": "Is there a red apple on the table?",
    "answer": "no",
    "fullAnswer": "No, there is an apple but it is green.",
    "isBalanced": true,
    "groups": {"global": null, "local": "8r-binary-apple"},
    "entailed": ["1352631", "1245832", "842753"],
    "equivalent": ["1245832", "842753"],
    "types": {
      "structural": "verify",
      "semantic": "relation",
      "detailed": "existAttrRel"
    },
    "annotations": {
      "question": {"4": "271881", "7": "279472"},
      "answer": {},
      "fullAnswer": {"4": "271881"}
    },
    "semantic": [
      {"operation": "select", "argument": "table (279472)"},
      {"operation": "relate", "argument": "on, subject, apple (271881)"},
      {"operation": "filter", "argument": "red"},
      {"operation": "exist", "argument": "?"}
    ],
    "semanticStr": "select: table (279472) -> relate: on, subject, apple (271881) -> filter: red -> exist: ?"
  }
}
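The `semantic` field is a small program: each step names an operation and its argument, and may list the earlier steps it consumes. Below is a minimal sketch that renders the steps as a one-line "operation: argument" chain, similar in shape to semanticStr, and checks that every dependency points to an earlier step. That check assumes steps are listed in execution order. This is not GQA's own formatter, and the real semanticStr may differ in detail. A helper that keeps only balanced questions (via `isBalanced`) is included as well. Both function names are hypothetical.

```python
from typing import Dict, List


def semantic_chain(steps: List[Dict]) -> str:
    """Render reasoning steps as 'operation: argument' joined by ' -> '.

    Also checks that each dependency refers to an earlier step, since a step
    can only consume results that have already been computed.
    """
    parts = []
    for i, step in enumerate(steps):
        for dep in step.get("dependencies", []):
            if not 0 <= dep < i:
                raise ValueError(f"step {i} depends on invalid step {dep}")
        parts.append(f"{step['operation']}: {step['argument']}")
    return " -> ".join(parts)


def balanced_questions(questions: Dict[str, Dict]) -> Dict[str, Dict]:
    """Keep only the questions flagged as part of the balanced version."""
    return {qid: q for qid, q in questions.items() if q.get("isBalanced")}
```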
Name Type Description
Images directory A directory of all images, named {ID}.jpg.
gqa_spatial.h5 hdf5 Spatial features for GQA, extracted with ResNet-101, in HDF5 format.
   features float The features, with shape (ImagesNum, 2048, 7, 7).
gqa_spatial_info.json dict Mapping from image ID (key) to information about the image.
   idx int The image's index in the h5 file.
gqa_objects.h5 hdf5 Object-based features for GQA, extracted with Faster R-CNN, in HDF5 format.
   features float The features for up to 100 objects, with shape (ImagesNum, 100, 2048).
   bboxes float The objects' bounding boxes in pixels, with shape (ImagesNum, 100, 4).
gqa_objects_info.json dict Mapping from image ID (key) to information about the image.
   idx int The image's index in the h5 file.
   objectsNum int The number of objects in the image.
   width int The image's width in pixels.
   height int The image's height in pixels.
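Both info files map an image ID to its row (idx) in the corresponding h5 file. For object features, objectsNum says how many of the 100 slots hold real objects. Below is a minimal sketch of the lookup, written against any indexable arrays so it runs without HDF5. The helper names are hypothetical, and treating the slots beyond objectsNum as padding is an assumption based on the "up to 100 objects" description.

```python
from typing import Dict


def object_features(features, bboxes, info: Dict[str, Dict], image_id: str):
    """Return the features and boxes of the real (non-padding) objects of one image.

    `features` and `bboxes` are the (ImagesNum, 100, 2048) and (ImagesNum, 100, 4)
    arrays; `info` is the parsed gqa_objects_info.json.
    """
    entry = info[image_id]
    idx, n = entry["idx"], entry["objectsNum"]
    # Read the image's row, then keep only its first objectsNum slots.
    return features[idx][:n], bboxes[idx][:n]


def spatial_features(features, info: Dict[str, Dict], image_id: str):
    """Return the (2048, 7, 7) spatial feature map of one image."""
    return features[info[image_id]["idx"]]
```

With h5py, the arrays could be `h5py.File("gqa_objects.h5", "r")["features"]` and the matching `["bboxes"]` dataset. Integer-indexing an h5py dataset reads only that image's row into a NumPy array, so the whole file is never loaded into memory.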