For every test-set sentence below, we retrieve the top-scoring images from a set of 1000. The yellow number at the top left of each image is its score. Clicking on an image reveals the precise inferred grounding. A red border marks an incorrect retrieval and a green border marks a correct one. A yellow border marks the ground-truth image when it was not retrieved among the top 5 predictions.
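The border colors described above can be sketched in a few lines. This is only an illustration, not the code behind this page: the function name `label_retrievals`, its parameters, and the assumption that a higher score means a better match are all hypothetical.

```python
from typing import List, Tuple


def label_retrievals(
    scores: List[float], truth_index: int, k: int = 5
) -> List[Tuple[int, float, str]]:
    """Return (image index, score, border color) rows for one query sentence.

    Assumes (hypothetically) that a higher score means a better match and
    that each sentence has exactly one ground-truth image.
    """
    # Rank all candidate images by descending score and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]

    # Green border for the ground-truth image, red for every other retrieval.
    rows = [
        (i, scores[i], "green" if i == truth_index else "red") for i in top
    ]

    # If the ground-truth image missed the top k, append it with a yellow border.
    if truth_index not in top:
        rows.append((truth_index, scores[truth_index], "yellow"))
    return rows


if __name__ == "__main__":
    # Toy example with 8 images instead of 1000; image 0 is the ground truth.
    toy_scores = [0.1, 0.9, 0.3, 0.8, 0.7, 0.6, 0.5, 0.2]
    for index, score, border in label_retrievals(toy_scores, truth_index=0):
        print(index, score, border)
```

In the toy run the ground-truth image has the lowest score, so the five retrieved images all get red borders and the ground-truth image is appended with a yellow border.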