Red border = wrong retrieval
Green border = correct retrieval
Yellow border = ground-truth image, not retrieved in the top 5
Click on an image to see its inferred grounding
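
The color rule in the legend above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `border_colors`, the string image IDs, and the idea that the ground-truth image is appended as an extra tile when it misses the top 5 are all hypothetical, not taken from the page's actual code.

```python
from typing import List, Tuple


def border_colors(retrieved: List[str], true_id: str, k: int = 5) -> List[Tuple[str, str]]:
    """Return (image_id, border_color) pairs for display.

    Hypothetical helper: `retrieved` is a ranked list of image IDs and
    `true_id` is the ground-truth image for the query.
    """
    top_k = retrieved[:k]
    # Green for the correct retrieval, red for every other retrieved image.
    tiles = [(img, "green" if img == true_id else "red") for img in top_k]
    # Assumption: if the ground truth is not among the top k,
    # it is shown as an additional tile with a yellow border.
    if true_id not in top_k:
        tiles.append((true_id, "yellow"))
    return tiles
```

For example, calling `border_colors(["a", "b", "c", "d", "e"], "c")` marks `c` green and the other four red, while a ground truth ranked sixth or lower would appear as a sixth, yellow tile.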