Each image passes through these four phases in order. In the first phase, a single worker writes phrases describing the image and draws bounding boxes around the objects that the phrases mention. In the three subsequent phases, other workers verify the correctness of these phrases. After the image has made a complete pass through all four phases, the process iterates: the image returns to the first phase, where another worker writes more phrases about the image and possibly adds new objects, which are then corrected and verified in the remaining phases. In practice, each image makes three complete passes through the four phases.
Having multiple workers write phrases about each image gives our annotations greater variety and higher recall. Having these workers annotate the image serially rather than in parallel eliminates the need to consolidate duplicate annotations across workers, since each worker builds upon the annotations of the previous worker rather than starting from scratch.
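The control flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not our actual annotation system: the names `annotate_image`, `make_writer`, `reject_blank`, and `accept_all` are hypothetical, bounding boxes are omitted, and each verification phase is modeled as a simple filter over the newly written phrases. The one number taken from the text is the three passes per image.

```python
from typing import Callable, List

NUM_PASSES = 3  # from the text: each image makes three complete passes

# Hypothetical worker interfaces (assumptions for illustration only).
Writer = Callable[[List[str]], List[str]]    # sees accepted phrases, returns new ones
Verifier = Callable[[List[str]], List[str]]  # returns the phrases that pass the check


def annotate_image(writers: List[Writer],
                   verifiers: List[Verifier],
                   num_passes: int = NUM_PASSES) -> List[str]:
    """Run the serial pipeline: one writing phase, then the verification phases."""
    accepted: List[str] = []
    for pass_index in range(num_passes):
        # Phase 1: a new worker sees all previously accepted phrases.
        new_phrases = writers[pass_index](list(accepted))
        # Phases 2-4: other workers check the newly written phrases.
        for verify in verifiers:
            new_phrases = verify(new_phrases)
        accepted.extend(new_phrases)
    return accepted


def make_writer(candidates: List[str]) -> Writer:
    """Toy writer that only contributes phrases not already present."""
    return lambda existing: [p for p in candidates if p not in existing]


def reject_blank(phrases: List[str]) -> List[str]:
    """Toy verification phase: drop empty phrases."""
    return [p for p in phrases if p.strip()]


def accept_all(phrases: List[str]) -> List[str]:
    """Toy verification phase: accept everything."""
    return list(phrases)


writers = [
    make_writer(["a dog on grass", "a red ball"]),
    make_writer(["a red ball", "a tree behind the dog"]),
    make_writer(["", "blue sky"]),
]
result = annotate_image(writers, [reject_blank, accept_all, accept_all])
print(result)
```

Because each toy writer receives the phrases accepted so far, the second writer does not repeat "a red ball", so no separate step for merging duplicates is needed. A parallel design would collect every writer's output independently and would have to reconcile such overlaps afterwards.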