Overview

This page complements the paper "Using Query Patterns to Learn the Duration of Events" and provides the dataset splits and the generated event durations lexicon.

Train/Test Split

The table below lists the files used for each dataset split.

Split Files
Training "APW19980213.1310.tmldur.xml", "NYT19980424.0421.tmldur.xml", "APW19980227.0468.tmldur.xml", "CNN19980223.1130.0960.tmldur.xml", "APW19980308.0201.tmldur.xml", "CNN19980222.1130.0084.tmldur.xml", "CNN19980126.1600.1104.tmldur.xml", "ABC19980120.1830.0957.tmldur.xml", "VOA19980305.1800.2603.tmldur.xml", "APW19980301.0720.tmldur.xml", "APW19980526.1320.tmldur.xml", "PRI19980303.2000.2550.tmldur.xml", "APW19980227.0476.tmldur.xml", "APW19980306.1001.tmldur.xml", "APW19980213.1320.tmldur.xml", "ea980120.1830.0456.tmldur.xml", "PRI19980115.2000.0186.tmldur.xml", "PRI19980121.2000.2591.tmldur.xml", "PRI19980205.2000.1998.tmldur.xml", "ed980111.1130.0089.tmldur.xml", "APW19980219.0476.tmldur.xml", "APW19980213.1380.tmldur.xml", "PRI19980306.2000.1675.tmldur.xml", "APW19980418.0210.tmldur.xml", "NYT19980206.0466.tmldur.xml", "VOA19980331.1700.1533.tmldur.xml", "VOA19980501.1800.0355.tmldur.xml", "APW19980626.0364.tmldur.xml", "CNN19980227.2130.0067.tmldur.xml", "ea980120.1830.0071.tmldur.xml", "AP900816-0139.tmldur.xml", "VOA19980303.1600.2745.tmldur.xml", "APW19980322.0749.tmldur.xml", "CNN19980213.2130.0155.tmldur.xml", "ABC19980108.1830.0711.tmldur.xml", "PRI19980205.2000.1890.tmldur.xml", "APW19980227.0489.tmldur.xml", "PRI19980213.2000.0313.tmldur.xml"
Test "ABC19980114.1830.0611.tmldur.xml", "ABC19980304.1830.1636.tmldur.xml", "APW19980227.0494.tmldur.xml", "APW19980501.0480.tmldur.xml", "NYT19980206.0460.tmldur.xml", "NYT19980212.0019.tmldur.xml", "NYT19980402.0453.tmldur.xml", "PRI19980216.2000.0170.tmldur.xml", "SJMN91-06338157.tmldur.xml", "VOA19980303.1600.0917.tmldur.xml"
WSJ Test "wsj_0006.tmldur.xml", "wsj_0026.tmldur.xml", "wsj_1025.tmldur.xml", "wsj_1031.tmldur.xml", "wsj_1035.tmldur.xml", "wsj_1038.tmldur.xml", "wsj_1039.tmldur.xml", "wsj_1040.tmldur.xml", "wsj_1042.tmldur.xml", "wsj_1073.tmldur.xml"

Lexicon Format

The lexicon is a plain-text file intended to be both human-readable and easy to parse. Each line in the lexicon represents a single event with its duration distribution. The verb-object event combinations are sorted by frequency of occurrence in the NYT portion of Gigaword, with the 10 most frequent grammatical objects listed for each verb.
Example line:

EVENT=to be,ID=e1-1,OBJ=it,PATTERNS=2,DISTR=[0.290;0.105;0.182;0.100;0.059;0.098;0.106;0.060;]
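
As an illustration of how such a line might be parsed, here is a minimal Python sketch. It assumes that every line carries the same five fields in the order shown in the example above, and that the DISTR values are separated by semicolons with a trailing semicolon. The names LexiconEntry and parse_line are hypothetical and are not part of the released lexicon or any accompanying tool.

```python
import re
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    """One lexicon line (hypothetical container, not part of the release)."""
    event: str
    event_id: str
    obj: str
    patterns: int
    distribution: List[float]


# Assumption: fields always appear in this order, as in the example line.
LINE_RE = re.compile(
    r"^EVENT=(?P<event>.*?),ID=(?P<id>[^,]*),OBJ=(?P<obj>.*?),"
    r"PATTERNS=(?P<patterns>\d+),DISTR=\[(?P<distr>[^\]]*)\]$"
)


def parse_line(line: str) -> LexiconEntry:
    """Parse a single lexicon line into its fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized lexicon line: {line!r}")
    # The trailing ';' yields an empty final piece, which is skipped here.
    values = [float(v) for v in match["distr"].split(";") if v.strip()]
    return LexiconEntry(
        event=match["event"],
        event_id=match["id"],
        obj=match["obj"],
        patterns=int(match["patterns"]),
        distribution=values,
    )


example = ("EVENT=to be,ID=e1-1,OBJ=it,PATTERNS=2,"
           "DISTR=[0.290;0.105;0.182;0.100;0.059;0.098;0.106;0.060;]")
entry = parse_line(example)
print(entry.event, entry.obj, entry.patterns, len(entry.distribution))
```

A regular expression is used rather than a plain split on commas so that a malformed line fails loudly instead of being silently misread. The sketch does not interpret what each position in the distribution stands for.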

Lexicon Download

Event Durations Lexicon (658KB)

Acknowledgements

This research draws on data provided by Yahoo!, Inc., through its Yahoo! Search Services offering. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of DARPA, AFRL, or the US government.

References

Using Query Patterns to Learn the Duration of Events
Andrey Gusev, Nathanael Chambers, Pranav Khaitan, Divye Khilnani, Steven Bethard, Dan Jurafsky
IWCS-11, Oxford, 2011.