Coreference annotation, provided for BOLT by Raytheon BBN Technologies, captures the part of human language interpretation that links definite references in the text to the entities they denote in the discourse. Annotators link together names, pronouns, and definite descriptions that refer to the same entity, providing crucial information for systems that perform semantic interpretation. Noun phrase mentions of events are also linked to the verb phrases that describe the same event. The annotation is built directly on the Treebank parse, so coreference information can be combined easily with the syntactic and propositional analysis. Null pronouns found in Chinese are included in the coreference annotation, and when speaker turn information is available, speaker names are included as well.
The coreference annotation builds on the Treebank annotation: the noun phrases in the parse trees become the candidates for coreference linking. It also draws on other layers of information, including sentence units, so that parse tree components can be linked back to the original data files and to any speaker identification information available there.
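As a rough illustration of how noun phrases in a parse tree can serve as linking candidates, the sketch below collects every NP span from a Penn Treebank-style bracketed parse. The function name `np_candidates`, the sample sentence, and the choice to match labels that are `NP` or begin with `NP-` are assumptions made for this example; they are not taken from the BOLT tooling.

```python
def np_candidates(parse):
    """Return (start, end, text) for every NP in a bracketed parse.

    start and end are word offsets; end is exclusive.
    """
    tokens = parse.replace("(", " ( ").replace(")", " ) ").split()
    words = []   # terminal words seen so far
    stack = []   # (label, offset of first word) for each open constituent
    spans = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # the token right after "(" is the constituent label
            stack.append((tokens[i + 1], len(words)))
            i += 2
        elif tok == ")":
            label, start = stack.pop()
            # assumption: function-tagged labels such as NP-SBJ also count
            if label == "NP" or label.startswith("NP-"):
                spans.append((start, len(words), " ".join(words[start:])))
            i += 1
        else:
            words.append(tok)
            i += 1
    return spans


# Hypothetical sample parse, not taken from the corpus.
sample_parse = ("(S (NP (DT The) (NN company)) "
                "(VP (VBD bought) (NP (PRP$ its) (NN rival))))")
candidates = np_candidates(sample_parse)
```

Each returned span keeps its word offsets, which is the kind of information needed to tie a candidate back to the sentence unit it came from.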
Information from these sources is combined into a merged file, which is then loaded into “Callisto”, an annotation workbench available from MITRE (but no longer supported). Using Callisto, annotators mark coreference links among noun phrases and verb phrases, following the BOLT coreference annotation guidelines.
The annotated Callisto file is then used to produce an output file in coref format, in which SGML tags surround the words of each coreferenced substring and ID numbers in the coref tags link the different mentions that corefer.
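To make the role of the ID numbers concrete, the following minimal sketch groups tagged substrings by ID. The tag shape `<COREF ID="...">...</COREF>`, the function name `coref_chains`, and the sample line are assumptions for illustration only; the actual tag and attribute names in the coref files may differ and should be checked against the data.

```python
import re
from collections import defaultdict

# Assumed tag shape: <COREF ID="3">words</COREF>. Extra attributes are tolerated.
TOKEN_RE = re.compile(r'<COREF\s+ID="([^"]+)"[^>]*>|</COREF>|[^<\s]+')


def coref_chains(line):
    """Group the tagged substrings of one line by their ID value."""
    chains = defaultdict(list)
    stack = []  # currently open mentions: (id, list of words)
    for match in TOKEN_RE.finditer(line):
        tok = match.group(0)
        if tok.startswith("<COREF"):
            stack.append((match.group(1), []))
        elif tok == "</COREF>":
            mention_id, words = stack.pop()
            chains[mention_id].append(" ".join(words))
        else:
            # a word belongs to every mention that is open, since tags can nest
            for _, words in stack:
                words.append(tok)
    return dict(chains)


# Hypothetical sample line, not taken from the corpus.
sample = ('<COREF ID="1">The company</COREF> said <COREF ID="1">it</COREF> '
          'would buy <COREF ID="2"><COREF ID="1">its</COREF> rival</COREF> .')
chains = coref_chains(sample)
```

Mentions that share an ID end up in the same list, so each list corresponds to one entity or event. The stack handles nested mentions, such as a possessive pronoun inside a larger noun phrase.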