BOLT Annotation

LDC and its data partners (Brandeis University, Columbia University, Raytheon BBN Technologies and the University of Colorado) provided up to four additional layers of annotation for portions of the source and translated data: Word Alignment, Treebank, PropBank and Co-reference. Word Alignment captures translational correspondence between parallel sentences, resulting in links between individual words, phrases and groups. Treebanks are fully parsed corpora manually annotated for syntactic structure at the sentence level and for part-of-speech or morphological information at the token level. PropBank creates a corpus of text annotated with information about basic semantic propositions. Co-reference annotation captures the part of human language interpretation that links definite references in the text to the entities they denote in the discourse. Treebank is the base layer of annotation on which PropBank, Word Alignment and Co-reference are performed.

Annotation Tasks

Word Alignment