2008 CoNLL Shared Task Data, Linguistic
Data Consortium (LDC) catalog number LDC2009T12 and ISBN 1-58563-505-7,
contains the trial corpus, training corpus, development and test data for
the 2008 CoNLL (Conference on Computational
Natural Language Learning) Shared Task Evaluation. The 2008 Shared Task
developed syntactic dependency annotations that include information such as named-entity
boundaries, along with semantic dependencies that model the roles of both verbal and nominal
predicates. The materials in the Shared Task data consist of excerpts from the
following corpora: Treebank-3 (LDC99T42), BBN Pronoun Coreference and
Entity Type Corpus (LDC2005T33), Proposition Bank I (PropBank, LDC2004T14)
and NomBank v 1.0 (LDC2008T23).
The Conference on Computational
Natural Language Learning (CoNLL) is accompanied every year by a shared task
intended to promote natural language processing applications and evaluate them
in a standard setting. The 2004 and 2005 CoNLL shared tasks were dedicated to
semantic role labeling (SRL) in a monolingual setting (English). In 2006 and
2007, the shared tasks were devoted to the parsing of syntactic dependencies
and used corpora from up to thirteen languages. The 2008 shared task employed
a unified dependency-based formalism and merged the task of syntactic dependency
parsing and the task of identifying semantic arguments and labeling them with
semantic roles.
The 2008 shared task was divided into three subtasks:
- parsing syntactic dependencies
- identification and disambiguation of semantic predicates
- identification of arguments and assignment of semantic roles for each predicate
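To make the three subtasks concrete, the following minimal sketch represents a sentence as a list of tokens, each carrying a syntactic head and dependency label, an optional predicate sense, and a map of semantic arguments. The class name, field names, helper functions, example sentence and labels are all illustrative assumptions; they are not the file format or the annotations of the corpus itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    """One word with syntactic and semantic annotations (hypothetical layout)."""
    index: int                        # 1-based position in the sentence
    form: str                         # surface word
    head: int                         # index of the syntactic head, 0 for the root
    deprel: str                       # syntactic dependency label
    predicate: Optional[str] = None   # predicate sense, if the token is a predicate
    args: Dict[int, str] = field(default_factory=dict)  # argument index -> role


def syntactic_arcs(sentence: List[Token]) -> List[Tuple[int, int, str]]:
    """Subtask 1: (head, dependent, label) arcs of the syntactic tree."""
    return [(t.head, t.index, t.deprel) for t in sentence]


def predicates(sentence: List[Token]) -> List[Token]:
    """Subtask 2: tokens identified as predicates, with their sense."""
    return [t for t in sentence if t.predicate is not None]


def semantic_arcs(sentence: List[Token]) -> List[Tuple[int, int, str]]:
    """Subtask 3: (predicate, argument, role) arcs for every predicate."""
    return [(p.index, arg, role)
            for p in predicates(sentence)
            for arg, role in p.args.items()]


# Invented example sentence; the labels are for illustration only.
sentence = [
    Token(1, "Investors", 2, "SBJ"),
    Token(2, "sold", 0, "ROOT", predicate="sell.01", args={1: "A0", 3: "A1"}),
    Token(3, "shares", 2, "OBJ"),
]

print(syntactic_arcs(sentence))
print(semantic_arcs(sentence))
```

Keeping both layers on the same tokens mirrors the idea of a unified dependency-based representation: syntactic and semantic arcs share one index space, so a system can predict them jointly or in sequence.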
Several objectives were addressed in this shared task:
- SRL was performed and evaluated using a dependency-based
representation for both syntactic and semantic dependencies. While SRL on
top of a dependency treebank has been addressed before, the approach of the
2008 Shared Task was characterized by the following novelties:
  - The constituent-to-dependency conversion strategy transformed all annotated
    semantic arguments in PropBank and NomBank v 1.0, not just a subset.
  - The annotations addressed propositions centered around both verbal (PropBank)
    and nominal (NomBank) predicates.
- Based on the observation that a richer set of syntactic dependencies
improves semantic processing, the syntactic dependencies modeled are more
complex than the ones used in the previous CoNLL shared tasks. For example,
the corpus includes apposition links, dependencies derived from named entity
(NE) structures, and better modeling of long-distance grammatical relations.
- A practical framework is provided for the joint learning of syntactic and
semantic dependencies.
Due to the complexity of the 2008 shared task, only a single language, English,
was used.
Samples
An example of the shared task annotations is provided below.
Content Copyright
Portions © 1987-1989 Dow Jones & Company, Inc., © 2002 BBNT Solutions, LLC, © 1995, 1999, 2005, 2008, 2009 Trustees of the University of Pennsylvania