Other Evaluations

LDC supports shared tasks and workshops developed and sponsored by various groups, principally by distributing relevant LDC data sets to participants. Examples of these programs include:

COLING (International Conference on Computational Linguistics)

LDC has supported various shared tasks by providing data for COLING task participants. The results of these tasks were presented at a COLING annual conference.

CoNLL (The Conference on Natural Language Learning)

This Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Learning has, since 1999, developed a shared task in which training and test data is provided by the organizers, and participating systems are evaluated and compared in a systematic way. Descriptions of the participating systems and an evaluation of their performance are presented at the yearly ACL conference and in the conference proceedings.

LDC has supported many of the CoNLL shared tasks by providing multilingual annotated texts to task organizers and participants. English data used in the 2008 CoNLL shared task is available through the LDC Catalog.

Generation Challenges

Generation Challenges is a forum for shared task activities involving language generation. LDC provided data in support of the 2011 Surface Realization Shared Task.

HCIR (Human-Computer Interaction and Information Retrieval)

The HCIR Symposium brings researchers and practitioners together to develop more sophisticated models, tools and evaluation metrics to support interactive information retrieval and exploratory search. LDC provided English annotated newswire data in support of the 2010 HCIR Challenge. 

Johns Hopkins University (JHU) Summer Workshops

JHU’s Center for Language and Speech Processing organizes and hosts a six-week research workshop on speech and language engineering each summer. LDC has supported several workshops by providing text and speech data for workshop participants. Data developed in the 2010 Summer Workshop on Speech Recognition and Conditional Random Fields using LDC broadcast material (Broadcast News Lattices) is available through the LDC Catalog.

NTCIR (NII Test Collection for IR Systems)

The NTCIR Workshop is a series of evaluation workshops designed to enhance research in information access technologies including information retrieval, question answering, text summarization and extraction. LDC has provided multilingual text in support of several NTCIR tasks including Advanced Cross-Lingual Information Access, Multilingual Opinion Analysis Task and GeoTime.

SemEval (Semantic Evaluation)

SemEval is an ongoing series of evaluations of computational semantic analysis systems intended to explore the nature of meaning in language. LDC has supported several SemEval tasks. English data used in SemEval 2010 Task 1 is available through the LDC Catalog.

SIGHAN

This ACL Special Interest Group on Chinese Language Processing provides a framework for researchers working on various aspects of Chinese language processing. LDC has provided data used in SIGHAN “bakeoffs”, i.e., competitions to assess research systems’ performance in various language processing tasks including word segmentation, named entity recognition and part-of-speech tagging.