Other Evaluations

LDC supports shared tasks and workshops developed and sponsored by various groups, principally by distributing relevant LDC data sets to participants. Examples of those programs include:

CHiME Speech Separation and Recognition Challenge

This series of challenges studies speech separation and recognition in typical everyday listening conditions. LDC supports it by providing access to Wall Street Journal read speech data. CHiME2 Grid, CHiME2 WSJ0 and CHiME3 are available in the LDC Catalog.

COLING (International Conference on Computational Linguistics)

LDC has supported various shared tasks by providing data for COLING task participants. The results of these tasks were presented at a COLING annual conference.

CoNLL (The Conference on Natural Language Learning)

This Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Learning has, since 1999, developed a shared task in which training and test data is provided by the organizers and participating systems are evaluated and compared in a systematic way. Descriptions of the participating systems and an evaluation of their performances are presented at the yearly ACL conference and in the conference proceedings.

LDC has supported many of the CoNLL shared tasks by providing multilingual annotated texts to task organizers and participants. Data used in the 2006, 2007, 2008, 2009, 2015 and 2016 CoNLL shared tasks is available through the LDC Catalog.


DIHARD

The DIHARD challenges were a set of shared tasks on speech diarization for challenging corpora. The development and evaluation sets developed by LDC were drawn from diverse sources including monologues, map task dialogues, broadcast interviews, sociolinguistic interviews, meeting speech, speech in restaurants, clinical recordings, extended child language acquisition recordings and web videos.

Data developed for DIHARD is available through the LDC Catalog.

Generation Challenges

Generation Challenges is a forum for shared task activities involving language generation. LDC provided data in support of the 2011 Surface Realization Shared Task.

HCIR (Human-Computer Interaction and Information Retrieval)

The HCIR Symposium brings researchers and practitioners together to develop more sophisticated models, tools and evaluation metrics to support interactive information retrieval and exploratory search. LDC provided English annotated newswire data in support of the 2010 HCIR Challenge. 

Johns Hopkins University (JHU) Summer Workshops

Each summer, JHU’s Center for Language and Speech Processing organizes and hosts a six-week research workshop on speech and language engineering. LDC has supported several workshops by providing text and speech data for workshop participants. Data developed in the 2010 Summer Workshop on Speech Recognition and Conditional Random Fields using LDC broadcast material (Broadcast News Lattices) is available through the LDC Catalog.

NTCIR (NII Test Collection for IR Systems)

The NTCIR Workshop is a series of evaluation workshops designed to enhance research in information access technologies including information retrieval, question answering, text summarization and extraction. LDC has provided multilingual text in support of several NTCIR tasks including Advanced Cross-Lingual Information Access, Multilingual Opinion Analysis Task and GeoTime.

REVERB (REverberant Voice Enhancement and Recognition Benchmark) Challenge

The REVERB challenge focused on speech enhancement and ASR tasks in reverberant environments. LDC supported this initiative by providing participants with multi-channel Wall Street Journal read speech, which is available through the LDC Catalog.

SemEval (Semantic Evaluation)

SemEval is an ongoing series of evaluations of computational semantic analysis systems intended to explore the nature of meaning in language. LDC has supported several SemEval tasks. English data used in SemEval 2010 Task 1 and in the SemEval 2014 and 2015 tasks on Semantic Dependency Parsing is available through the LDC Catalog.

Shared Task on Statistical Parsing of Morphologically-Rich Languages (SPMRL)

The primary goal of the Shared Task on Statistical Parsing of Morphologically-Rich Languages is to bring forward work on parsing morphologically ambiguous input in both dependency and constituency parsing. LDC supported the 2013 and 2014 shared tasks by providing Arabic treebank resources.


SIGHAN

This ACL Special Interest Group on Chinese Language Processing provides a framework for researchers working on various aspects of Chinese language processing. LDC has provided data used in SIGHAN “bakeoffs”, i.e., competitions to assess research systems’ performance in various language processing tasks including word segmentation, named entity recognition and part-of-speech tagging.