LDC supports shared tasks and workshops developed and sponsored by various groups principally through the distribution of relevant LDC data sets to participants. Examples of those programs include:
LDC supports this series of challenges that aim to study speech separation and recognition in typical everyday listening conditions by providing access to Wall Street Journal read speech data.
LDC has supported various shared tasks by providing data for COLING task participants. The results of these tasks were presented at a COLING annual conference.
The Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Learning (SIGNLL) has, since 1999, organized the CoNLL shared task, in which the organizers provide training and test data and participating systems are evaluated and compared in a systematic way. Descriptions of the participating systems and an evaluation of their performance are presented at the yearly ACL conference and in the conference proceedings.
LDC has supported many of the CoNLL shared tasks by providing multilingual annotated texts to task organizers and participants. Data used in the 2008 and 2009 CoNLL shared tasks is available through the LDC Catalog.
Generation Challenges is a forum for shared task activities involving language generation. LDC provided data in support of the 2011 Surface Realization Shared Task.
The HCIR Symposium brings researchers and practitioners together to develop more sophisticated models, tools and evaluation metrics to support interactive information retrieval and exploratory search. LDC provided English annotated newswire data in support of the 2010 HCIR Challenge.
Each summer, JHU’s Center for Language and Speech Processing organizes and hosts a six-week research workshop on speech and language engineering. LDC has supported several workshops by providing text and speech data for workshop participants. Data developed in the 2010 Summer Workshop on Speech Recognition and Conditional Random Fields using LDC broadcast material (Broadcast News Lattices) is available through the LDC Catalog.
The NTCIR Workshop is a series of evaluation workshops designed to enhance research in information access technologies including information retrieval, question answering, text summarization and extraction. LDC has provided multilingual text in support of several NTCIR tasks including Advanced Cross-Lingual Information Access, Multilingual Opinion Analysis Task and GeoTime.
The REVERB challenge focuses on speech enhancement and ASR tasks in reverberant environments. LDC supports this initiative by providing participants with multi-channel Wall Street Journal read speech data.
SemEval is an ongoing series of evaluations of computational semantic analysis systems intended to explore the nature of meaning in language. LDC has supported several SemEval tasks. English data used in SemEval 2010 Task 1 is available through the LDC Catalog.
The ACL Special Interest Group on Chinese Language Processing (SIGHAN) provides a framework for researchers working on various aspects of Chinese language processing. LDC has provided data used in SIGHAN “bakeoffs”, i.e., competitions that assess the performance of research systems on language processing tasks including word segmentation, named entity recognition and part-of-speech tagging.