Current Projects

LDC is involved in a number of projects that support language education, research and technology development.

DEFT (Deep Exploration and Filtering of Text) (DARPA)

The DARPA DEFT Program will develop automated systems to process text information and enable the understanding of connections in text that might not be readily apparent to humans. LDC supports the DEFT Program by collecting, creating and annotating a variety of data sources to support Smart Filtering, Relational Analysis and Anomaly Analysis.

Language Application Grid (NSF)

The Language Application Grid is an NSF-sponsored collaboration involving Vassar College, Brandeis University, Carnegie Mellon University and LDC. The stated goal is to develop a platform for natural language processing tools and resources that can be used and accessed by any researcher or developer.

LORELEI (Low Resource Languages for Emergent Incidents) (DARPA)

LORELEI seeks to identify the elements that different languages have in common and use that knowledge to enable rapid, low-cost development of automated language capabilities for low-resource languages, in support of effective situational awareness. LDC supports LORELEI by collecting, creating and annotating linguistic resources in multiple languages.

LRE (Language Recognition Evaluation) (NIST)

LDC develops linguistic resources to support the NIST LRE series.

NIEUW (Novel Incentives and Workflows in Linguistic Data Collection and Annotation) (NSF)

NIEUW is an LDC project supported by an NSF CISE Research Infrastructure planning grant. The goal is to build a framework for developing multilingual language resources using crowdsourcing techniques proven to work in multiple scientific disciplines.

OpenMT (Machine Translation) (NIST)

LDC supports the NIST Open Machine Translation (OpenMT) Evaluation series by developing test sets in multiple languages and genres and by sharing linguistic resources developed in other programs including DARPA GALE and TIDES. The objective of the OpenMT evaluation series is to support research in machine translation (MT) technologies, that is, technologies that translate text between human languages, and to advance the state of the art in the MT field. Input may include all forms of text. The goal is for the output to be an adequate and fluent translation of the original.

SRE (Speaker Recognition Evaluation) (NIST)

LDC develops linguistic resources to support the NIST Speaker Recognition Evaluation (SRE) series.

TAC (Text Analysis Conference) KBP (NIST)

The Text Analysis Conference (TAC) is a series of evaluation workshops organized by NIST to encourage research in Natural Language Processing and related applications. LDC provides linguistic resources including source data, annotations and system assessment for the KBP (Knowledge Base Population) Track, which promotes research in automated systems that can discover information about named entities as found in a large corpus and incorporate this information into a knowledge base.