The 2nd International Symposium on Language Resources and Intelligence

The 2nd International Symposium on Language Resources and Intelligence, co-organized by LDC and Beijing Language and Culture University, took place December 16-17, 2018 in Beijing, China. Topics included low resource languages, the Belt and Road Initiative, developing resources for artificial intelligence applications, Chinese dialects, speech processing and clinical applications of human language technology.

LDC Data and HLT Evaluations

Common task human language technology (HLT) evaluations address a particular research problem. Participants develop and test candidate models following an evaluation plan. Training, development and test data are introduced at different points in the process: training data is used to fit models, development data (also known as a cross-validation set) is used to select the best performing model(s), and test data is used to assess a model’s final performance. Outcomes are then analyzed and scored.

Training data, development data and test data can be subsets of a single data set, or the development and test data may originate from a source other than the training set.
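The single-data-set case described above can be sketched as a simple partitioning step. The function below is an illustrative example, not LDC or NIST tooling; the fraction parameters and the fixed seed are assumptions chosen for the sketch.

```python
import random

def train_dev_test_split(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Partition one data set into training, development and test subsets.

    A fixed seed keeps the partition reproducible, which matters when
    multiple evaluation participants must work from the same splits.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = items[:n_test]                     # held out for final scoring
    dev = items[n_test:n_test + n_dev]        # used to select among models
    train = items[n_test + n_dev:]            # used to fit the models
    return train, dev, test

train, dev, test = train_dev_test_split(range(100))
```

With the default fractions, 100 items yield an 80/10/10 split with no overlap between the three subsets.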

For many HLT evaluation tasks, LDC partners with NIST’s Multimodal Information Group and Retrieval Group to provide training, development and test data for research areas that include speech recognition, language recognition, machine translation, cross-language retrieval and multimedia retrieval.

In collaboration with evaluation sponsors, the Consortium releases evaluation corpora through the Catalog. Many are turnkey packages that allow users to replicate the evaluation; they consist of plans and specifications, training, development and test data, scoring software and answer keys.

Visit the Technology Evaluation pages for more information about evaluation data sets available from LDC.