November 2020 Newsletter

Monday, November 16, 2020

New Publications

Global TIMIT Learner Simple English

LORELEI Ukrainian Representative Language Pack

TAC KBP Event Argument – Comprehensive Training and Evaluation Data 2016-2017


Join LDC for Membership Year 2021 

Membership Year 2021 (MY2021) is open and discounts are available for those who keep their membership current and join early. Current MY2020 members who renew their LDC membership before March 1, 2021 will receive a 10% discount off the membership fee. New or returning organizations will receive a 5% discount when joining by March 1.

In addition to receiving new publications, current LDC members also enjoy the benefit of licensing older data at reduced costs from our Catalog of over 850 holdings. Current-year for-profit members may use most data for commercial applications.

Plans for MY2021 publications are in progress. Among the expected releases are:

  • Global TIMIT Mandarin Chinese: 6,000 linguistically rich utterances featuring time-aligned lexical and phonetic transcription
  • Columbia Games Corpus: 12 spontaneous task-oriented dyadic conversations elicited from native Standard American English speakers playing computer games, transcribed and annotated for discourse/pragmatic phenomena
  • My Science Tutor Children’s Conversational Speech: 400+ hours of speech from 1,371 US third-, fourth-, and fifth-grade students participating in sessions with a virtual science tutor, transcripts included
  • The SSNCE Database of Tamil Dysarthric Speech: Tamil speech from 20 dysarthric speakers aged 12-40 years and a control group (10 speakers) with time-aligned phonetic transcripts
  • Icelandic Parliamentary Speech: 6,493 Icelandic Parliament recordings from 2005-2016 with 196 speakers, aligned, segmented, and divided into training, development, and evaluation sets for ASR development
  • LORELEI: representative and incident language packs containing monolingual text, bi-text, translations, annotations, supplemental resources, and related tools (Akan, Kinyarwanda, and Wolof)
  • BOLT: co-reference, treebank, propbank, and translation resources for discussion forum, SMS/Chat, and conversational telephone speech data in all three languages (Chinese, Egyptian Arabic, and English)
  • TAC KBP: training and evaluation data for English surprise slot filling (2010) and English sentiment slot filling (2013-2014) tasks 

It’s also not too late to join for MY2019 (through December 31, 2020) and MY2020 (through December 31, 2021). Data sets from those years include Penn Discourse Treebank Version 3.0, DEFT Committed Belief Annotation (Chinese, English, Spanish), 2018 NIST Speaker Recognition Evaluation Test Set, Mixer 4 and 5 Speech, AMR Annotation Release 3.0, and Penn Parsed Corpora of Historical English.

For full descriptions of all LDC data sets, browse our Catalog.  

Visit Join LDC for details on membership, user accounts, and payment.

Spring 2021 Data Scholarship Application Deadline

Applications for the Spring 2021 LDC Data Scholarship program, which provides university students with no-cost access to LDC data, are being accepted through January 15, 2021. Consult the LDC Data Scholarship page for more information about program rules and submission requirements.