Join LDC for Membership Year 2021

Membership Year 2021 (MY2021) is open, and discounts are available for those who keep their membership current and join early. MY2020 members who renew their LDC membership before March 1, 2021 will receive a 10% discount on the membership fee, and new or returning organizations will enjoy a 5% discount.

In addition to accessing new publications, current LDC members also enjoy the benefit of licensing older data at reduced costs from our Catalog of over 850 holdings. Current-year for-profit members may use most data for commercial applications.

Plans for MY2021 publications are in progress. Among the expected releases are:

  • Global TIMIT Mandarin Chinese: 6,000 linguistically rich utterances featuring time-aligned lexical and phonetic transcription
  • Columbia Games Corpus: 12 spontaneous task-oriented dyadic conversations elicited from native Standard American English speakers playing computer games, transcribed and annotated for discourse/pragmatic phenomena
  • My Science Tutor Children’s Conversational Speech: 400+ hours of speech from 1,371 US third, fourth and fifth grade students participating in sessions with a virtual science tutor, transcripts included
  • The SSNCE Database of Tamil Dysarthric Speech: Tamil speech from 20 dysarthric speakers aged 12-40 years and a control group (10 speakers) with time-aligned phonetic transcripts
  • Icelandic Parliamentary Speech: 6,493 Icelandic Parliament recordings from 2005-2016 with 196 speakers, aligned, segmented and divided into training, development and evaluation sets for ASR development
  • LORELEI: representative and incident language packs containing monolingual text, bi-text, translations, annotations, supplemental resources and related tools (Akan, Kinyarwanda, and Wolof)
  • BOLT: co-reference, treebank, propbank and translation resources for discussion forum, SMS/Chat and conversational telephone speech data in all three languages (Chinese, Egyptian Arabic, and English)
  • TAC KBP: training and evaluation data for English surprise slot filling (2010) and English sentiment slot filling (2013-2014) tasks

It’s also not too late to join for MY2019 (through December 31, 2020) and MY2020 (through December 31, 2021). Data sets from those years include Penn Discourse Treebank Version 3.0, DEFT Committed Belief Annotation (Chinese, English, Spanish), 2018 NIST Speaker Recognition Evaluation Test Set, Mixer 4 and 5 Speech, AMR Annotation Release 3.0 and Penn Parsed Corpora of Historical English.

For full descriptions of all LDC data sets, browse our Catalog. Visit Join LDC for details on membership, user accounts and payment.