December 2017 Newsletter

Friday, December 15, 2017


Spring 2018 LDC Data Scholarship Program - deadline approaching

Students can apply for the Spring 2018 Data Scholarship Program now through January 15, 2018. The LDC Data Scholarship Program provides students with access to LDC data at no cost. For more information on application requirements and program rules, please visit LDC Data Scholarships.

Lingo Boingo: a web portal to language games

LDC is pleased to announce a new collaborative project, Lingo Boingo, a web portal that brings together new and existing language games that are fun to play and that provide useful annotations and judgments for linguistic research. Gamers and grammar lovers can choose from a list of challenging games, which will continue to expand through the efforts of LDC and external collaborators. For more information, contact  . Start playing today!

Renew your LDC membership today

Membership Year 2018 (MY2018) is open for joining, and discounts are available for those who keep their membership current and join early in the year. Current MY2017 members who renew by March 1, 2018 will receive a 10% discount on the membership fee. New or returning organizations will receive a 5% discount through March 1.

In addition to receiving new publications, current-year LDC members can license older data at reduced cost from our Catalog of over 700 holdings; current-year for-profit members may use most data for commercial applications. Visit Join LDC for details on membership, user accounts and payment.

Plans for MY2018 publications are in progress. Among the expected releases are:

  • Multilanguage conversational telephone speech: developed to support language identification research in related languages (Central Asian, Central European language groups)
  • DIRHA (Distant-speech Interaction for Robust Home Applications): Wall Street Journal read speech with noise and reverberation, suitable for various multi-microphone signal processing and distant speech recognition tasks
  • TRAD corpora: Chinese-French and Arabic-French parallel text (newswire, web data)
  • IARPA Babel Language Packs (telephone speech and transcripts): languages include Cebuano, Guarani, Kazakh, Lithuanian, Telugu, Tok Pisin
  • BOLT: discussion forum, SMS, word-aligned, and tagged data in all languages (Egyptian Arabic, English, Chinese)
  • DEFT: Spanish Treebank (newswire, web data)
  • RATS: Language Identification data set (Dari, Farsi, Levantine Arabic, Pashto, Urdu; degraded audio signals)
  • TAC KBP: comprehensive English source and entity-linked data (broadcast, telephone speech, newswire, web data)
  • German children’s handwriting: longitudinal study of weekly writing in classroom setting with enhanced output for specific spelling patterns