November 2017 Newsletter

Friday, November 17, 2017


Join LDC for Membership Year 2018

Membership Year 2018 (MY2018) is open for joining, and discounts are available for those who keep their membership current and join early in the year. Through March 1, 2018, current MY2017 members who renew will receive a 10% discount off the membership fee. New or returning organizations will receive a 5% discount through March 1.

In addition to receiving new publications, current-year LDC members also enjoy the benefit of licensing older data at reduced costs from our Catalog of over 700 holdings; current-year for-profit members may use most data for commercial applications.

Plans for MY2018 publications are in progress. Among the expected releases are:
  • Multilanguage conversational telephone speech: developed to support language identification research in related languages (Central Asian, Central European language groups)
  • DIRHA (Distant-speech Interaction for Robust Home Applications): Wall Street Journal read speech with noise and reverberation, suitable for various multi-microphone signal processing and distant speech recognition tasks
  • TRAD corpora: Chinese-French and Arabic-French parallel text (newswire, web data)
  • IARPA Babel Language Packs (telephone speech and transcripts): languages include Cebuano, Guarani, Kazakh, Lithuanian, Telugu, Tok Pisin
  • BOLT: discussion forum, SMS, word-aligned, and tagged data in all languages (Egyptian Arabic, English, Chinese)
  • DEFT: Spanish Treebank (newswire, web data)
  • RATS: Language Identification data set (Dari, Farsi, Levantine Arabic, Pashto, Urdu; degraded audio signals)
  • TAC KBP: comprehensive English source and entity linked data (broadcast, telephone speech, newswire, web data)
  • German children’s handwriting: longitudinal study of weekly writing in classroom setting with enhanced output for specific spelling patterns

And don’t forget, MY2017 and MY2016 are still open for joining. MY2016 can be joined through December 31, 2017 and includes data such as BOLT Chinese Discussion Forums, IARPA Babel Language Packs in multiple languages and Multi-Language Conversational Telephone Speech – Slavic Group. MY2017 will remain open through December 31, 2018; among the year’s releases are 2010 NIST Speaker Recognition Evaluation Test Set, RATS Keyword Spotting, Noisy TIMIT Speech and BOLT Egyptian Arabic SMS/Chat and Transliteration. For full descriptions of these data sets, browse our Catalog.

Visit Join LDC for details on membership, user accounts and payment.

Spring 2018 Data Scholarship Program

Applications are now being accepted through January 15, 2018 for the Spring 2018 LDC Data Scholarship program, which provides university students with no-cost access to LDC data. Consult the LDC Data Scholarship page for more information about program rules and submission requirements.

Commercial use and LDC data

For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing page for further information.