What's New:

Membership Year 2018 (MY2018) is open for joining, and discounts are available for those who keep their membership current and join early in the year. Current MY2017 members who renew by March 1, 2018 will receive a 10% discount on the membership fee. New or returning organizations will receive a 5% discount through March 1.

In addition to receiving new publications, current-year LDC members also enjoy the benefit of licensing older data at reduced costs from our Catalog of over 700 holdings; current-year for-profit members may use most data for commercial applications.

Plans for MY2018 publications are in progress. Among the expected releases are:
  • Multilanguage conversational telephone speech: developed to support language identification research in related languages (Central Asian, Central European language groups)
  • DIRHA (Distant-speech Interaction for Robust Home Applications): Wall Street Journal read speech with noise and reverberation, suitable for various multi-microphone signal processing and distant speech recognition tasks
  • TRAD corpora: Chinese-French and Arabic-French parallel text (newswire, web data)
  • IARPA Babel Language Packs (telephone speech and transcripts): languages include Cebuano, Guarani, Kazakh, Lithuanian, Telugu, Tok Pisin
  • BOLT: discussion forum, SMS, word-aligned, and tagged data in all languages (Egyptian Arabic, English, Chinese)
  • DEFT: Spanish Treebank (newswire, web data)
  • RATS: Language Identification data set (Dari, Farsi, Levantine Arabic, Pashto, Urdu; degraded audio signals)
  • TAC KBP: comprehensive English source and entity linked data (broadcast, telephone speech, newswire, web data)
  • German children’s handwriting: longitudinal study of weekly writing in classroom setting with enhanced output for specific spelling patterns

And don’t forget, MY2017 and MY2016 are still open for joining. MY2016 can be joined through December 31, 2017 and includes data such as BOLT Chinese Discussion Forums, IARPA Babel Language Packs in multiple languages and Multi-Language Conversational Telephone Speech – Slavic Group. MY2017 will remain open through December 31, 2018; among the year’s releases are 2010 NIST Speaker Recognition Evaluation Test Set, RATS Keyword Spotting, Noisy TIMIT Speech and BOLT Egyptian Arabic SMS/Chat and Transliteration. For full descriptions of these data sets, browse our Catalog.

Visit Join LDC for details on membership, user accounts and payment.

Web pages about data management plans (DMPs) describe the Consortium’s capabilities to develop and implement project-specific proposals. To satisfy requirements from funders like the National Science Foundation (NSF) that researchers deposit data in an accessible, trustworthy repository, LDC provides archiving services and makes data publicly available at a reasonable cost while protecting intellectual property rights and addressing privacy concerns.

Browse the pages to learn more about the advantages of data center distribution, the details of NSF DMP requirements and the infrastructure and processes LDC has in place for storing and distributing resources over the long term.


We've revamped our user services to make it easier than ever to access LDC data. Now you can become an LDC member, request corpora, sign agreements and submit payment online directly from your LDC user account.

You’ll receive email notifications at key points in the transaction, for instance when an order is created, agreements are signed, payment is received and data is shipped. You can also track the status of a transaction from your user account.

Visit the new Managing Your LDC Account page to learn more about user accounts and their privileges and the steps for online transactions.

As always, thanks to our members, sponsors, collaborators and licensees for your continued support.

Podcasts from the complete set of staff interviews conducted as part of LDC's 20th Anniversary can be accessed from the LDC Blog. Hear what long-time staffers had to say about their experiences at LDC.

Christopher Cieri, Executive Director -- Chris reflects on the road that brought him to LDC, some of his early responsibilities and Consortium activities. 

Mohamed Maamouri, Senior Researcher -- Mohamed recounts his personal and professional experiences and comments on Arabic resource development at LDC.

David Graff, Lead Programmer -- Dave was one of LDC's first staff members and offers some insights on LDC's early days.

Yiwola Awoyale, Moussa Bamba, Researchers -- Yiwola and Moussa discuss how they came to LDC, their work on West African languages and how it benefits multiple communities.

Natalia Bragilevskaya, Business Manager; Ilya Ahtaridis, Membership Coordinator; Marian Reed, Marketing Coordinator -- Natalia, Ilya and Marian recall the early days of LDC and the development of its interactions with the University of Pennsylvania, sponsors, members, licensees and collaborators.