What's New:

Membership Year 2019 (MY2019) is open and discounts are available for those who keep their membership current and join early in the year. Through March 1, 2019, current MY2018 members who renew their LDC membership will receive a 10% discount on the membership fee. New or returning organizations will receive a 5% discount through March 1.

In addition to receiving new publications, current LDC members also enjoy the benefit of licensing older data at reduced costs from our Catalog of over 750 holdings. Current-year for-profit members may use most data for commercial applications. 

Plans for MY2019 publications are in progress. Among the expected releases are:

  • SRI Speech-Based Collaborative Learning Corpus: speech from over 100 US middle school students performing collaborative learning tasks; includes audio recordings, orthographic transcriptions, manual annotation of collaboration, and related documentation
  • Chinese Abstract Meaning Representation (AMR): developed by Nanjing Normal University and Brandeis University, semantic representation of approximately 10,000 Chinese sentences following the basic principles of AMR using web source data from Chinese Treebank 8.0 (LDC2013T21)
  • Multilanguage conversational telephone speech: developed to support language identification research in related languages (Arabic, East Asian, English, Mandarin)
  • TAC KBP: English entity discovery and linking, nugget detection and event argument data, Chinese slot-filling data
  • CALLFRIEND Second Edition: updated releases with .wav format audio, simplified directory structure and enhanced documentation and metadata (English, Egyptian Arabic, Mandarin Chinese-Taiwan)
  • HAVIC Med Progress Test data: English web video, metadata, and annotations for developing multimedia systems
  • IARPA Babel Language Packs (telephone speech and transcripts): languages include Amharic, Guarani, Igbo, and Lithuanian
  • BOLT: discussion forums, SMS, word-aligned and tagged data in all languages (Chinese, Egyptian Arabic, English)

And it’s not too late to join for MY2017 (through December 31, 2018) and MY2018 (through December 31, 2019). Data sets from those years include 2010 NIST Speaker Recognition Evaluation Test Set, RATS Keyword Spotting and Language Identification releases, CHiME, Noisy TIMIT Speech, Concretely Annotated New York Times and English Gigaword, DIRHA English WSJ Audio, LORELEI Amharic and Somali Language Packs and DEFT Spanish Treebank. For full descriptions of all LDC data sets, browse our Catalog.

Visit Join LDC for details on membership, user accounts and payment.

Applications are being accepted through January 15, 2019 for the Spring 2019 LDC Data Scholarship program, which provides university students with no-cost access to LDC data. Consult the LDC Data Scholarship page for more information about program rules and submission requirements.

Web pages about data management plans (DMPs) describe the Consortium’s capabilities to develop and implement project-specific proposals. To satisfy requirements from funders like the National Science Foundation (NSF) that researchers deposit data in an accessible, trustworthy repository, LDC provides archiving services and makes data publicly available at a reasonable cost while protecting intellectual property rights and addressing privacy concerns.

Browse the pages to learn more about the advantages of data center distribution, the details of NSF DMP requirements and the infrastructures and processes LDC has in place for storing and distributing resources over the long term.


We've revamped our user services to make it easier than ever to access LDC data. Now you can become an LDC member, request corpora, sign agreements and submit payment online directly from your LDC user account.

You’ll receive email notifications at key points in the transaction, for instance when an order is created, agreements are signed, payment is received and data is shipped. You can also track the status of a transaction from your user account.

Visit the new Managing Your LDC Account page to learn more about user accounts and their privileges and the steps for online transactions.

As always, thanks to our members, sponsors, collaborators and licensees for your continued support.

Podcasts from the complete set of staff interviews conducted as part of LDC's 20th Anniversary can be accessed from the LDC Blog. Hear what long-time staffers had to say about their experiences at LDC.

Christopher Cieri, Executive Director -- Chris reflects on the road that brought him to LDC, some of his early responsibilities and Consortium activities. 

Mohamed Maamouri, Senior Researcher -- Mohamed recounts his personal and professional experiences and comments on Arabic resource development at LDC.

David Graff, Lead Programmer -- Dave was one of LDC's first staff members and offers some insights on LDC's early days.

Yiwola Awoyale, Moussa Bamba, Researchers -- Yiwola and Moussa discuss how they came to LDC, their work on West African languages and how it benefits multiple communities.

Natalia Bragilevskaya, Business Manager; Ilya Ahtaridis, Membership Coordinator; Marian Reed, Marketing Coordinator -- Natalia, Ilya and Marian recall the early days of LDC and the development of its interactions with the University of Pennsylvania, sponsors, members, licensees and collaborators.