What's New:

LDC’s website and Catalog will be unavailable from Saturday, August 1 at 10:00 pm EDT through Sunday, August 2 at 10:00 am EDT. We apologize for any inconvenience.

Collaborative work by LDC staff will be presented at ACL-IJCNLP 2015 and co-located workshops this week in Beijing, China.

The joint work of LDC’s Steven Bird and researchers Long Duong, Trevor Cohn (University of Melbourne) and Paul Cook (University of New Brunswick) on “Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser” is displayed during Tuesday’s P2.18 poster session in Plenary Hall (July 28 16:00-19:30).

Steven Bird et al.’s research results on “Cross-lingual Transfer for Unsupervised Dependency Parsing Without Parallel Data” are shared during the CoNLL 2015 workshop on Thursday (July 30 16:00-17:10).

LDC’s Jiahong Yuan presents joint research examining “Sentence selection for automatic scoring of Mandarin proficiency” during the eighth SIGHAN workshop on Thursday (July 30 11:50-12:10). This paper is coauthored by LDC’s Mark Liberman and Beijing Normal University’s Xiaoying Xu, Wei Lai, Weiping Ye and Xinru Zhao.

This is the 53rd meeting of the Association for Computational Linguistics (ACL) held jointly with the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing. ACL is the premier conference for the field of computational linguistics. 

Fall 2015 Data Scholarship Program

Applications are now being accepted through Tuesday, September 15, 2015 for the Fall 2015 LDC Data Scholarship program. The LDC Data Scholarship program provides university students with access to LDC data at no cost.

This program is open to students pursuing both undergraduate and graduate studies in an accredited college or university. LDC Data Scholarships are not restricted to any particular field of study; however, students must demonstrate a well-developed research agenda and a bona fide inability to pay. The selection process is highly competitive.

The application consists of two parts:

(1) Data Use Proposal. Applicants must submit a proposal describing their intended use of the data. The proposal should state which data the student plans to use, explain how the data will benefit their research project and describe the proposed methodology or algorithm.

Applicants should consult the LDC Catalog for a complete list of data distributed by LDC. A handful of LDC corpora are available only to members of the Consortium. Applicants are advised to select no more than two databases.

(2) Letter of Support. Applicants must submit one letter of support from their thesis adviser or department chair. The letter must confirm that the department or university lacks the funding to pay the full non-member fee for the data and verify the student's need for data.

For further information on application materials and program rules, please visit the LDC Data Scholarship page.

There is still time for not-for-profit and government organizations to create a custom data collection of eight corpora from among LDC’s 2013 releases. Selection options include: 1993-2007 United Nations Parallel Text, Chinese Treebank 8.0, CSC Deceptive Speech, GALE Arabic and Chinese speech and text releases, Greybeard, MADCAT training data, NIST 2012 Open Machine Translation (OpenMT) evaluation and progress sets, and more. The 2013 Data Pack is available for a flat rate of $3500 through September 15, 2015.

To license the Data Pack and select eight corpora, log in or register for an LDC user account and add the 2013 Data Pack and each of the eight data sets to your bin. Follow the check-out procedure, sign all applicable user agreements and select payment via wire transfer, purchase order or check. LDC will adjust the invoice total to reflect the data pack fee.

To pay via credit card, add the 2013 Data Pack to your bin and check out using the system prompts. At the completion of the transaction, send an email to ldc@ldc.upenn.edu indicating the eight data sets to include in your order. 


LDC’s new web pages about data management plans (DMPs) describe the Consortium’s capabilities to develop and implement project-specific proposals. To satisfy requirements from funders like the National Science Foundation (NSF) that researchers deposit data in an accessible, trustworthy repository, LDC provides archiving services and makes data publicly available at a reasonable cost while protecting intellectual property rights and addressing privacy concerns.

Browse the new pages to learn more about the advantages of data center distribution, the details of NSF DMP requirements and the infrastructure and processes LDC has in place for storing and distributing resources over the long term.


We've revamped our user services to make it easier than ever to access LDC data. Now you can become an LDC member, request corpora, sign agreements and submit payment online directly from your LDC user account.

You’ll receive email notifications at key points in the transaction, for instance when an order is created, agreements are signed, payment is received and data is shipped. You can also track the status of a transaction from your user account.

Visit the new Managing Your LDC Account page to learn more about user accounts and their privileges and the steps for online transactions.

As always, thanks to our members, sponsors, collaborators and licensees for your continued support.

Podcasts from the complete set of staff interviews conducted as part of LDC's 20th Anniversary can be accessed from the LDC Blog. Hear what long-time staffers had to say about their experiences at LDC.

Christopher Cieri, Executive Director -- Chris reflects on the road that brought him to LDC, some of his early responsibilities and Consortium activities. 

Mohamed Maamouri, Senior Researcher -- Mohamed recounts his personal and professional experiences and comments on Arabic resource development at LDC.

David Graff, Lead Programmer -- Dave was one of LDC's first staff members and offers some insights on LDC's early days.

Yiwola Awoyale, Moussa Bamba, Researchers -- Yiwola and Moussa discuss how they came to LDC, their work on West African languages and how it benefits multiple communities.

Natalia Bragilevskaya, Business Manager; Ilya Ahtaridis, Membership Coordinator; Marian Reed, Marketing Coordinator -- Natalia, Ilya and Marian recall the early days of LDC and the development of its interactions with the University of Pennsylvania, sponsors, members, licensees and collaborators.