What's New:

Chilin HK Limited (Chilin) and LDC are pleased to announce that the parallel data resource developed by Chilin, Chinese-English Parallel Sentences Extracted from Patents, is now available through the LDC Catalog. This is a special release in addition to LDC's scheduled corpora for membership year 2016, available under separate terms.

The Chilin corpus derives primarily from the training corpus and test sets developed specifically for the Tokyo-based NTCIR 2009 & 2010 competitions on Patent MT (machine translation), which drew more than 30 international teams:

NTCIR-9: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings9/NTCIR/01-NTCIR9-PATENTMT-GotoI.pdf

NTCIR-10: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings10/pdf/NTCIR/OVERVIEW/01-NTCIR10-PATENTMT-GotoI.pdf

The training corpus is drawn from a much larger curated corpus of parallel Chinese-English sentences and sentence fragments, which were winnowed from an even larger corpus of more than 300k parallel Chinese-English patents in different fields. The winnowing was initially carried out at the Research Centre on Language Information Sciences, City University of Hong Kong (authors: Benjamin Tsou, Bin Lu, and Kapo Chow). This data set is available from LDC under the following reference:

LDC2016T22 Chinese-English Parallel Sentences Extracted from Patents

Not-for-profit organizations may license this data set for US$25.00 under the LDC Not-for-Profit Membership Agreement or under the LDC User Agreement for Non-Members for use in linguistic research, education and non-commercial technology development. For-profit organizations may license this data for US$5000, discounted to US$4000 for LDC for-profit members, under a commercial license.

Congratulations to the recipients of LDC's Fall 2016 data scholarships:

Tiba Zaki Abdulhameed: Western Michigan University (USA); PhD Candidate, Computer Science. Tiba is awarded copies of GALE Phase 2 Arabic Broadcast Conversation Speech and Transcripts for her research in dialectal ASR.

Abhishek Abhishek: Indian Institute of Technology Guwahati (India); PhD Candidate, Computer Science and Engineering. Abhishek is awarded a copy of The New York Times Annotated Corpus for his research in coreference resolution and relation extraction.

Sara Ebrahim: Ain Shams University (Egypt); MSc, Computer Science. Sara is awarded copies of LDC Standard Arabic Morphological Analyzer and NIST OpenMT 2008 Evaluation Selected References and System Translations for her work in machine translation.

Katherine Metcalf: Indiana University (USA); PhD Candidate, Computer Science. Katherine is awarded a copy of Emotional Prosody Speech and Transcripts for her research in acoustic/prosodic approaches to classifying emotional states.

Mousmita Sarma: Gauhati University (India); Post-Masters Research, Electronics and Communications Technology. Mousmita is awarded copies of Switchboard-1 Release 2 and IARPA Babel Assamese Language Pack for her research in Assamese dialect identification.

For information about the program, visit the Data Scholarship page.

Web pages about data management plans (DMPs) describe the Consortium’s capabilities to develop and implement project-specific proposals. To satisfy requirements from funders like the National Science Foundation (NSF) that researchers deposit data in an accessible, trustworthy repository, LDC provides archiving services and makes data publicly available at a reasonable cost while protecting intellectual property rights and addressing privacy concerns.

Browse the pages to learn more about the advantages of data center distribution, the details of NSF DMP requirements and the infrastructures and processes LDC has in place for storing and distributing resources over the long term.


We've revamped our user services to make it easier than ever to access LDC data. Now you can become an LDC member, request corpora, sign agreements and submit payment online directly from your LDC user account.

You’ll receive email notifications at key points in the transaction, for instance when an order is created, agreements are signed, payment is received and data is shipped. You can also track the status of a transaction from your user account.

Visit the new Managing Your LDC Account page to learn more about user accounts and their privileges and the steps for online transactions.

As always, thanks to our members, sponsors, collaborators and licensees for your continued support.

Podcasts from the complete set of staff interviews conducted as part of LDC's 20th Anniversary can be accessed from the LDC Blog. Hear what long-time staffers had to say about their experiences at LDC.

Christopher Cieri, Executive Director -- Chris reflects on the road that brought him to LDC, some of his early responsibilities and Consortium activities. 

Mohamed Maamouri, Senior Researcher -- Mohamed recounts his personal and professional experiences and comments on Arabic resource development at LDC.

David Graff, Lead Programmer -- Dave was one of LDC's first staff members and offers some insights on LDC's early days.

Yiwola Awoyale, Moussa Bamba, Researchers -- Yiwola and Moussa discuss how they came to LDC, their work on West African languages and how it benefits multiple communities.

Natalia Bragilveskaya, Business Manager; Ilya Ahtaridis, Membership Coordinator; Marian Reed, Marketing Coordinator -- Natalia, Ilya and Marian recall the early days of LDC and the development of its interactions with the University of Pennsylvania, sponsors, members, licensees and collaborators.