

What's New! What's Free!


LDC at ICASSP 2013 ~ May 26-31
Early renewing members save on fees ~ membership savings
Commercial use and LDC data ~ policies on use
Checking in with LDC Data Scholarship recipients ~ updates from our student recipients
LDC’s 20th Anniversary ~ concluding a year of celebration
Spring 2013 LDC Data Scholarship recipients ~ student recipients
RTE update for Penn Discourse Treebank ~ available for download
LDC 20th Anniversary Workshop Podcasts ~ listen to staff interviews; podcasts available through LDC blog
2012 User Survey Results ~ now available
Language Resource Wiki ~ meta-resource on language resources
LDC Providing Guidelines ~ enhanced guidelines for submitting corpora for publication by LDC
LDC Data Sheets ~ concise descriptions of LDC projects, operations, and technical capabilities
What's New Archive

New Corpora

GALE Arabic-English Parallel Aligned Treebank -- Newswire ~ approximately 267K tokens of word-aligned Arabic and English parallel text with treebank annotations
MADCAT Phase 2 Training Set ~ handwritten Arabic documents annotated for physical coordinates and token
GALE Phase 2 Chinese Broadcast Conversation Speech ~ 120 hours of Chinese broadcast conversation speech
GALE Phase 2 Chinese Broadcast Conversation Transcripts ~ 1.5 million tokens of transcribed Chinese broadcast conversation data
NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets ~ evaluation sets, DTD, scoring software, and evaluation plans for the Arabic-to-English and Chinese-to-English progress test sets
New Corpora Archive

Employment at the LDC

ACL Anthology ~ A Digital Archive of Research Papers in Computational Linguistics

OLAC ~ Open Language Archives Community

Linguistic Resources
Linguistic Data Consortium

The Linguistic Data Consortium supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.


LDC is supported in part by grant IRI-9528587 from the Information and Intelligent Systems division and grant 9982201 from the Human Computer Interaction Program of the National Science Foundation. LDC's corpus creation efforts are powered in part by Academic Equipment Grant 7826-990 237-US from Sun Microsystems.


Contact ldc@ldc.upenn.edu
Last modified: Friday, 17-May-2013 15:34:29 EDT
© 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.