Obtaining Data | Using Data | Providing Data | Creating Data
About LDC | Members | Catalog | Projects | Papers | LDC Online | Search | Contact Us | UPenn | Home


What's New! What's Free!



New LDC website ~ coming soon
LDC Spoken Language Sampler - 2nd Release ~ available for download
Mixer 6 Speech ~ now available
High School students ~ use LDC data
Commercial use and LDC data ~ policies on use
Checking in with LDC Data Scholarship recipients ~ updates from our student recipients
LDC 20th Anniversary Workshop Podcasts ~ listen to staff interviews; podcasts available through LDC blog
2012 User Survey Results ~ now available
Language Resource Wiki ~ meta-resource on language resources
LDC Providing Guidelines ~ guidelines for submitting corpora for publication by LDC
LDC Data Sheets ~ concise descriptions of LDC projects, operations, and technical capabilities
What's New Archive

New Corpora

GALE Phase 2 Arabic Broadcast Conversation Speech Part 2 ~ 141 audio files containing Arabic broadcast conversation
GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2 ~ 763K tokens of transcribed Arabic broadcast conversations
Semantic Textual Similarity (STS) 2013 Machine Translation ~ 750 English sentence pairs translated from Arabic and Chinese; available at no-cost
GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 2 ~ 20 source-translation document pairs of Chinese source text and its English translation
MADCAT Phase 3 Training Set ~ handwritten Arabic documents annotated for physical coordinates and token
Mixer 6 Speech ~ 15K hours of telephone speech, interviews and transcript readings from 594 native English speakers
New Corpora Archive

Employment at the LDC

ACL Anthology ~ A Digital Archive of Research Papers in Computational Linguistics

OLAC ~ Open Language Archives Community

Linguistic Resources
Linguistic Data Consortium

The Linguistic Data Consortium supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.


LDC is supported in part by grant IRI-9528587 from the Information and Intelligent Systems division and grant 9982201 from the Human Computer Interaction Program of the National Science Foundation. LDC's corpus creation efforts are powered in part by Academic Equipment Grant 7826-990 237-US from Sun Microsystems.


Contact ldc@ldc.upenn.edu
Last modified: Wednesday, 18-Sep-2013 12:40:55 EDT
© 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.