What's New! What's Free!
LDC at ICASSP 2013 ~ May 26-31
Early renewing members save on fees ~ membership savings
Commercial use and LDC data ~ policies on use
Checking in with LDC Data Scholarship recipients ~ updates from our student recipients
LDC’s 20th Anniversary ~ concluding a year of celebration
Spring 2013 LDC Data Scholarship recipients ~ student recipients
RTE update for Penn Discourse Treebank ~ available for download
LDC 20th Anniversary Workshop Podcasts ~ listen to staff interviews; podcasts available through LDC blog
2012 User Survey Results ~ now available
Language Resource Wiki ~ meta-resource on language resources
LDC Providing Guidelines ~ enhanced guidelines for submitting corpora for publication by LDC
LDC Data Sheets ~ concise descriptions of LDC projects, operations, and technical capabilities
GALE Arabic-English Parallel Aligned Treebank -- Newswire ~ about 267K tokens of word-aligned Arabic and English parallel text with treebank annotations
MADCAT Phase 2 Training Set ~ handwritten Arabic documents annotated for physical coordinates and token
GALE Phase 2 Chinese Broadcast Conversation Speech ~ 120 hours of Chinese broadcast conversation speech
GALE Phase 2 Chinese Broadcast Conversation Transcripts ~ 1.5 million tokens of transcribed Chinese broadcast conversation data
NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets ~ evaluation sets, DTD, scoring software, and evaluation plans for the Arabic-to-English and Chinese-to-English progress test sets
New Corpora Archive
Employment at the LDC
ACL Anthology ~ A Digital Archive of Research Papers in Computational Linguistics
OLAC ~ Open Language Archives Community
Linguistic Resources
The Linguistic Data Consortium supports language-related education, research
and technology development by creating and sharing linguistic resources:
data, tools and standards.
LDC is supported in part by grant IRI-9528587 from the Information and Intelligent
Systems division and grant 9982201 from the Human Computer Interaction Program of the
National Science Foundation.
LDC's corpus creation efforts are powered in part by Academic Equipment Grant 7826-990
237-US from Sun Microsystems.