User Agreements

Certain LDC data sets are governed by corpus-specific license agreements. These agreements supersede the LDC membership agreements and the LDC User Agreement for Non-Members, so all licensees (members and nonmembers) must sign them. The corpora to which such licenses apply are listed below, with links to the agreements.

Fax all completed user agreements to +1.215.573.2175 or scan and email them to the Membership Office.

2014

LDC2014S02 King Saud University Arabic Speech Database
Member/Nonmember

LDC2014S04 USC-SFI MALACH Interviews and Transcripts Czech
Member/Nonmember

2013

LDC2013T06 1993-2007 United Nations Parallel Text
Member/Nonmember

LDC2013S09 CSC Deceptive Speech
Member/Nonmember

2012

LDC2012T03 2009 CoNLL Shared Task Part 1
Member/Nonmember

LDC2012T11 American English Nickname Collection
Member/Nonmember

LDC2012S03 Digital Archive of Southern Speech
For-Profit Member

LDC2012S05 USC-SFI MALACH Interviews and Transcripts English
Member, Nonmember 

2011

LDC2011T04 Indian Language Part-of-Speech Tagset: Sanskrit
Member/Nonmember

2010

LDC2010T06 Chinese Web 5-gram Version 1
Member/Nonmember

LDC2010T16 Indian Language Part-of-Speech Tagset: Bengali
LDC2010T24 Indian Language Part-of-Speech Tagset: Hindi
Member/Nonmember

LDC2010L01 LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1
Member

2009

LDC2009V01 Audiovisual Database of Spoken American English
Member/Nonmember

LDC2009T04 BioProp Version 1.0
Member/Nonmember

LDC2009S01 CSLU: Numbers Version 1.3
LDC2009S03 CSLU: S4X Release 1.2
Member/Nonmember

LDC2009T08 Japanese Web N-gram Version 1
Member/Nonmember

LDC2009T25 Web 1T 5-gram, 10 European Languages Version 1
Member/Nonmember

2008

LDC2008T13 BLLIP North American News Text, Complete
Member

LDC2008T14 BLLIP North American News Text, General Release
Member/Nonmember

LDC2008S06 CSLU: Alphadigit Version 1.3
LDC2008S07 CSLU: ISOLET Spoken Letter Database Version 1.3
LDC2008S02 CSLU: National Cellular Telephone Speech Release 2.3
LDC2008S01 CSLU: Portland Cellular Telephone Speech Version 1.3
Member/Nonmember

LDC2008T22 Czech Academic Corpus 2.0
Member/Nonmember

LDC2008L02 Hindi WordNet
Member/Nonmember

LDC2008T01 Hungarian-English Parallel Text, Version 1.0
Member/Nonmember

LDC2008T15 North American News Text, Complete
Member

LDC2008T16 North American News Text, General Release
Nonmember

LDC2008T19 The New York Times Annotated Corpus
Member/Nonmember

2007

LDC2007T22 2001 Topic Annotated Enron Email Data Set
Member/Nonmember

LDC2007S08 CSLU: Foreign Accented English Release 1.2
LDC2007S18 CSLU: Kids' Speech Version 1.1
LDC2007S13 CSLU: Apple Words and Phrases
LDC2007S05 CSLU: Yes/No Version 1.2
Member/Nonmember

LDC2007S09 Mandarin Affective Speech
Member/Nonmember

LDC2007T19 MITRE 1997 Mandarin Broadcast News Speech Translations (Hub-4NE)
Member

LDC2007S15 Nationwide Speech Project
Member/Nonmember

2006

LDC2006S15 CSLU: Spelled and Spoken Words
LDC2006S14 CSLU: Stories v 1.2
LDC2006S35 CSLU: Multilanguage Telephone Speech Version 1.2
LDC2006S39 CSLU: Names Release 1.3
LDC2006S26 CSLU: Speaker Recognition Version 1.1
LDC2006S16 CSLU: Spoltech Brazilian Portuguese Version 1.0
LDC2006S01 CSLU: Voices
Member/Nonmember

LDC2006T03 Korean Propbank
Member/Nonmember

LDC2006T09 Korean Treebank Annotations Version 2.0
Member/Nonmember

LDC2006S13 N4 NATO Native and Non-Native Speech
Member/Nonmember

LDC2006T01 Prague Dependency Treebank 2.0
Member/Nonmember

LDC2006T13 Web 1T 5-gram Version 1
Member/Nonmember

2005

LDC2005T35 American National Corpus (ANC) Second Release
Member/Nonmember: Open Portion, Restricted Portion

2004

LDC2004T23 Prague Arabic Dependency Treebank 1.0
Member/Nonmember

LDC2004T25 Prague Czech-English Dependency Treebank 1.0
Member/Nonmember 

2002

LDC2002S11 1997 HUB4 English Evaluation Speech and Transcripts
Member/Nonmember

LDC2002T26 Korean English Treebank Annotations
Member/Nonmember

2001

LDC2001T62 CETEMpublico
Member/Nonmember

2000

LDC2000S86 1998 HUB4 Broadcast News Evaluation English Test Material
Member

LDC2000T43 BLLIP 1987-89 WSJ Corpus Release 1
Member/Nonmember

LDC2000T52 TREC Mandarin
Member/Nonmember

LDC2000T51 TREC Spanish
Member/Nonmember

1999

LDC99T34 Japanese Business News Text Supplement
Member

LDC99S82 USC Marketplace Broadcast News Speech
LDC99T36 USC Marketplace Broadcast News Transcripts
Member/Nonmember

1998

LDC98T31 1996 CSR HUB4 Language Model
Member

LDC98S73 1997 Mandarin Broadcast News Speech (HUB4-NE)
LDC98T24 1997 Mandarin Broadcast News Transcripts (HUB4-NE)
Member

LDC98L21 COMLEX English Syntax Lexicon
Member/Nonmember

LDC98T30 North American News Text Supplement
Member

LDC98T25 TDT Pilot Study Corpus
Member/Nonmember

1997

LDC97S66 1996 English Broadcast News Dev and Eval (HUB4)
LDC97S44 1996 English Broadcast News Speech (HUB4)
LDC97T22 1996 English Broadcast News Transcripts (HUB4)
Member

LDC97L20 CALLHOME American English Lexicon (PRONLEX)
LDC97L18 CALLHOME German Lexicon
Member, Nonmember

LDC97S63 The CMU Kids Corpus
Member/Nonmember

1996

LDC96L17 CALLHOME Japanese Lexicon
LDC96L15 CALLHOME Mandarin Chinese Lexicon
LDC96L16 CALLHOME Spanish Lexicon
Member, Nonmember

LDC96L14 CELEX2
Member/Nonmember

LDC96S33 CSR-IV HUB3
Member

LDC96S31 CSR-IV HUB4
Member/Nonmember

LDC96T10 Message Understanding Conference (MUC) 6 Additional News Text
Member/Nonmember

1995

LDC95T6 CSR-III Text
Member

LDC95T11 European Language Newspaper Text
Member

LDC95T8 Japanese Business News Text
Member

LDC95S28 LATINO-40 Spanish Read News
Member/Nonmember

LDC95T13 Mandarin Chinese News Text
Member/Nonmember

LDC95T21 North American News Text Corpus
Member

LDC95T9 Spanish News Text
Member

1994

LDC94T5 ECI Multilingual Text
Member/Nonmember

LDC94T4A UN Parallel Text (Complete)
LDC94T4B-1 UN Parallel Text (English)
LDC94T4B-2 UN Parallel Text (French)
LDC94T4B-3 UN Parallel Text (Spanish)
Member/Nonmember

1993

LDC93T1 ACL/DCI
Member/Nonmember

LDC93T3A TIPSTER Complete
LDC93T3B TIPSTER Volume 1
LDC93T3C TIPSTER Volume 2
LDC93T3D TIPSTER Volume 3
Member/Nonmember