User Agreements
Certain LDC data sets are governed by corpus-specific license agreements, which supersede the LDC membership agreements and the LDC User Agreement for Non-Members. All licensees, members and nonmembers alike, must therefore sign them. The corpora to which such licenses apply are listed below by year, with links to the agreements.
Fax all completed user agreements to +1.215.573.2175 or scan and email them to the Membership Office.
2024
LDC2024S04 BabyEars Affective Vocalizations
Member/Nonmember
LDC2024S11 L2-KSU Native and Non-Native Arabic Speech
Member/Nonmember
LDC2024S07 MATERIAL Bulgarian-English Language Pack
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2024S10 MATERIAL Somali-English Language Pack
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2024S09 Ravnursson Faroese Speech and Transcripts
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2024S02 Second Language University Speech Intelligibility Corpus
Member/Nonmember
2023
LDC2023L01 Moroccan Arabic - English Lexical Database
Member/Nonmember
LDC2023S05 Samrómur Queries Icelandic Speech 1.0
For-Profit Member, Not-For-Profit Member, Nonmember
2022
LDC2022S08 MASRI Synthetic
Member/Nonmember
LDC2022S11 Samrómur Children Icelandic Speech 1.0
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2022S05 Samrómur Icelandic Speech 1.0
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2022S07 Second DIHARD Challenge Evaluation - SEEDLingS
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2022S03 Spoken Digits in Hindi and Indian English
Member/Nonmember
2021
LDC2021S01 Althingi Parliamentary Speech
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2021S02 Columbia Games Corpus
Member/Nonmember
LDC2021S06 Ethnobotanical Research and Language Documentation of Nahuatl
Member/Nonmember
LDC2021S05 MyST Children's Conversational Speech
Member/Nonmember
LDC2021S11 Second DIHARD Challenge Development - SEEDLingS
For-Profit Member, Not-For-Profit Member, Nonmember
LDC2021S04 The SSNCE Database of Tamil Dysarthric Speech
Member/Nonmember
2020
LDC2020T23 Corpus of Law, Academic, and News
Member/Nonmember
LDC2020L01 Database of Word Level Statistics – Mandarin
Member/Nonmember
LDC2020T06 EVALution
Member/Nonmember
LDC2020S02 IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b
Member, Nonmember
LDC2020S07 IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b
Member, Nonmember
LDC2020S10 IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b
Member, Nonmember
LDC2020T16 Penn Parsed Corpora of Historical English
Member/Nonmember
LDC2020T12 SemTransCNC
Member/Nonmember
2019
LDC2019S22 IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b
Member, Nonmember
LDC2019S16 IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c
Member, Nonmember
LDC2019S13 First DIHARD Challenge Evaluation - SEEDLingS
LDC2019S10 First DIHARD Challenge Development - SEEDLingS
Member, Nonmember
LDC2019S11 USC-SFI MALACH Interviews and Transcripts English – Speech Recognition Edition
Member, Nonmember
LDC2019S08 IARPA Babel Guarani Language Pack IARPA-babel305b-v1.0c
Member, Nonmember
LDC2019S03 IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b
Member, Nonmember
LDC2019S01 SRI Speech-Based Collaborative Learning Corpus
Member/Nonmember
2018
LDC2018S01 DIRHA English WSJ Audio
Member/Nonmember
LDC2018T05 H2, E2, ERK1 Children's Writing
Member/Nonmember
LDC2018S07 IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b
Member, Nonmember
LDC2018S13 IARPA Babel Kazakh Language Pack IARPA-babel302b-v1.0a
Member, Nonmember
LDC2018S16 IARPA Babel Telugu Language Pack IARPA-babel303b-v1.0a
Member, Nonmember
LDC2018S02 IARPA Babel Tok Pisin Language Pack IARPA-babel207b-v1.0e
Member, Nonmember
LDC2018S17 Nautilus Speaker Characterization
Member/Nonmember
LDC2018T13 TRAD Arabic-French Parallel Text -- Newsgroup
Member, Nonmember
LDC2018T21 TRAD Arabic-French Parallel Text -- Newswire
Member, Nonmember
LDC2018T02 TRAD Chinese-French Parallel Text -- Blog
Member, Nonmember
LDC2018T17 TRAD Chinese-French Parallel Text -- Broadcast News
Member, Nonmember
2017
LDC2017S21 ASpIRE Development and Development Test Sets
Member, Nonmember
LDC2017T03 First-Year Law Students' Court Memoranda
Member/Nonmember
LDC2017S03 IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b
Member, Nonmember
LDC2017S22 IARPA Babel Kurmanji Kurdish Language Pack IARPA-babel205b-v1.0a
Member, Nonmember
LDC2017S08 IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a
Member, Nonmember
LDC2017S05 IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d
Member, Nonmember
LDC2017S13 IARPA Babel Tamil Language Pack IARPA-babel204b-v1.1b
Member, Nonmember
LDC2017S01 IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7
Member, Nonmember
LDC2017S19 IARPA Babel Zulu Language Pack IARPA-babel206b-v0.1e
Member, Nonmember
LDC2017S17 Vehicle City Voices - Part 1
Member/Nonmember
2016
LDC2016T22 A Corpus of Chinese-English Parallel Sentences Extracted from Patents
For-Profit Members/For-Profit Nonmembers
LDC2016S04 CHM150
Member/Nonmember
LDC2016S05 Digital Archive of Southern Speech - NLP Version
For-Profit Member
LDC2016T01 H1 Children's Writing
Member/Nonmember
LDC2016S06 IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a
Member, Nonmember
LDC2016S02 IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c
Member, Nonmember
LDC2016S08 IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b
Member, Nonmember
LDC2016S12 IARPA Babel Georgian Language Pack IARPA-babel105b-v0.5
Member, Nonmember
LDC2016S09 IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY
Member, Nonmember
LDC2016S13 IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g
Member, Nonmember
LDC2016S10 IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5
Member, Nonmember
LDC2016T24 JANA: A Human-Human Dialogues Corpus for Egyptian Dialect
For-Profit Members/For-Profit Nonmembers
2015
LDC2015S10 Arabic Learner Corpus
Member/Nonmember
LDC2015T03 Avocado Research Email Collection
Member/Nonmember
LDC2015S05 Mandarin Chinese Phonetic Segmentation and Tone
Member
2014
LDC2014T24 Boulder Lies and Truth
Member/Nonmember
LDC2014T06 ETS Corpus of Non-Native Written English
Member/Nonmember
LDC2014S02 King Saud University Arabic Speech Database
Member/Nonmember
LDC2014S03 Multi-Channel WSJ Audio
Member/Nonmember
LDC2014S08 United Nations Proceedings Speech
Member/Nonmember
LDC2014S04 USC-SFI MALACH Interviews and Transcripts Czech
Member, Nonmember
2013
LDC2013T06 1993-2007 United Nations Parallel Text
Member/Nonmember
LDC2013S09 CSC Deceptive Speech
Member/Nonmember
2012
LDC2012T03 2009 CoNLL Shared Task Part 1
Member/Nonmember
LDC2012T11 American English Nickname Collection
Member/Nonmember
LDC2012S03 Digital Archive of Southern Speech
For-Profit Member
LDC2012S05 USC-SFI MALACH Interviews and Transcripts English
Member, Nonmember
2011
LDC2011T04 Indian Language Part-of-Speech Tagset: Sanskrit
Member/Nonmember
2010
LDC2010T06 Chinese Web 5-gram Version 1
Member/Nonmember
LDC2010T16 Indian Language Part-of-Speech Tagset: Bengali
LDC2010T24 Indian Language Part-of-Speech Tagset: Hindi
Member/Nonmember
LDC2010L01 LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1
Member
2009
LDC2009V01 Audiovisual Database of Spoken American English
Member/Nonmember
LDC2009T04 BioProp Version 1.0
Member/Nonmember
LDC2009S01 CSLU: Numbers Version 1.3
LDC2009S03 CSLU: S4X Release 1.2
Member/Nonmember
LDC2009T08 Japanese Web N-gram Version 1
Member/Nonmember
LDC2009T25 Web 1T 5-gram, 10 European Languages Version 1
Member/Nonmember
2008
LDC2008T13 BLLIP North American News Text, Complete
Member
LDC2008T14 BLLIP North American News Text, General Release
Member/Nonmember
LDC2008S06 CSLU: Alphadigit Version 1.3
LDC2008S07 CSLU: ISOLET Spoken Letter Database Version 1.3
LDC2008S02 CSLU: National Cellular Telephone Speech Release 2.3
LDC2008S01 CSLU: Portland Cellular Telephone Speech Version 1.3
Member/Nonmember
LDC2008T22 Czech Academic Corpus 2.0
Member/Nonmember
LDC2008L02 Hindi WordNet
Member/Nonmember
LDC2008T01 Hungarian-English Parallel Text, Version 1.0
Member/Nonmember
LDC2008T15 North American News Text, Complete
Member
LDC2008T16 North American News Text, General Release
Nonmember
2007
LDC2007T22 2001 Topic Annotated Enron Email Data Set
Member/Nonmember
LDC2007S08 CSLU: Foreign Accented English Release 1.2
LDC2007S18 CSLU: Kids' Speech Version 1.1
LDC2007S13 CSLU: Apple Words and Phrases
LDC2007S05 CSLU: Yes/No Version 1.2
Member/Nonmember
LDC2007S09 Mandarin Affective Speech
Member/Nonmember
LDC2007T19 MITRE 1997 Mandarin Broadcast News Speech Translations (Hub-4NE)
Member
LDC2007S15 Nationwide Speech Project
Member/Nonmember
2006
LDC2006S15 CSLU: Spelled and Spoken Words
LDC2006S14 CSLU: Stories v 1.2
LDC2006S35 CSLU: Multilanguage Telephone Speech Version 1.2
LDC2006S39 CSLU: Names Release 1.3
LDC2006S26 CSLU: Speaker Recognition Version 1.1
LDC2006S16 CSLU: Spoltech Brazilian Portuguese Version 1.0
LDC2006S01 CSLU: Voices
Member/Nonmember
LDC2006T03 Korean Propbank
Member/Nonmember
LDC2006T09 Korean Treebank Annotations Version 2.0
Member/Nonmember
LDC2006S13 N4 NATO Native and Non-Native Speech
Member/Nonmember
LDC2006T01 Prague Dependency Treebank 2.0
Member/Nonmember
LDC2006S30 Speech Controlled Computing
Member/Nonmember
LDC2006T13 Web 1T 5-gram Version 1
Member/Nonmember
2005
LDC2005T35 American National Corpus (ANC) Second Release
Member/Nonmember: Open Portion, Restricted Portion
2004
LDC2004L02 Buckwalter Arabic Morphological Analyzer Version 2.0
Member
LDC2004T23 Prague Arabic Dependency Treebank 1.0
Member/Nonmember
LDC2004T25 Prague Czech-English Dependency Treebank 1.0
Member/Nonmember
2002
LDC2002S11 1997 HUB4 English Evaluation Speech and Transcripts
Member/Nonmember
LDC2002T26 Korean English Treebank Annotations
Member/Nonmember
2001
LDC2001T62 CETEMpublico
Member/Nonmember
2000
LDC2000S86 1998 HUB4 Broadcast News Evaluation English Test Material
Member
LDC2000T43 BLLIP 1987-89 WSJ Corpus Release 1
Member/Nonmember
LDC2000T52 TREC Mandarin
Member/Nonmember
LDC2000T51 TREC Spanish
Member/Nonmember
1999
LDC99L22 Egyptian Colloquial Arabic Lexicon
For-Profit Member, Not-For-Profit Member, Nonmember
LDC99T34 Japanese Business News Text Supplement
Member
LDC99S82 USC Marketplace Broadcast News Speech
LDC99T36 USC Marketplace Broadcast News Transcripts
Member/Nonmember
1998
LDC98T31 1996 CSR HUB4 Language Model
Member
LDC98S73 1997 Mandarin Broadcast News Speech (HUB4-NE)
LDC98T24 1997 Mandarin Broadcast News Transcripts (HUB4-NE)
Member
LDC98L21 COMLEX English Syntax Lexicon
Member/Nonmember
LDC98T30 North American News Text Supplement
Member
LDC98T25 TDT Pilot Study Corpus
Member/Nonmember
1997
LDC97S66 1996 English Broadcast News Dev and Eval (HUB4)
LDC97S44 1996 English Broadcast News Speech (HUB4)
LDC97T22 1996 English Broadcast News Transcripts (HUB4)
Member
LDC97L20 CALLHOME American English Lexicon (PRONLEX)
LDC97L18 CALLHOME German Lexicon
Member, Nonmember
LDC97S63 The CMU Kids Corpus
Member/Nonmember
1996
LDC96L17 CALLHOME Japanese Lexicon
LDC96L15 CALLHOME Mandarin Chinese Lexicon
LDC96L16 CALLHOME Spanish Lexicon
Member, Nonmember
LDC96L14 CELEX2
Member/Nonmember
LDC96S33 CSR-IV HUB3
Member
LDC96S31 CSR-IV HUB4
Member/Nonmember
LDC96T10 Message Understanding Conference (MUC) 6 Additional News Text
Member/Nonmember
1995
LDC95T6 CSR-III Text
Member
LDC95T11 European Language Newspaper Text
Member
LDC95T8 Japanese Business News Text
Member
LDC95S28 LATINO-40 Spanish Read News
Member/Nonmember
LDC95T13 Mandarin Chinese News Text
Member/Nonmember
LDC95T21 North American News Text Corpus
Member
LDC95T9 Spanish News Text
Member
1994
LDC94T5 ECI Multilingual Text
Member/Nonmember
LDC94T4A UN Parallel Text (Complete)
LDC94T4B-1 UN Parallel Text (English)
LDC94T4B-2 UN Parallel Text (French)
LDC94T4B-3 UN Parallel Text (Spanish)
Member/Nonmember
1993
LDC93T1 ACL/DCI
Member/Nonmember
LDC93T3A TIPSTER Complete
LDC93T3B TIPSTER Volume 1
LDC93T3C TIPSTER Volume 2
LDC93T3D TIPSTER Volume 3
Member/Nonmember