|
|
LDC Projects
|
Many of the corpora available from the LDC have been used in
multi-site, multi-year research projects, with benchmark tests carried
out under government sponsorship. Typically, a test protocol is
defined by the Natural Language
Processing Group at NIST, and methods and findings are reported by
researchers, based on data provided by the LDC for both training and
testing of language-based systems. Below is a list of the projects
that have used LDC data.
|
|
|
| |
ACE
|
| |
|
| |
|
| |
|
LDC2011T02
|
ACE 2005 English SpatialML Annotations Version 2
|
|
| |
|
LDC2010T09
|
ACE 2005 Mandarin SpatialML Annotations
|
|
| |
|
| |
|
LDC2010T18
|
ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0
|
|
| |
|
LDC2005T07
|
ACE Time Normalization (TERN) 2004 English Training Data v 1.0
|
|
| |
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
|
LDC2011T08
|
Datasets for Generic Relation Extraction (reACE)
|
|
| |
|
| |
|
LDC2009T11
|
REFLEX Entity Translation Training/DevTest
|
|
| |
|
LDC2004T09
|
TIDES Extraction (ACE) 2003 Multilingual Training Data
|
|
| |
American National Corpus (ANC)
|
| |
|
LDC2010T22
|
Manually Annotated Sub-Corpus First Release
|
|
| |
AQUAINT
|
| |
|
LDC2008T25
|
AQUAINT-2 Information-Retrieval Text Research Collection
|
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
ATIS
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
BEST
|
| |
BOLT
|
| |
Communicator
|
| |
|
| |
|
| |
|
| |
|
| |
CoNLL
|
| |
|
| |
|
| |
CoNLL
|
| |
DARPA-CSR
|
| |
|
LDC2005S08
|
BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
DASL
|
| |
|
LDC2003T15
|
SLX Corpus of Classic Sociolinguistic Interviews
|
|
| |
DEFT
|
| |
DOE/IRS2008-0256
|
| |
DUC
|
| |
EARS
|
| |
|
LDC97S66
|
1996 English Broadcast News Dev and Eval (HUB4)
|
|
| |
|
LDC97S44
|
1996 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC97T22
|
1996 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC98S71
|
1997 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC98T28
|
1997 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC2001S91
|
1997 HUB4 Broadcast News Evaluation Non-English Test Material
|
|
| |
|
LDC2002S11
|
1997 HUB4 English Evaluation Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S73
|
1997 Mandarin Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T24
|
1997 Mandarin Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2004S11
|
2002 Rich Transcription Broadcast News and Conversational Telephone Speech
|
|
| |
|
LDC99L23
|
American English Spoken Lexicon
|
|
| |
|
LDC2005S07
|
Arabic CTS Levantine Fisher Training Data Set 3, Speech
|
|
| |
|
LDC2005T03
|
Arabic CTS Levantine Fisher Training Data Set 3, Transcripts
|
|
| |
|
| |
|
| |
|
LDC2005S08
|
BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
|
|
| |
|
LDC96S46
|
CALLFRIEND American English-Non-Southern Dialect
|
|
| |
|
LDC96S47
|
CALLFRIEND American English-Southern Dialect
|
|
| |
|
| |
|
LDC96S55
|
CALLFRIEND Mandarin Chinese-Mainland Dialect
|
|
| |
|
LDC96S56
|
CALLFRIEND Mandarin Chinese-Taiwan Dialect
|
|
| |
|
LDC97L20
|
CALLHOME American English Lexicon (PRONLEX)
|
|
| |
|
LDC97S42
|
CALLHOME American English Speech
|
|
| |
|
LDC97T14
|
CALLHOME American English Transcripts
|
|
| |
|
LDC97S45
|
CALLHOME Egyptian Arabic Speech
|
|
| |
|
LDC2002S37
|
CALLHOME Egyptian Arabic Speech Supplement
|
|
| |
|
LDC97T19
|
CALLHOME Egyptian Arabic Transcripts
|
|
| |
|
LDC2002T38
|
CALLHOME Egyptian Arabic Transcripts Supplement
|
|
| |
|
LDC96L15
|
CALLHOME Mandarin Chinese Lexicon
|
|
| |
|
LDC96S34
|
CALLHOME Mandarin Chinese Speech
|
|
| |
|
LDC96T16
|
CALLHOME Mandarin Chinese Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
LDC99L22
|
Egyptian Colloquial Arabic Lexicon
|
|
| |
|
| |
|
| |
|
| |
|
LDC2005T19
|
Fisher English Training Part 2, Transcripts
|
|
| |
|
LDC2004S13
|
Fisher English Training Speech Part 1 Speech
|
|
| |
|
LDC2004T19
|
Fisher English Training Speech Part 1 Transcripts
|
|
| |
|
LDC2005S15
|
HKUST Mandarin Telephone Speech, Part 1
|
|
| |
|
LDC2005T32
|
HKUST Mandarin Telephone Transcript Data, Part 1
|
|
| |
|
LDC98S69
|
HUB5 Mandarin Telephone Speech Corpus
|
|
| |
|
| |
|
LDC2005S14
|
Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
|
|
| |
|
LDC2006S29
|
Levantine Arabic QT Training Data Set 5, Speech
|
|
| |
|
LDC2006T07
|
Levantine Arabic QT Training Data Set 5, Transcripts
|
|
| |
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
|
| |
|
LDC2004T12
|
RT-03 MDE Training Data Text and Annotations
|
|
| |
|
| |
|
LDC2005T24
|
RT-04 MDE Training Data Text/Annotations
|
|
| |
|
LDC2004S10
|
Santa Barbara Corpus of Spoken American English Part III
|
|
| |
|
LDC2005S25
|
Santa Barbara Corpus of Spoken American English Part IV
|
|
| |
|
| |
|
| |
|
| |
|
LDC2001S15
|
Switchboard Cellular Part 1 Transcribed Audio
|
|
| |
|
LDC2001T14
|
Switchboard Cellular Part 1 Transcription
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S72
|
Taiwanese Putonghua Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
GALE
|
| |
|
LDC97S66
|
1996 English Broadcast News Dev and Eval (HUB4)
|
|
| |
|
LDC97S44
|
1996 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC97T22
|
1996 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC98S71
|
1997 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC98T28
|
1997 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC2001S91
|
1997 HUB4 Broadcast News Evaluation Non-English Test Material
|
|
| |
|
LDC2002S11
|
1997 HUB4 English Evaluation Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S73
|
1997 Mandarin Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T24
|
1997 Mandarin Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2004S11
|
2002 Rich Transcription Broadcast News and Conversational Telephone Speech
|
|
| |
|
LDC2009T05
|
2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data
|
|
| |
|
LDC2011T05
|
2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set
|
|
| |
|
| |
|
LDC2005T07
|
ACE Time Normalization (TERN) 2004 English Training Data v 1.0
|
|
| |
|
| |
|
| |
|
LDC99L23
|
American English Spoken Lexicon
|
|
| |
|
| |
|
LDC2005S07
|
Arabic CTS Levantine Fisher Training Data Set 3, Speech
|
|
| |
|
LDC2005T03
|
Arabic CTS Levantine Fisher Training Data Set 3, Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2003T07
|
Arabic Treebank: Part 1 - 10K-word English Translation
|
|
| |
|
| |
|
LDC2005T02
|
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)
|
|
| |
|
| |
|
| |
|
| |
|
LDC2005T20
|
Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)
|
|
| |
|
| |
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
|
LDC2005S08
|
BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
|
|
| |
|
| |
|
LDC2002L49
|
Buckwalter Arabic Morphological Analyzer Version 1.0
|
|
| |
|
LDC2004L02
|
Buckwalter Arabic Morphological Analyzer Version 2.0
|
|
| |
|
LDC96S46
|
CALLFRIEND American English-Non-Southern Dialect
|
|
| |
|
LDC96S47
|
CALLFRIEND American English-Southern Dialect
|
|
| |
|
| |
|
LDC96S55
|
CALLFRIEND Mandarin Chinese-Mainland Dialect
|
|
| |
|
LDC96S56
|
CALLFRIEND Mandarin Chinese-Taiwan Dialect
|
|
| |
|
LDC97L20
|
CALLHOME American English Lexicon (PRONLEX)
|
|
| |
|
LDC97S42
|
CALLHOME American English Speech
|
|
| |
|
LDC97T14
|
CALLHOME American English Transcripts
|
|
| |
|
LDC97S45
|
CALLHOME Egyptian Arabic Speech
|
|
| |
|
LDC2002S37
|
CALLHOME Egyptian Arabic Speech Supplement
|
|
| |
|
LDC97T19
|
CALLHOME Egyptian Arabic Transcripts
|
|
| |
|
LDC2002T38
|
CALLHOME Egyptian Arabic Transcripts Supplement
|
|
| |
|
LDC96L15
|
CALLHOME Mandarin Chinese Lexicon
|
|
| |
|
LDC96S34
|
CALLHOME Mandarin Chinese Speech
|
|
| |
|
LDC96T16
|
CALLHOME Mandarin Chinese Transcripts
|
|
| |
|
| |
|
| |
|
LDC2005T10
|
Chinese English News Magazine Parallel Text
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2002L27
|
Chinese-English Translation Lexicon Version 3.0
|
|
| |
|
| |
|
LDC99L22
|
Egyptian Colloquial Arabic Lexicon
|
|
| |
|
LDC2009T01
|
English CTS Treebank with Structural Metadata
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2012T02
|
English Translation Treebank: An-Nahar Newswire
|
|
| |
|
| |
|
| |
|
LDC95T11
|
European Language Newspaper Text
|
|
| |
|
| |
|
LDC2005T19
|
Fisher English Training Part 2, Transcripts
|
|
| |
|
LDC2004S13
|
Fisher English Training Speech Part 1 Speech
|
|
| |
|
LDC2004T19
|
Fisher English Training Speech Part 1 Transcripts
|
|
| |
|
LDC2007S02
|
Fisher Levantine Arabic Conversational Telephone Speech
|
|
| |
|
LDC2007T04
|
Fisher Levantine Arabic Conversational Telephone Speech, Transcripts
|
|
| |
|
LDC2013T10
|
GALE Arabic-English Parallel Aligned Treebank -- Newswire
|
|
| |
|
LDC2012T16
|
GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web
|
|
| |
|
LDC2012T20
|
GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire
|
|
| |
|
LDC2012T24
|
GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web
|
|
| |
|
LDC2013T05
|
GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web
|
|
| |
|
| |
|
LDC2007T24
|
GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1
|
|
| |
|
LDC2008T09
|
GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2
|
|
| |
|
LDC2009T03
|
GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1
|
|
| |
|
LDC2009T09
|
GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2
|
|
| |
|
LDC2008T06
|
GALE Phase 1 Chinese Blog Parallel Text
|
|
| |
|
LDC2009T02
|
GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1
|
|
| |
|
LDC2009T06
|
GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2
|
|
| |
|
LDC2007T23
|
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1
|
|
| |
|
LDC2008T08
|
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2
|
|
| |
|
LDC2008T18
|
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3
|
|
| |
|
LDC2009T15
|
GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1
|
|
| |
|
LDC2010T03
|
GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2
|
|
| |
|
| |
|
LDC2012T06
|
GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1
|
|
| |
|
LDC2012T14
|
GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2
|
|
| |
|
LDC2013S02
|
GALE Phase 2 Arabic Broadcast Conversation Speech Part 1
|
|
| |
|
LDC2013T04
|
GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1
|
|
| |
|
LDC2012T18
|
GALE Phase 2 Arabic Broadcast News Parallel Text
|
|
| |
|
LDC2012T17
|
GALE Phase 2 Arabic Newswire Parallel Text
|
|
| |
|
| |
|
LDC2013S04
|
GALE Phase 2 Chinese Broadcast Conversation Speech
|
|
| |
|
LDC2013T08
|
GALE Phase 2 Chinese Broadcast Conversation Transcripts
|
|
| |
|
LDC2005S15
|
HKUST Mandarin Telephone Speech, Part 1
|
|
| |
|
LDC2005T32
|
HKUST Mandarin Telephone Transcript Data, Part 1
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S69
|
HUB5 Mandarin Telephone Speech Corpus
|
|
| |
|
| |
|
LDC95T8
|
Japanese Business News Text
|
|
| |
|
LDC99T34
|
Japanese Business News Text Supplement
|
|
| |
|
| |
|
LDC2005S14
|
Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
|
|
| |
|
| |
|
LDC2001T02
|
Message Understanding Conference (MUC) 7
|
|
| |
|
LDC2003T18
|
Multiple-Translation Arabic (MTA) Part 1
|
|
| |
|
LDC2005T05
|
Multiple-Translation Arabic (MTA) Part 2
|
|
| |
|
LDC2003T17
|
Multiple-Translation Chinese (MTC) Part 2
|
|
| |
|
LDC2004T07
|
Multiple-Translation Chinese (MTC) Part 3
|
|
| |
|
| |
|
LDC2010T21
|
NIST 2008 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2010T01
|
NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations
|
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2004T12
|
RT-03 MDE Training Data Text and Annotations
|
|
| |
|
| |
|
LDC2005T24
|
RT-04 MDE Training Data Text/Annotations
|
|
| |
|
LDC2004S10
|
Santa Barbara Corpus of Spoken American English Part III
|
|
| |
|
LDC2005S25
|
Santa Barbara Corpus of Spoken American English Part IV
|
|
| |
|
| |
|
| |
|
| |
|
LDC99T41
|
Spanish Newswire Text, Volume 2
|
|
| |
|
| |
|
LDC2001S15
|
Switchboard Cellular Part 1 Transcribed Audio
|
|
| |
|
LDC2001T14
|
Switchboard Cellular Part 1 Transcription
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S72
|
Taiwanese Putonghua Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2005S11
|
TDT4 Multilingual Broadcast News Speech Corpus
|
|
| |
|
| |
|
LDC2004T09
|
TIDES Extraction (ACE) 2003 Multilingual Training Data
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
GENOA
|
| |
|
| |
|
| |
HARD
|
| |
HAVIC
|
| |
Hub4
|
| |
|
| |
|
LDC97S66
|
1996 English Broadcast News Dev and Eval (HUB4)
|
|
| |
|
LDC97S44
|
1996 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC97T22
|
1996 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC98S71
|
1997 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC98T28
|
1997 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC2001S91
|
1997 HUB4 Broadcast News Evaluation Non-English Test Material
|
|
| |
|
LDC2002S11
|
1997 HUB4 English Evaluation Speech and Transcripts
|
|
| |
|
LDC98S73
|
1997 Mandarin Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T24
|
1997 Mandarin Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
LDC98S74
|
1997 Spanish Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T29
|
1997 Spanish Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
LDC2000S86
|
1998 HUB4 Broadcast News Evaluation English Test Material
|
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
Hub5-LVCSR
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2002T43
|
2000 HUB5 English Evaluation Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
LDC97L20
|
CALLHOME American English Lexicon (PRONLEX)
|
|
| |
|
LDC97S42
|
CALLHOME American English Speech
|
|
| |
|
LDC97T14
|
CALLHOME American English Transcripts
|
|
| |
|
LDC97S45
|
CALLHOME Egyptian Arabic Speech
|
|
| |
|
LDC2002S37
|
CALLHOME Egyptian Arabic Speech Supplement
|
|
| |
|
LDC97T19
|
CALLHOME Egyptian Arabic Transcripts
|
|
| |
|
LDC2002T38
|
CALLHOME Egyptian Arabic Transcripts Supplement
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC96L15
|
CALLHOME Mandarin Chinese Lexicon
|
|
| |
|
LDC96S34
|
CALLHOME Mandarin Chinese Speech
|
|
| |
|
LDC96T16
|
CALLHOME Mandarin Chinese Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
LDC99L22
|
Egyptian Colloquial Arabic Lexicon
|
|
| |
|
LDC98S69
|
HUB5 Mandarin Telephone Speech Corpus
|
|
| |
|
| |
|
LDC98S70
|
HUB5 Spanish Telephone Speech Corpus
|
|
| |
|
| |
|
| |
|
| |
JANUS
|
| |
|
| |
|
| |
LCTL
|
| |
LID
|
| |
|
LDC96S46
|
CALLFRIEND American English-Non-Southern Dialect
|
|
| |
|
LDC96S47
|
CALLFRIEND American English-Southern Dialect
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC96S55
|
CALLFRIEND Mandarin Chinese-Mainland Dialect
|
|
| |
|
LDC96S56
|
CALLFRIEND Mandarin Chinese-Taiwan Dialect
|
|
| |
|
LDC96S57
|
CALLFRIEND Spanish-Caribbean Dialect
|
|
| |
|
LDC96S58
|
CALLFRIEND Spanish-Non-Caribbean Dialect
|
|
| |
|
| |
|
| |
Linguistic Atlas Project
|
| |
|
| |
Machine Reading
|
| |
MADCAT
|
| |
|
| |
|
| |
MALACH
|
| |
|
LDC2012S05
|
USC-SFI MALACH Interviews and Transcripts English
|
|
| |
MDE
|
| |
MED
|
| |
MED-11
|
| |
MIXER 8
|
| |
MIXER-7
|
| |
MSE
|
| |
MT-06
|
| |
MT08
|
| |
|
LDC2010T01
|
NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations
|
|
| |
MT09
|
| |
MT12
|
| |
MUC
|
| |
|
LDC2003T13
|
Message Understanding Conference (MUC) 6
|
|
| |
|
LDC96T10
|
Message Understanding Conference (MUC) 6 Additional News Text
|
|
| |
|
LDC2001T02
|
Message Understanding Conference (MUC) 7
|
|
| |
|
LDC2010T15
|
Message Understanding Conference 7 Timed (MUC7_T)
|
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
| |
|
| |
|
| |
|
| |
NIST Automatic Meeting Recognition
|
| |
|
| |
|
LDC2004T13
|
NIST Meeting Pilot Corpus Transcripts and Metadata
|
|
| |
NIST LRE
|
| |
|
LDC2008S05
|
2005 NIST Language Recognition Evaluation
|
|
| |
|
LDC2009S05
|
2007 NIST Language Recognition Evaluation Supplemental Training Set
|
|
| |
|
LDC2009S04
|
2007 NIST Language Recognition Evaluation Test Set
|
|
| |
NIST MT
|
| |
|
LDC2009T05
|
2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data
|
|
| |
|
LDC2010T10
|
NIST 2002 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2010T11
|
NIST 2003 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2010T12
|
NIST 2004 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2010T14
|
NIST 2005 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2010T17
|
NIST 2006 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2010T21
|
NIST 2008 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2013T07
|
NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets
|
|
| |
|
LDC2010T23
|
NIST 2009 Open Machine Translation (OpenMT) Evaluation
|
|
| |
|
LDC2013T03
|
NIST 2012 Open Machine Translation (OpenMT) Evaluation
|
|
| |
NIST SRE
|
| |
|
LDC2001S97
|
2000 NIST Speaker Recognition Evaluation
|
|
| |
|
LDC2010S03
|
2003 NIST Speaker Recognition Evaluation
|
|
| |
|
LDC2006S44
|
2004 NIST Speaker Recognition Evaluation
|
|
| |
|
LDC2011S04
|
2005 NIST Speaker Recognition Evaluation Test Data
|
|
| |
|
LDC2011S01
|
2005 NIST Speaker Recognition Evaluation Training Data
|
|
| |
|
LDC2011S10
|
2006 NIST Speaker Recognition Evaluation Test Set Part 1
|
|
| |
|
LDC2012S01
|
2006 NIST Speaker Recognition Evaluation Test Set Part 2
|
|
| |
|
LDC2011S09
|
2006 NIST Speaker Recognition Evaluation Training Set
|
|
| |
|
LDC2011S11
|
2008 NIST Speaker Recognition Evaluation Supplemental Set
|
|
| |
|
LDC2011S08
|
2008 NIST Speaker Recognition Evaluation Test Set
|
|
| |
|
LDC2011S05
|
2008 NIST Speaker Recognition Evaluation Training Set Part 1
|
|
| |
|
LDC2011S07
|
2008 NIST Speaker Recognition Evaluation Training Set Part 2
|
|
| |
NTCIR
|
| |
OntoNotes
|
| |
|
| |
OpenHaRT
|
| |
PHANOTICS
|
| |
RATS
|
| |
Reflex-LCTL
|
| |
REFLEX-MTE
|
| |
|
LDC2009T11
|
REFLEX Entity Translation Training/DevTest
|
|
| |
RM
|
| |
|
LDC96S39
|
RM Isolated and Spelled Word Data
|
|
| |
ROAR
|
| |
|
| |
|
| |
RT
|
| |
|
LDC2007S12
|
2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data
|
|
| |
|
LDC2007S11
|
2004 Spring NIST Rich Transcription (RT-04S) Development Data
|
|
| |
|
LDC2011S06
|
2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set
|
|
| |
SCIL
|
| |
SemEval
|
| |
|
LDC2011T01
|
SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages
|
|
| |
SID
|
| |
|
LDC96S61
|
1996 Speaker Recognition Benchmark
|
|
| |
|
LDC99S80
|
1997 Speaker Recognition Benchmark
|
|
| |
|
LDC98S76
|
1998 Speaker Recognition Benchmark
|
|
| |
|
LDC99S81
|
1999 Speaker Recognition Benchmark
|
|
| |
|
LDC2002S34
|
2001 NIST Speaker Recognition Evaluation Corpus
|
|
| |
|
LDC2004S04
|
2002 NIST Speaker Recognition Evaluation
|
|
| |
|
| |
|
LDC2001S15
|
Switchboard Cellular Part 1 Transcribed Audio
|
|
| |
|
LDC2001T14
|
Switchboard Cellular Part 1 Transcription
|
|
| |
|
| |
|
| |
|
| |
|
| |
SIGHAN
|
| |
SPINE
|
| |
|
LDC2000S96
|
Speech in Noisy Environments (SPINE) Evaluation Audio
|
|
| |
|
LDC2000T54
|
Speech in Noisy Environments (SPINE) Evaluation Transcripts
|
|
| |
|
LDC2000S87
|
Speech in Noisy Environments (SPINE) Training Audio
|
|
| |
|
LDC2000T49
|
Speech in Noisy Environments (SPINE) Training Transcripts
|
|
| |
|
LDC2001S04
|
Speech in Noisy Environments (SPINE2) Part 1 Audio
|
|
| |
|
LDC2001T05
|
Speech in Noisy Environments (SPINE2) Part 1 Transcripts
|
|
| |
|
LDC2001S06
|
Speech in Noisy Environments (SPINE2) Part 2 Audio
|
|
| |
|
LDC2001T07
|
Speech in Noisy Environments (SPINE2) Part 2 Transcripts
|
|
| |
|
LDC2001S08
|
Speech in Noisy Environments (SPINE2) Part 3 Audio
|
|
| |
|
LDC2001T09
|
Speech in Noisy Environments (SPINE2) Part 3 Transcripts
|
|
| |
|
LDC2001S99
|
Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
|
|
| |
SRE-12
|
| |
STD
|
| |
TAC
|
| |
Talkbank
|
| |
|
LDC2005T35
|
American National Corpus (ANC) Second Release
|
|
| |
|
| |
|
| |
|
LDC2003L01
|
Grassfields Bantu Fieldwork: Dschang Lexicon
|
|
| |
|
LDC2003S02
|
Grassfields Bantu Fieldwork: Dschang Tone Paradigms
|
|
| |
|
LDC2001S16
|
Grassfields Bantu Fieldwork: Ngomba Tone Paradigms
|
|
| |
|
LDC2004L01
|
Klex: Finite-State Lexical Transducer for Korean
|
|
| |
|
| |
|
LDC2003S06
|
Santa Barbara Corpus of Spoken American English Part II
|
|
| |
|
LDC2004S10
|
Santa Barbara Corpus of Spoken American English Part III
|
|
| |
|
LDC2005S25
|
Santa Barbara Corpus of Spoken American English Part IV
|
|
| |
|
LDC2003T15
|
SLX Corpus of Classic Sociolinguistic Interviews
|
|
| |
|
LDC2004S12
|
TalkBank Ethology Data: Field Recordings of Vervet Monkey Calls
|
|
| |
TDT
|
| |
|
LDC2010T18
|
ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2005S11
|
TDT4 Multilingual Broadcast News Speech Corpus
|
|
| |
|
| |
|
| |
|
| |
TERN
|
| |
|
LDC2010T18
|
ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0
|
|
| |
TIDES
|
| |
|
| |
|
LDC2010T18
|
ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0
|
|
| |
|
LDC2005T07
|
ACE Time Normalization (TERN) 2004 English Training Data v 1.0
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2003T07
|
Arabic Treebank: Part 1 - 10K-word English Translation
|
|
| |
|
| |
|
LDC2005T02
|
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)
|
|
| |
|
| |
|
LDC2005T20
|
Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)
|
|
| |
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
|
| |
|
LDC2002L49
|
Buckwalter Arabic Morphological Analyzer Version 1.0
|
|
| |
|
LDC2004L02
|
Buckwalter Arabic Morphological Analyzer Version 2.0
|
|
| |
|
| |
|
| |
|
LDC2005T10
|
Chinese English News Magazine Parallel Text
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2002L27
|
Chinese-English Translation Lexicon Version 3.0
|
|
| |
|
LDC2007T02
|
English Chinese Translation Treebank v 1.0
|
|
| |
|
| |
|
| |
|
LDC95T11
|
European Language Newspaper Text
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC95T8
|
Japanese Business News Text
|
|
| |
|
LDC99T34
|
Japanese Business News Text Supplement
|
|
| |
|
| |
|
| |
|
LDC2001T02
|
Message Understanding Conference (MUC) 7
|
|
| |
|
LDC2003T18
|
Multiple-Translation Arabic (MTA) Part 1
|
|
| |
|
LDC2005T05
|
Multiple-Translation Arabic (MTA) Part 2
|
|
| |
|
LDC2003T17
|
Multiple-Translation Chinese (MTC) Part 2
|
|
| |
|
LDC2004T07
|
Multiple-Translation Chinese (MTC) Part 3
|
|
| |
|
LDC2006T04
|
Multiple-Translation Chinese (MTC) Part 4
|
|
| |
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC99T41
|
Spanish Newswire Text, Volume 2
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2005S11
|
TDT4 Multilingual Broadcast News Speech Corpus
|
|
| |
|
| |
|
LDC2004T09
|
TIDES Extraction (ACE) 2003 Multilingual Training Data
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
Tipster
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
Transtac
|
| |
TREC
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
TRECVid
|
| |
VACE
|
| |
|
LDC2012V01
|
2005 NIST/USF Evaluation Resources for the VACE Program - Broadcast News
|
|
| |
|
LDC2011V05
|
2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
|
|
| |
|
LDC2011V06
|
2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
|
|
| |
|
LDC2011V03
|
NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
|
|
| |
|
LDC2011V04
|
NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
|
|
| |
|
LDC2011V01
|
NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 1
|
|
| |
|
LDC2011V02
|
NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 2
|
|
| |
ViperToxin
|
|
|

Contact: ldc@ldc.upenn.edu
(c) 1992-2010 Linguistic Data Consortium,
University of Pennsylvania. All Rights
Reserved.
|