LDC Projects
|
Many of the corpora available from the LDC have been used in
multi-site, multi-year research projects, with benchmark tests carried
out under government sponsorship. Typically, the Natural Language
Processing Group at NIST defines a test protocol, and researchers
report their methods and findings based on data that the LDC provides
for both training and testing of language-based systems. The projects
that have used LDC data are listed below.
|
|
|
| |
ACE
|
| |
|
| |
|
| |
|
LDC2005T07
|
ACE Time Normalization (TERN) 2004 English Training Data v 1.0
|
|
| |
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
|
| |
|
LDC2009T11
|
REFLEX Entity Translation Training/DevTest
|
|
| |
|
LDC2004T09
|
TIDES Extraction (ACE) 2003 Multilingual Training Data
|
|
| |
AQUAINT
|
| |
|
LDC2008T25
|
AQUAINT-2 Information-Retrieval Text Research Collection
|
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
ATIS
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
Communicator
|
| |
|
| |
|
| |
|
| |
CoNLL
|
| |
CoNLL
|
| |
DARPA-CSR
|
| |
|
LDC2005S08
|
BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
DASL
|
| |
|
LDC2003T15
|
SLX Corpus of Classic Sociolinguistic Interviews
|
|
| |
DUC
|
| |
EARS
|
| |
|
LDC97S66
|
1996 English Broadcast News Dev and Eval (HUB4)
|
|
| |
|
LDC97S44
|
1996 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC97T22
|
1996 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC98S71
|
1997 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC98T28
|
1997 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC2001S91
|
1997 HUB4 Broadcast News Evaluation Non-English Test Material
|
|
| |
|
LDC2002S11
|
1997 HUB4 English Evaluation Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S73
|
1997 Mandarin Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T24
|
1997 Mandarin Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2004S11
|
2002 Rich Transcription Broadcast News and Conversational Telephone Speech
|
|
| |
|
LDC99L23
|
American English Spoken Lexicon
|
|
| |
|
LDC2005S07
|
Arabic CTS Levantine Fisher Training Data Set 3, Speech
|
|
| |
|
LDC2005T03
|
Arabic CTS Levantine Fisher Training Data Set 3, Transcripts
|
|
| |
|
| |
|
| |
|
LDC2005S08
|
BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
|
|
| |
|
LDC96S46
|
CALLFRIEND American English-Non-Southern Dialect
|
|
| |
|
LDC96S47
|
CALLFRIEND American English-Southern Dialect
|
|
| |
|
| |
|
LDC96S55
|
CALLFRIEND Mandarin Chinese-Mainland Dialect
|
|
| |
|
LDC96S56
|
CALLFRIEND Mandarin Chinese-Taiwan Dialect
|
|
| |
|
LDC97L20
|
CALLHOME American English Lexicon (PRONLEX)
|
|
| |
|
LDC97S42
|
CALLHOME American English Speech
|
|
| |
|
LDC97T14
|
CALLHOME American English Transcripts
|
|
| |
|
LDC97S45
|
CALLHOME Egyptian Arabic Speech
|
|
| |
|
LDC2002S37
|
CALLHOME Egyptian Arabic Speech Supplement
|
|
| |
|
LDC97T19
|
CALLHOME Egyptian Arabic Transcripts
|
|
| |
|
LDC2002T38
|
CALLHOME Egyptian Arabic Transcripts Supplement
|
|
| |
|
LDC96L15
|
CALLHOME Mandarin Chinese Lexicon
|
|
| |
|
LDC96S34
|
CALLHOME Mandarin Chinese Speech
|
|
| |
|
LDC96T16
|
CALLHOME Mandarin Chinese Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
LDC99L22
|
Egyptian Colloquial Arabic Lexicon
|
|
| |
|
| |
|
| |
|
| |
|
LDC2005T19
|
Fisher English Training Part 2, Transcripts
|
|
| |
|
LDC2004S13
|
Fisher English Training Speech Part 1 Speech
|
|
| |
|
LDC2004T19
|
Fisher English Training Speech Part 1 Transcripts
|
|
| |
|
LDC2005S15
|
HKUST Mandarin Telephone Speech, Part 1
|
|
| |
|
LDC2005T32
|
HKUST Mandarin Telephone Transcript Data, Part 1
|
|
| |
|
LDC98S69
|
HUB5 Mandarin Telephone Speech Corpus
|
|
| |
|
| |
|
LDC2005S14
|
Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
|
|
| |
|
LDC2006S29
|
Levantine Arabic QT Training Data Set 5, Speech
|
|
| |
|
LDC2006T07
|
Levantine Arabic QT Training Data Set 5, Transcripts
|
|
| |
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
|
| |
|
LDC2004T12
|
RT-03 MDE Training Data Text and Annotations
|
|
| |
|
| |
|
LDC2005T24
|
RT-04 MDE Training Data Text/Annotations
|
|
| |
|
LDC2004S10
|
Santa Barbara Corpus of Spoken American English Part III
|
|
| |
|
LDC2005S25
|
Santa Barbara Corpus of Spoken American English Part IV
|
|
| |
|
| |
|
| |
|
| |
|
LDC2001S15
|
Switchboard Cellular Part 1 Transcribed Audio
|
|
| |
|
LDC2001T14
|
Switchboard Cellular Part 1 Transcription
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S72
|
Taiwanese Putonghua Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
GALE
|
| |
|
LDC97S66
|
1996 English Broadcast News Dev and Eval (HUB4)
|
|
| |
|
LDC97S44
|
1996 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC97T22
|
1996 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC98S71
|
1997 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC98T28
|
1997 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC2001S91
|
1997 HUB4 Broadcast News Evaluation Non-English Test Material
|
|
| |
|
LDC2002S11
|
1997 HUB4 English Evaluation Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S73
|
1997 Mandarin Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T24
|
1997 Mandarin Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2004S11
|
2002 Rich Transcription Broadcast News and Conversational Telephone Speech
|
|
| |
|
| |
|
LDC2005T07
|
ACE Time Normalization (TERN) 2004 English Training Data v 1.0
|
|
| |
|
| |
|
| |
|
LDC99L23
|
American English Spoken Lexicon
|
|
| |
|
LDC2005S07
|
Arabic CTS Levantine Fisher Training Data Set 3, Speech
|
|
| |
|
LDC2005T03
|
Arabic CTS Levantine Fisher Training Data Set 3, Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2003T07
|
Arabic Treebank: Part 1 - 10K-word English Translation
|
|
| |
|
| |
|
LDC2005T02
|
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)
|
|
| |
|
| |
|
LDC2005T20
|
Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)
|
|
| |
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
|
LDC2005S08
|
BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
|
|
| |
|
| |
|
LDC2002L49
|
Buckwalter Arabic Morphological Analyzer Version 1.0
|
|
| |
|
LDC2004L02
|
Buckwalter Arabic Morphological Analyzer Version 2.0
|
|
| |
|
LDC96S46
|
CALLFRIEND American English-Non-Southern Dialect
|
|
| |
|
LDC96S47
|
CALLFRIEND American English-Southern Dialect
|
|
| |
|
| |
|
LDC96S55
|
CALLFRIEND Mandarin Chinese-Mainland Dialect
|
|
| |
|
LDC96S56
|
CALLFRIEND Mandarin Chinese-Taiwan Dialect
|
|
| |
|
LDC97L20
|
CALLHOME American English Lexicon (PRONLEX)
|
|
| |
|
LDC97S42
|
CALLHOME American English Speech
|
|
| |
|
LDC97T14
|
CALLHOME American English Transcripts
|
|
| |
|
LDC97S45
|
CALLHOME Egyptian Arabic Speech
|
|
| |
|
LDC2002S37
|
CALLHOME Egyptian Arabic Speech Supplement
|
|
| |
|
LDC97T19
|
CALLHOME Egyptian Arabic Transcripts
|
|
| |
|
LDC2002T38
|
CALLHOME Egyptian Arabic Transcripts Supplement
|
|
| |
|
LDC96L15
|
CALLHOME Mandarin Chinese Lexicon
|
|
| |
|
LDC96S34
|
CALLHOME Mandarin Chinese Speech
|
|
| |
|
LDC96T16
|
CALLHOME Mandarin Chinese Transcripts
|
|
| |
|
| |
|
| |
|
LDC2005T10
|
Chinese English News Magazine Parallel Text
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2002L27
|
Chinese-English Translation Lexicon Version 3.0
|
|
| |
|
| |
|
LDC99L22
|
Egyptian Colloquial Arabic Lexicon
|
|
| |
|
LDC2009T01
|
English CTS Treebank with Structural Metadata
|
|
| |
|
| |
|
| |
|
| |
|
LDC95T11
|
European Language Newspaper Text
|
|
| |
|
| |
|
LDC2005T19
|
Fisher English Training Part 2, Transcripts
|
|
| |
|
LDC2004S13
|
Fisher English Training Speech Part 1 Speech
|
|
| |
|
LDC2004T19
|
Fisher English Training Speech Part 1 Transcripts
|
|
| |
|
LDC2007S02
|
Fisher Levantine Arabic Conversational Telephone Speech
|
|
| |
|
LDC2007T04
|
Fisher Levantine Arabic Conversational Telephone Speech, Transcripts
|
|
| |
|
| |
|
LDC2007T24
|
GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1
|
|
| |
|
LDC2008T09
|
GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2
|
|
| |
|
LDC2009T03
|
GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1
|
|
| |
|
LDC2009T09
|
GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2
|
|
| |
|
LDC2008T06
|
GALE Phase 1 Chinese Blog Parallel Text
|
|
| |
|
LDC2009T02
|
GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1
|
|
| |
|
LDC2009T06
|
GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2
|
|
| |
|
LDC2007T23
|
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1
|
|
| |
|
LDC2008T08
|
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2
|
|
| |
|
LDC2008T18
|
GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3
|
|
| |
|
LDC2009T15
|
GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1
|
|
| |
|
| |
|
LDC2005S15
|
HKUST Mandarin Telephone Speech, Part 1
|
|
| |
|
LDC2005T32
|
HKUST Mandarin Telephone Transcript Data, Part 1
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S69
|
HUB5 Mandarin Telephone Speech Corpus
|
|
| |
|
| |
|
LDC95T8
|
Japanese Business News Text
|
|
| |
|
LDC99T34
|
Japanese Business News Text Supplement
|
|
| |
|
| |
|
LDC2005S14
|
Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
|
|
| |
|
| |
|
LDC2001T02
|
Message Understanding Conference (MUC) 7
|
|
| |
|
LDC2003T18
|
Multiple-Translation Arabic (MTA) Part 1
|
|
| |
|
LDC2005T05
|
Multiple-Translation Arabic (MTA) Part 2
|
|
| |
|
LDC2003T17
|
Multiple-Translation Chinese (MTC) Part 2
|
|
| |
|
LDC2004T07
|
Multiple-Translation Chinese (MTC) Part 3
|
|
| |
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2004T12
|
RT-03 MDE Training Data Text and Annotations
|
|
| |
|
| |
|
LDC2005T24
|
RT-04 MDE Training Data Text/Annotations
|
|
| |
|
LDC2004S10
|
Santa Barbara Corpus of Spoken American English Part III
|
|
| |
|
LDC2005S25
|
Santa Barbara Corpus of Spoken American English Part IV
|
|
| |
|
| |
|
| |
|
| |
|
LDC99T41
|
Spanish Newswire Text, Volume 2
|
|
| |
|
| |
|
LDC2001S15
|
Switchboard Cellular Part 1 Transcribed Audio
|
|
| |
|
LDC2001T14
|
Switchboard Cellular Part 1 Transcription
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC98S72
|
Taiwanese Putonghua Speech and Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2005S11
|
TDT4 Multilingual Broadcast News Speech Corpus
|
|
| |
|
| |
|
LDC2004T09
|
TIDES Extraction (ACE) 2003 Multilingual Training Data
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
GENOA
|
| |
|
| |
|
| |
HARD
|
| |
Hub4
|
| |
|
| |
|
LDC97S66
|
1996 English Broadcast News Dev and Eval (HUB4)
|
|
| |
|
LDC97S44
|
1996 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC97T22
|
1996 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC98S71
|
1997 English Broadcast News Speech (HUB4)
|
|
| |
|
LDC98T28
|
1997 English Broadcast News Transcripts (HUB4)
|
|
| |
|
LDC2001S91
|
1997 HUB4 Broadcast News Evaluation Non-English Test Material
|
|
| |
|
LDC2002S11
|
1997 HUB4 English Evaluation Speech and Transcripts
|
|
| |
|
LDC98S73
|
1997 Mandarin Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T24
|
1997 Mandarin Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
LDC98S74
|
1997 Spanish Broadcast News Speech (HUB4-NE)
|
|
| |
|
LDC98T29
|
1997 Spanish Broadcast News Transcripts (HUB4-NE)
|
|
| |
|
LDC2000S86
|
1998 HUB4 Broadcast News Evaluation English Test Material
|
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
Hub5-LVCSR
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC97L20
|
CALLHOME American English Lexicon (PRONLEX)
|
|
| |
|
LDC97S42
|
CALLHOME American English Speech
|
|
| |
|
LDC97T14
|
CALLHOME American English Transcripts
|
|
| |
|
LDC97S45
|
CALLHOME Egyptian Arabic Speech
|
|
| |
|
LDC2002S37
|
CALLHOME Egyptian Arabic Speech Supplement
|
|
| |
|
LDC97T19
|
CALLHOME Egyptian Arabic Transcripts
|
|
| |
|
LDC2002T38
|
CALLHOME Egyptian Arabic Transcripts Supplement
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC96L15
|
CALLHOME Mandarin Chinese Lexicon
|
|
| |
|
LDC96S34
|
CALLHOME Mandarin Chinese Speech
|
|
| |
|
LDC96T16
|
CALLHOME Mandarin Chinese Transcripts
|
|
| |
|
| |
|
| |
|
| |
|
LDC99L22
|
Egyptian Colloquial Arabic Lexicon
|
|
| |
|
LDC98S69
|
HUB5 Mandarin Telephone Speech Corpus
|
|
| |
|
| |
|
LDC98S70
|
HUB5 Spanish Telephone Speech Corpus
|
|
| |
|
| |
|
| |
|
| |
|
| |
JANUS
|
| |
|
| |
|
| |
LCTL
|
| |
LID
|
| |
|
LDC96S46
|
CALLFRIEND American English-Non-Southern Dialect
|
|
| |
|
LDC96S47
|
CALLFRIEND American English-Southern Dialect
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC96S55
|
CALLFRIEND Mandarin Chinese-Mainland Dialect
|
|
| |
|
LDC96S56
|
CALLFRIEND Mandarin Chinese-Taiwan Dialect
|
|
| |
|
LDC96S57
|
CALLFRIEND Spanish-Caribbean Dialect
|
|
| |
|
LDC96S58
|
CALLFRIEND Spanish-Non-Caribbean Dialect
|
|
| |
|
| |
|
| |
Machine Reading
|
| |
MADCAT
|
| |
MDE
|
| |
MSE
|
| |
MT-06
|
| |
MT08
|
| |
MT09
|
| |
MUC
|
| |
|
LDC2003T13
|
Message Understanding Conference (MUC) 6
|
|
| |
|
LDC96T10
|
Message Understanding Conference (MUC) 6 Additional News Text
|
|
| |
|
LDC2001T02
|
Message Understanding Conference (MUC) 7
|
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
| |
|
| |
|
| |
|
| |
NIST Automatic Meeting Recognition
|
| |
|
| |
|
LDC2004T13
|
NIST Meeting Pilot Corpus Transcripts and Metadata
|
|
| |
NIST LRE
|
| |
|
LDC2008S05
|
2005 NIST Language Recognition Evaluation
|
|
| |
|
LDC2009S04
|
2007 NIST Language Recognition Evaluation Test Set
|
|
| |
|
LDC2009S05
|
2007 NIST Language Recognition Evaluation Supplemental Training Set
|
|
| |
NIST MT
|
| |
|
LDC2009T05
|
2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data
|
|
| |
NIST SRE
|
| |
NTCIR
|
| |
PHANOTICS
|
| |
Reflex-LCTL
|
| |
REFLEX-MTE
|
| |
|
LDC2009T11
|
REFLEX Entity Translation Training/DevTest
|
|
| |
RM
|
| |
ROAR
|
| |
|
| |
|
| |
RT
|
| |
SID
|
| |
|
LDC96S61
|
1996 Speaker Recognition Benchmark
|
|
| |
|
LDC99S80
|
1997 Speaker Recognition Benchmark
|
|
| |
|
LDC98S76
|
1998 Speaker Recognition Benchmark
|
|
| |
|
LDC99S81
|
1999 Speaker Recognition Benchmark
|
|
| |
|
LDC2001S97
|
2000 NIST Speaker Recognition Evaluation
|
|
| |
|
LDC2002S34
|
2001 NIST Speaker Recognition Evaluation Corpus
|
|
| |
|
LDC2004S04
|
2002 NIST Speaker Recognition Evaluation
|
|
| |
|
| |
|
LDC2001S15
|
Switchboard Cellular Part 1 Transcribed Audio
|
|
| |
|
LDC2001T14
|
Switchboard Cellular Part 1 Transcription
|
|
| |
|
| |
|
| |
|
| |
|
| |
SIGHAN
|
| |
SPINE
|
| |
|
LDC2000S96
|
Speech in Noisy Environments (SPINE) Evaluation Audio
|
|
| |
|
LDC2000T54
|
Speech in Noisy Environments (SPINE) Evaluation Transcripts
|
|
| |
|
LDC2000S87
|
Speech in Noisy Environments (SPINE) Training Audio
|
|
| |
|
LDC2000T49
|
Speech in Noisy Environments (SPINE) Training Transcripts
|
|
| |
|
LDC2001S04
|
Speech in Noisy Environments (SPINE2) Part 1 Audio
|
|
| |
|
LDC2001T05
|
Speech in Noisy Environments (SPINE2) Part 1 Transcripts
|
|
| |
|
LDC2001S06
|
Speech in Noisy Environments (SPINE2) Part 2 Audio
|
|
| |
|
LDC2001T07
|
Speech in Noisy Environments (SPINE2) Part 2 Transcripts
|
|
| |
|
LDC2001S08
|
Speech in Noisy Environments (SPINE2) Part 3 Audio
|
|
| |
|
LDC2001T09
|
Speech in Noisy Environments (SPINE2) Part 3 Transcripts
|
|
| |
|
LDC2001S99
|
Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
|
|
| |
STD
|
| |
TAC
|
| |
Talkbank
|
| |
|
LDC2005T35
|
American National Corpus (ANC) Second Release
|
|
| |
|
| |
|
| |
|
LDC2003L01
|
Grassfields Bantu Fieldwork: Dschang Lexicon
|
|
| |
|
LDC2003S02
|
Grassfields Bantu Fieldwork: Dschang Tone Paradigms
|
|
| |
|
LDC2001S16
|
Grassfields Bantu Fieldwork: Ngomba Tone Paradigms
|
|
| |
|
LDC2004L01
|
Klex: Finite-State Lexical Transducer for Korean
|
|
| |
|
| |
|
LDC2003S06
|
Santa Barbara Corpus of Spoken American English Part II
|
|
| |
|
LDC2004S10
|
Santa Barbara Corpus of Spoken American English Part III
|
|
| |
|
LDC2005S25
|
Santa Barbara Corpus of Spoken American English Part IV
|
|
| |
|
LDC2003T15
|
SLX Corpus of Classic Sociolinguistic Interviews
|
|
| |
|
LDC2004S12
|
TalkBank Ethology Data: Field Recordings of Vervet Monkey Calls
|
|
| |
TDT
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2005S11
|
TDT4 Multilingual Broadcast News Speech Corpus
|
|
| |
|
| |
|
| |
|
| |
TIDES
|
| |
|
| |
|
LDC2005T07
|
ACE Time Normalization (TERN) 2004 English Training Data v 1.0
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2003T07
|
Arabic Treebank: Part 1 - 10K-word English Translation
|
|
| |
|
| |
|
LDC2005T02
|
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)
|
|
| |
|
| |
|
LDC2005T20
|
Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)
|
|
| |
|
| |
|
LDC2005T33
|
BBN Pronoun Coreference and Entity Type Corpus
|
|
| |
|
| |
|
LDC2002L49
|
Buckwalter Arabic Morphological Analyzer Version 1.0
|
|
| |
|
LDC2004L02
|
Buckwalter Arabic Morphological Analyzer Version 2.0
|
|
| |
|
| |
|
| |
|
LDC2005T10
|
Chinese English News Magazine Parallel Text
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2002L27
|
Chinese-English Translation Lexicon Version 3.0
|
|
| |
|
LDC2007T02
|
English Chinese Translation Treebank v 1.0
|
|
| |
|
| |
|
| |
|
LDC95T11
|
European Language Newspaper Text
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC95T8
|
Japanese Business News Text
|
|
| |
|
LDC99T34
|
Japanese Business News Text Supplement
|
|
| |
|
| |
|
| |
|
LDC2001T02
|
Message Understanding Conference (MUC) 7
|
|
| |
|
LDC2003T18
|
Multiple-Translation Arabic (MTA) Part 1
|
|
| |
|
LDC2005T05
|
Multiple-Translation Arabic (MTA) Part 2
|
|
| |
|
LDC2003T17
|
Multiple-Translation Chinese (MTC) Part 2
|
|
| |
|
LDC2004T07
|
Multiple-Translation Chinese (MTC) Part 3
|
|
| |
|
LDC2006T04
|
Multiple-Translation Chinese (MTC) Part 4
|
|
| |
|
| |
|
LDC95T21
|
North American News Text Corpus
|
|
| |
|
LDC98T30
|
North American News Text Supplement
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC99T41
|
Spanish Newswire Text, Volume 2
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
LDC2005S11
|
TDT4 Multilingual Broadcast News Speech Corpus
|
|
| |
|
| |
|
LDC2004T09
|
TIDES Extraction (ACE) 2003 Multilingual Training Data
|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
Tipster
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
Transtac
|
| |
TREC
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
VACE
|
|
|

Contact: ldc@ldc.upenn.edu
(c) 1992-2008 Linguistic Data Consortium,
University of Pennsylvania. All Rights
Reserved.
|