


LDC Top Ten Corpora

These are the ten most popular LDC corpora. The number at the start of each entry is the number of copies distributed.

921  LDC93S1     TIMIT Acoustic-Phonetic Continuous Speech Corpus
691  LDC96L14    CELEX2
580  LDC2006T13  Web 1T 5-gram Version 1
411  LDC93S10    TIDIGITS
372  LDC94T5     ECI Multilingual Text
303  LDC93S2     NTIMIT
294  LDC99T42    Treebank-3
292  LDC93T3A    TIPSTER Complete
258  LDC94S16    YOHO Speaker Verification
245  LDC2001T02  Message Understanding Conference (MUC) 7

Contact: ldc@ldc.upenn.edu

(c) 1992-2007 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.