



LDC Top Ten Corpora

These are the 10 most popular LDC corpora, ranked by the number of copies distributed.

Distributed  Catalog ID   Title
1062         LDC93S1      TIMIT Acoustic-Phonetic Continuous Speech Corpus
932          LDC2006T13   Web 1T 5-gram Version 1
761          LDC96L14     CELEX2
461          LDC93S10     TIDIGITS
448          LDC94T5      ECI Multilingual Text
386          LDC99T42     Treebank-3
351          LDC93T3A     TIPSTER Complete
326          LDC93S2      NTIMIT
286          LDC94S16     YOHO Speaker Verification
282          LDC2001T02   Message Understanding Conference (MUC) 7


Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.