New Corpora

TIMIT-like L1 and L2 English Read Speech: Global TIMIT Learner Treebank English; developed by LDC and LAIX, Inc., 24 hours of L1 and L2 English read speech & transcripts, two sets of 50 speakers, each speaker reading 120 sentences from Treebank-3 (LDC99T42), 3220 sentence types – per speaker, 20 sentences read by all speakers, 40 sentences read by 10 speakers, 60 sentences read by one speaker
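
The sentence-type figure in the Global TIMIT entry can be checked with a little arithmetic. The sketch below is illustrative only: the helper name and its parameters are hypothetical, and it assumes the 3220 count is taken over one set of 50 speakers, with the 40 group sentences shared within non-overlapping groups of 10 speakers.

```python
# Hypothetical helper (not part of any LDC tooling): counts distinct
# sentence types in a TIMIT-style reading design.
def sentence_types(speakers, shared_by_all, per_group, group_size, per_speaker_unique):
    # Assumption: speakers are split into non-overlapping groups of
    # `group_size`, and each group shares its own `per_group` sentences.
    groups = speakers // group_size
    return shared_by_all + per_group * groups + per_speaker_unique * speakers

# One set of 50 speakers: 20 common + 40 per group of 10 + 60 unique each.
total = sentence_types(50, 20, 40, 10, 60)
per_speaker = 20 + 40 + 60

print(total)        # 3220
print(per_speaker)  # 120
```

Under these assumptions the design yields 20 + 200 + 3000 = 3220 sentence types, and each speaker reads 120 sentences, which agrees with the figures in the entry.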

Persian text annotated for genre analysis: Corpus of Law, Academic, and News; 400 Persian legal, academic and news documents with metadata for text type, dates & source and annotations marking title and body paragraphs

Mongolian Telephone Speech: IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b; last release in the IARPA Babel series, developed by Appen, 204 hours of Halh Mongolian conversational and scripted telephone speech with transcripts, collected in 2014 from speakers aged 16 to 65 using different telephones in various environments, including the street, a home or office, a public place, and inside a vehicle

PropBank & Verb Sense Disambiguation for English Informal Text: BOLT English PropBank and Sense – Discussion Forum, SMS/Chat, and Conversational Telephone Speech; developed by Univ. of Colorado Boulder - CLEAR for the DARPA BOLT English propbanking task, PropBank (semantic layer) and verb sense disambiguation annotation on LDC’s BOLT phrase structure treebank annotation

Tigrinya Text, Annotations and Tools for Rapid Event Response: LORELEI Tigrinya Incident Language Pack; developed by LDC, Tigrinya monolingual text, bi-text and English translations with situation frame, named entity & entity linking and detection annotation supporting human language technology for low-resource languages in the context of emergent situations like natural disasters or disease outbreaks

Chinese Lexicons: Chinese Lexical Resources for Gender, Number, Animacy; developed by LDC for the DARPA DEFT program from newswire texts in Chinese Gigaword Fifth Edition (LDC2011T13), resources include dictionaries of Chinese animate nominals and names, nominals and names with predicted gender and number, and dictionaries of nominals, names, verbs and pronouns with frequency information for each

Vietnamese Text, Annotations and Tools for Rapid Event Response: LORELEI Vietnamese Representative Language Pack; developed by LDC, Vietnamese monolingual text, bi-text and English translations with situation frame, named entity & entity linking and detection annotation supporting human language technology for low-resource languages in the context of emergent situations like natural disasters or disease outbreaks

LDC ERE Schema for Chinese Web Text: DEFT Chinese Light and Rich ERE Annotation; Chinese discussion forum text annotated for entities, relations and events (ERE) using the ERE Light and ERE Rich annotation schemas developed by LDC for the DARPA DEFT program

Classic Southern American English Telephone Speech Collection with Up-to-Date Formats and Metadata: CALLFRIEND American English – Southern Dialect Second Edition; updates the classic LDC 1996 release with audio files in WAV format, a simplified directory structure & new documentation and metadata