New Corpora

Public safety operational speech: 2017 NIST OpenSAT Pilot - SSSF: radio and telephone dispatches, transcripts and annotations from a real-world fire response event; speech conditions include land-mobile-radio transmission effects, significant background noise, speech under stress and variable decibel levels

Kinyarwanda language resources for HLT development: LORELEI Kinyarwanda Incident Language Pack – monolingual, bilingual and comparative text with entity linking and detection annotation and situation frame analysis, developed by LDC for the DARPA LORELEI program

English translation SMS/Chat treebank: BOLT English Translation Treebank – Chinese SMS/Chat: developed by LDC for the DARPA BOLT program, 108,385 tokens of translated Chinese text annotated for part-of-speech and syntactic structure in Penn Treebank II style

Amateur web video for multimodal event detection: HAVIC MED Training Data – Videos, Metadata and Annotation: 2,100 hours annotated with event properties and topic and genre categories, developed by LDC for the 2011-2015 NIST-sponsored MED (Multimedia Event Detection) tasks

PropBank for Egyptian Arabic informal text, speech: BOLT Egyptian Arabic PropBank and Sense – Discussion Forum, SMS/Chat, and Conversational Telephone Speech: developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research), semantic annotation over BOLT phrase structure treebank annotation for the DARPA BOLT program

Diarization shared task for challenging data: Second DIHARD Challenge Development - Eleven Sources: 22 hours of English and Chinese speech and annotations from diverse sources developed by LDC for the Second DIHARD Challenge

Infant language recordings in home environments: Second DIHARD Challenge Development - SEEDLingS: two hours of English child language recordings from a collection by Duke University, annotated by LDC for the Second DIHARD Challenge