New Corpora

Multilingual speech and non-speech for speech activity detection: RATS Low Speech Density: 87 hours of English, Levantine Arabic, Farsi, Pashto and Urdu phone speech and non-speech samples, developed by LDC to measure false alarm performance in RATS speech activity detection systems

English adult infant-directed utterances: BabyEars Affective Vocalizations: 22 minutes of spontaneous English speech by 12 adults interacting with their infant children, comprising 509 infant-directed utterances classified as approval, attention or prohibition

English L2 speakers in academic settings: Second Language University Speech Intelligibility Corpus: 10.5 hours of L2 English speech at 10 North American universities – presentations, descriptions, microteaching – by 66 faculty and students from 15 language backgrounds, plus transcripts and metadata, developed by Northern Arizona University, The Pennsylvania State University and The University of Texas at Dallas

Annotations interpreting events, situations and trends for English, Russian and Ukrainian web documents: AIDA Scenario 1 Practice Topic Annotation: entities, events and relations in 212 English, Russian and Ukrainian web documents annotated in three categories (mentions, slots, linking), developed by LDC for the DARPA AIDA program

Kurmanji and Sorani Kurdish speech and transcripts for language ID and speech recognition: KASET - Kurmanji and Sorani Kurdish Speech and Transcripts: 147 hours of telephone speech and broadcast audio, 60 hours of which are transcribed, plus speaker metadata, developed by LDC

Farsi language resources for HLT development: LORELEI Farsi Representative Language Pack: monolingual and parallel text with entity linking and detection annotation and situation frame analysis, developed by LDC for the DARPA LORELEI program