New Corpora

Hausa language resources for HLT development: LoReHLT Hausa Representative Language Pack: monolingual and parallel text with entity and semantic annotation and audio recordings annotated for topics, developed by LDC for LoReHLT, a companion project to the DARPA LORELEI program

English, Russian and Spanish multimodal data: AIDA Scenario 2 Practice Topic Source Data: 1,500 root files (text, image, video) from English, Russian and Spanish web sources for a scenario about the socioeconomic and political crisis in Venezuela since 2010, developed by LDC for the DARPA AIDA program

Multilingual speech and non-speech for speech activity detection: RATS Low Speech Density: 87 hours of English, Levantine Arabic, Farsi, Pashto and Urdu phone speech and non-speech samples, developed by LDC to measure false alarm performance in RATS speech activity detection systems

English adult infant-directed utterances: BabyEars Affective Vocalizations: 22 minutes of spontaneous English speech by 12 adults interacting with their infant children, comprising 509 infant-directed utterances classified as approval, attention or prohibition

English L2 speakers in academic settings: Second Language University Speech Intelligibility Corpus: 10.5 hours of L2 English speech at 10 North American universities – presentations, descriptions, microteaching – by 66 faculty and students from 15 language backgrounds, plus transcripts and metadata, developed by Northern Arizona University, The Pennsylvania State University and The University of Texas at Dallas

Annotations interpreting events, situations and trends for English, Russian and Ukrainian web documents: AIDA Scenario 1 Practice Topic Annotation: entities, events and relations in 212 English, Russian and Ukrainian web documents annotated in three categories (mentions, slots, linking), developed by LDC for the DARPA AIDA program