New Corpora

Telephone speech collected in China and the Philippines: Call My Net 1: 364 hours of conversational telephone speech in Mandarin, Cantonese, Tagalog and Cebuano collected in 2015 from 221 native speakers, with metadata and speaker demographic information, developed by LDC to support the 2016 NIST Speaker Recognition Evaluation

Portuguese translation of ACE English text and annotations: Automatic Content Extraction for Portuguese: Brazilian Portuguese and European Portuguese automatic translations from ACE 2005 Multilingual Training Corpus LDC2006T06, produced using Google Translate and DeepL Translate, respectively, developed at INESC TEC - Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência

Hausa language resources for HLT development: LoReHLT Hausa Representative Language Pack: monolingual and parallel text with entity and semantic annotation and audio recordings annotated for topics, developed by LDC for LoReHLT, a companion project to the DARPA LORELEI program

English, Russian and Spanish multimodal data: AIDA Scenario 2 Practice Topic Source Data: 1500 root files (text, image, video) from English, Russian, and Spanish web sources for a scenario about the socioeconomic and political crisis in Venezuela since 2010, developed by LDC for the DARPA AIDA program 

Multilingual speech and non-speech for speech activity detection: RATS Low Speech Density: 87 hours of English, Levantine Arabic, Farsi, Pashto and Urdu phone speech and non-speech samples, developed by LDC to measure false alarm performance in RATS speech activity detection systems

English adult infant-directed utterances: BabyEars Affective Vocalizations: 22 minutes of spontaneous English speech by 12 adults interacting with their infant children, comprising 509 infant-directed utterances classified as approval, attention or prohibition