New Corpora

Bulgarian speech and annotations for cross-language information retrieval: MATERIAL Bulgarian-English Language Pack: developed by Appen for the IARPA MATERIAL program, 80 hours of Bulgarian conversational telephone speech, transcripts, English translations, annotations and queries designed to support cross-language information retrieval

Dialog behaviors across English and Spanish: Dialogs Re-Enacted Across Languages: 17 hours (3816 utterance pairs) of English and Spanish speech from spontaneous conversations and close re-enactments produced by 129 unique bilingual speakers, developed at the University of Texas at El Paso 

Tibetan elicited speech, transcripts and metadata: Diaspora Tibetan Speech: developed at Yale University, 28 hours of Tibetan elicited speech by 73 speakers from the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials and speaker metadata

Annotations interpreting events, situations and trends for English, Russian and Spanish web documents: AIDA Scenario 2 Practice Topic Annotation: entities, events and relations in 29 English, Russian and Spanish web documents annotated in three categories (mentions, slots, linking), developed by LDC for the DARPA AIDA program

Telephone speech collected in China and the Philippines: Call My Net 1: 364 hours of conversational telephone speech in Mandarin, Cantonese, Tagalog and Cebuano collected in 2015 from 221 native speakers, with metadata and speaker demographic information, developed by LDC to support the 2016 NIST Speaker Recognition Evaluation

Portuguese translation of ACE English text and annotations: Automatic Content Extraction for Portuguese: developed at INESC TEC - Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, Brazilian Portuguese and European Portuguese automatic translations of the ACE 2005 Multilingual Training Corpus (LDC2006T06), produced with Google Translate and DeepL Translate, respectively