New Corpora

Uyghur language resources for HLT development in emergent situations: LORELEI Uyghur Incident Language Pack: monolingual, bilingual and comparative text with named entity annotation and situation frame analysis, developed by LDC for the DARPA LORELEI program

Faroese prompted and spontaneous speech and transcripts for speech recognition: Ravnursson Faroese Speech and Transcripts: from data developed in the Faroe Islands’ Ravnur Project, 109 hours of Faroese speech from 433 speakers (249 female, 184 male) with corresponding orthographic transcripts and speaker metadata 

Bulgarian speech and annotations for cross-language information retrieval: MATERIAL Bulgarian-English Language Pack: developed by Appen for the IARPA MATERIAL program, 80 hours of Bulgarian conversational telephone speech, transcripts, English translations, annotations and queries designed to support cross-language information retrieval

Dialog behaviors across English and Spanish: Dialogs Re-Enacted Across Languages: 17 hours (3816 utterance pairs) of English and Spanish speech from spontaneous conversations and close re-enactments produced by 129 unique bilingual speakers, developed at the University of Texas at El Paso 

Tibetan elicited speech, transcripts and metadata: Diaspora Tibetan Speech: developed at Yale University, 28 hours of Tibetan elicited speech by 73 speakers from the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials and speaker metadata

Annotations interpreting events, situations and trends for English, Russian, and Spanish web documents: AIDA Scenario 2 Practice Topic Annotation: entities, events and relations in 29 English, Russian and Spanish web documents annotated in three categories (mentions, slots, linking), developed by LDC for the DARPA AIDA program