New Corpora

Akan Resources for Emergent Situation Response: LORELEI Akan Representative Language Pack: developed by LDC, Akan monolingual and parallel text with morphological segmentation, situation frame, named entity, and entity linking and detection annotation, supporting human language technology for low-resource languages in the context of emergent situations such as natural disasters or disease outbreaks

Multi-language Translations of Classic English Travel Interactions: ATIS – Seven Languages: Spanish, German, French, Portuguese, Chinese & Japanese translations of 5,871 English utterances from ATIS (Air Travel Information Services) corpora, specifically ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26), includes named entity annotation and training/test split, developed by Amazon Web Services, Inc.

Treebanked Informal English Text: BOLT English Treebank – SMS/Chat: developed by LDC, part-of-speech and syntactic structure annotation for 115,667 tokens/words of English SMS and text chat following revised Penn Treebank II guidelines

English Informal Text Annotated for Co-reference: BOLT English Co-reference – Discussion Forum, SMS/Chat, and Conversational Telephone Speech; developed by Raytheon BBN Technologies for the BOLT co-reference task, co-reference annotation on BOLT treebank annotation covering noun phrases (proper nouns, nominals, pronouns and null arguments), possessives, proper noun pre-modifiers and verbs 

Arabic Consonants, Vowels and Words from Native Speakers: Phonemes of Arabic; one hour of speech from native Arabic speakers, all Arabic sounds (consonants and vowels) and 24 words with specific consonant-vowel patterns, developed at the Florida Institute of Technology  

Mandarin Guanzhong Dialect Read Speech: Global TIMIT Mandarin Chinese – Guanzhong Dialect; developed by LDC and Xi’an Jiaotong University, five hours of read speech and transcripts, 50 speakers reading 120 sentences from Chinese Gigaword Fifth Edition (LDC2011T13) in the Guanzhong dialect as spoken in Shaanxi province, 3220 sentence types – 20 sentences read by all speakers, 40 sentences read by 10 speakers, 60 sentences read by one speaker

More TIMIT L1, L2 English Read Speech: Global TIMIT Learner Simple English; developed by LDC and Shanghai Jiao Tong University, 12 hours of L1 and L2 English read speech and transcripts, two sets each of 50 speakers reading 120 “simple” sentences from TIMIT (LDC93S1), 820 sentence types – 20 sentences read by all speakers, 40 sentences read by 10 speakers, 60 sentences read by one speaker

Ukrainian Text, Annotations and Tools for Rapid Event Response: LORELEI Ukrainian Representative Language Pack; developed by LDC, Ukrainian monolingual, parallel and comparable text with situation frame, named entity, and entity linking and detection annotation, supporting human language technology for low-resource languages in the context of emergent situations such as natural disasters or disease outbreaks

TAC KBP Event Argument Task Resources: TAC KBP Event Argument – Comprehensive Training and Evaluation Data 2016-2017; source documents, manual runs, assessments and event hoppers developed by LDC from Chinese, English and Spanish newswire and discussion forum text for extracting event entities from unstructured text, indicating their role and linking the event