New Corpora

Parallel Data Annotated for Semantic Role Labeling: X-SRL: Parallel Cross-lingual Semantic Role Labeling: developed by Heidelberg University and the Leibniz Institute for the German Language; French, German and Spanish translations of English texts from the 2009 CoNLL shared task (Treebank, PropBank, NomBank), tokenized, lemmatized and POS-tagged, with syntactic parses and semantic role labels; includes train, dev and test partitions

Sentiment Evaluation for English News and Web Text: TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation Data 2013-2014: queries, manual runs and assessment results on English news and web text, developed by LDC; the sentiment slot filling task evaluated the quality of systems that detect positive and negative sentiment

Computer Game Interactions: Columbia Games Corpus: 10 hours of spontaneous English conversation from 13 subjects playing a series of computer games requiring verbal communication to achieve joint goals, with orthographic transcripts and annotation marking discourse and turn-taking, developed by the Spoken Language Group, Columbia University and the Department of Linguistics, Northwestern University

TIMIT-like Chinese Speech: Global TIMIT Mandarin Chinese: developed by LDC and Shanghai Jiao Tong University; five hours of read speech with transcripts from 50 speakers, each reading 120 sentences selected from Chinese Gigaword Fifth Edition (LDC2011T13); 3,220 sentence types in total (of each speaker's 120 sentences, 20 are read by all speakers, 40 by 10 speakers and 60 by that speaker only); speakers were students proficient in Mandarin
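The 3,220 figure follows from the per-speaker breakdown. The count below is a sketch that assumes the 40-sentence sets are shared within disjoint groups of 10 speakers and the 60-sentence sets are unique to each speaker; the corpus documentation should be consulted for the exact design.

```latex
\begin{aligned}
\text{per speaker: } & 20 + 40 + 60 = 120 \\
\text{sentence types: } & \underbrace{20}_{\text{read by all 50}}
  + \underbrace{\tfrac{50}{10} \times 40}_{\text{5 groups of 10}}
  + \underbrace{50 \times 60}_{\text{one speaker each}}
  = 20 + 200 + 3000 = 3220
\end{aligned}
```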

Chinese Informal Text Annotated for Co-reference: BOLT Chinese Co-reference – Discussion Forum, SMS/Chat, and Conversational Telephone Speech: developed by Raytheon BBN Technologies for the BOLT co-reference task; co-reference annotation layered on existing BOLT treebank annotation, covering noun phrases (proper nouns, nominals, pronouns and null arguments), possessives, proper noun pre-modifiers and verbs

Icelandic Parliament Speeches and Resources for ASR: Althingi Parliamentary Speech: 540 hours of recorded performed speech from 197 speakers with transcripts, a pronunciation dictionary and language models; the project, led by Reykjavik University in collaboration with the Althingi speech department, aimed to develop an automatic speech recognition (ASR) system for Icelandic parliamentary speech to replace manual transcription of members' speeches

German Translation of WSJ Text w/ Discourse Relations: Penn Discourse Treebank 2.0 – German Translation: one million tokens derived from Penn Discourse Treebank Version 2.0 (LDC2008T05) (English WSJ text and annotations), translated into German and annotated for shallow discourse relations; developed by the Applied Computational Linguistics group at the University of Potsdam

Information Extraction of New Relations/Events: TAC-KBP English Surprise Slot Filling – Comprehensive Training and Evaluation Data 2010: queries, manual runs and final assessment results produced by LDC for the 2010 Surprise Slot Filling track, in which participants had four days to develop systems for new slot/attribute types, given the corresponding annotation guidelines and training data; source data consists of English newswire, broadcast and web text