New Corpora

Semantic Dependency Parsing: SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing: data, tools, system results, and publications associated with the 2014 and 2015 tasks on Broad-Coverage Semantic Dependency Parsing (SDP) for Chinese, Czech and English, conducted in conjunction with the International Workshop on Semantic Evaluation (SemEval) and developed by the SDP task organizers

German Student Handwriting: H1 Children's Writing: developed by the Cooperative State University Baden-Württemberg, University of Education; 996 texts, with metadata, written over three months by 88 German elementary school children aged seven to eleven

User-generated Videos with Transcriptions: HAVIC Pilot Transcription: developed by LDC and NIST, approximately 72 hours of user-generated videos with transcripts based on the English speech audio extracted from the videos

English Proxy Reports: DEFT Narrative Text: developed by LDC, proxy reports intended to mimic the format and other features of some types of government analyst reports and their corresponding English newswire source documents

GALE Broadcast Collection: Arabic/Chinese broadcast speech collected by LDC for the DARPA GALE program with associated transcripts

GALE Phase 4 Chinese Broadcast Conversation Speech: 172 hours of Mandarin Chinese broadcast conversation speech collected in 2008

GALE Phase 4 Chinese Broadcast Conversation Transcripts: the complete set of corresponding transcripts, comprising 2,259,952 tokens in plain-text, tab-delimited format with UTF-8 encoding

GALE Parallel, Word Aligned and Tagged Text: Arabic/Chinese and English parallel, word-aligned and tagged resources developed by LDC for the DARPA GALE program

GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences: 170 source-translation document pairs, comprising 44,064 words of Arabic source text and its English translation

GALE Phase 3 and 4 Arabic Web Parallel Text: 124 source-translation document pairs, comprising 61,662 tokens of Arabic source text and its English translation

GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text: 63 source-translation document pairs, comprising 487,466 tokens of Chinese source text and its English translation