October 2019 Newsletter

Tuesday, October 15, 2019

New Publications

BOLT English Treebank - Discussion Forum

Polish Speech Database

2016 NIST Speaker Recognition Evaluation Test Set

Announcements

Membership Year 2020 Publication Preview

The 2020 Membership Year is just around the corner, and plans for next year’s publications are in progress. Among the expected releases are:

  • Abstract Meaning Representation (AMR) Annotation Release 3.0: semantic treebank of over 59,000 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums; updates the second version (LDC2017T10) with new annotations
  • TAC KBP: English sentiment slot filling, surprise slot filling, nugget detection and coreference, and event argument data in all languages (English, Chinese and Spanish)
  • DEFT Chinese ERE: Chinese discussion forum data annotated for entities, relations and events
  • LibriVox Spanish: 73 hours of Spanish audiobook read speech and transcripts
  • IARPA Babel Language Packs (telephone speech and transcripts): languages include Dholuo, Javanese and Mongolian
  • HAVIC Med Training data: web video, metadata, and annotations for developing multimedia systems
  • RATS Speaker Identification: conversational telephone speech in Levantine Arabic, Pashto, Urdu, Farsi and Dari on degraded audio signals with annotation of speech segments for speaker identification
  • BOLT: discussion forums, SMS/chat, conversational telephone speech, word-aligned, tagged and coreference data in all languages (Chinese, Egyptian Arabic, and English)

Check your inbox in the coming weeks for more information about membership renewal. 

LDC data and commercial technology development

For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing page for further information.