November 2016 Newsletter

Tuesday, November 15, 2016

New Corpora

JANA: A Human-Human Dialogues Corpus for Egyptian Dialect

Multi-Language Conversational Telephone Speech 2011 – Slavic Group

IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a

GALE Phase 3 and 4 Chinese Newswire Parallel Text

Announcements

Join LDC for Membership Year 2017
Organizations engaged in language-related research, education and technology development are invited to join LDC for Membership Year (MY) 2017. Consortium members enjoy unparalleled access and continuing rights to new data releases and to an archive of close to 700 holdings.

Membership fees have not increased for 2017. In addition, discounts are available for organizations that keep their membership current and for those that join before March 1, 2017.

  • MY2016 members receive a 10% discount if they renew their membership before March 1, 2017. After March 1, MY2016 members receive a 5% discount if they renew at any time in 2017.

  • New members and returning former members receive a 5% discount off the membership fee if they join or renew before March 1, 2017.

Plans for MY2017 publications are in progress. Among the expected releases are:

  • 2010 NIST Speaker Recognition Evaluation data set
  • Multilanguage conversational telephone speech: developed to support language identification research in related languages
  • UCLA High Speed Laryngeal Database: audio recordings and high-speed videoendoscopic images of the vocal folds while sustaining vowels
  • Noisy TIMIT: TIMIT with added artificial noise
  • CHiME shared task data: noisy read WSJ speech
  • First Year Law Students’ Memoranda: memos to a hypothetical court with annotations
  • IARPA Babel Language Packs: languages include Vietnamese, Haitian Creole, Zulu, Kazakh and Lithuanian
  • BOLT: source, parallel and word-aligned data in all languages
  • RATS Keyword Spotting data set
  • GALE Phases 3 and 4: all tasks and languages

Visit the Join LDC page for details on membership, user accounts and payment.

Commercial Use and LDC data
For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing page for further information.

Spring 2017 Data Scholarship Program
Applications are being accepted through January 15, 2017, for the Spring 2017 LDC Data Scholarship program, which provides university students with no-cost access to LDC data. Consult the LDC Data Scholarship page for further information about program rules and submission requirements.

LDC Closed November 24-25 for US Thanksgiving Holiday
LDC will be closed on Thursday, November 24, 2016, and Friday, November 25, 2016, in observance of the US Thanksgiving holiday. The office will reopen on Monday, November 28, 2016.