June 2015 Newsletter

Monday, June 15, 2015

New Corpora


GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences

RST Signalling Corpus


Customize a Data Pack from 2013 publications

There is still time for not-for-profit and government organizations to create a custom data collection of eight corpora from among LDC’s 2013 releases.  Selection options include: 1993-2007 United Nations Parallel Text, Chinese Treebank 8.0, CSC Deceptive Speech, GALE Arabic and Chinese speech and text releases, Greybeard, MADCAT training data, NIST 2012 Open Machine Translation (OpenMT) evaluation and progress sets, and more.  The 2013 Data Pack is available for a flat rate of $3500 through September 15, 2015.

To license the Data Pack and select eight corpora, log in to or register for an LDC user account, then add the 2013 Data Pack and each of the eight data sets to your bin. Follow the checkout procedure, sign all applicable user agreements and select payment via wire transfer, purchase order or check. LDC will adjust the invoice total to reflect the data pack fee.

To pay via credit card, add the 2013 Data Pack to your bin and check out using the system prompts. At the completion of the transaction, send an email to ldc@ldc.upenn.edu indicating the eight data sets to include in your order.