March 2009 Newsletter

Tuesday, March 17, 2009

New Corpora

2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data

GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2

Unified Linguistic Annotation Text Collection

Announcements

Additional Free LDC Resources
LDC is pleased to distribute the Unified Linguistic Annotation Text Collection (LDC2009T07) at no cost to support the work of the ULA project. As mentioned above, to license a copy of this data, non-members should complete the LDC User Agreement for Non-members and either fax it to +1 215 573 2175 or scan and email it to this address. On the heels of the ULA release, LDC would like to highlight other resources that are available at no cost. Free grant-covered copies of the following TalkBank databases can be licensed from LDC:

  • LDC2003V01  FORM2 Kinematic Gesture
  • LDC2003L01  Grassfields Bantu Fieldwork: Dschang Lexicon
  • LDC2003S02  Grassfields Bantu Fieldwork: Dschang Tone Paradigms
  • LDC2001S16  Grassfields Bantu Fieldwork: Ngomba Tone Paradigms
  • LDC2004L01  Klex: Finite-State Lexical Transducer for Korean
  • LDC2004T03  Morphologically Annotated Korean Text
  • LDC2003S06  Santa Barbara Corpus of Spoken American English Part II
  • LDC2004S10  Santa Barbara Corpus of Spoken American English Part III
  • LDC2005S25  Santa Barbara Corpus of Spoken American English Part IV
  • LDC2003T15  SLX Corpus of Classic Sociolinguistic Interviews
  • LDC2004S12  TalkBank Ethology Data: Field Recordings of Vervet Monkey Calls

Further information, including additional free datasets such as TimeBank 1.2, and useful tools such as LDC's parallel text sentence aligner, Champollion, can be found in our What's New! What's Free! Archive.

Membership Mailbag - Commercial Use of LDC data
LDC's Membership office responds to a few thousand emailed queries a year, and over time we've noticed that some questions crop up regularly. To address the questions that you, our data users, have asked, we are continuing our Membership Mailbag series of newsletter articles. This month we focus on commercial rights to LDC data, with an emphasis on the LDC For-Profit membership.

To help clarify commercial use of LDC data, let's look at a few examples in which a commercial organization licenses LDC data. In the first scenario, a company, TryFirst JoinLater LLC, licenses data as a non-member. At this point, the company is not an LDC member and cannot use LDC data for any commercial purpose. Some years later, TryFirst JoinLater decides to join LDC as a For-Profit member. Does it now have commercial rights to the data it licensed as a non-member? Yes. By joining LDC, TryFirst JoinLater gains commercial rights to any data it has already licensed, unless those rights are restricted by a corpus-specific user license. In short, a commercial organization can first license data as a non-member for research purposes and later join LDC to gain commercial rights to that data.

Second scenario: another company, Join Only Once, Ltd., decides to join LDC as a For-Profit member for Membership Year 2009. What data will this company be able to use for commercial purposes? As a 2009 member, Join Only Once gains commercial rights to data from the year it joined, that is, Membership Year 2009, unless otherwise restricted by a corpus-specific user license. Furthermore, while it is a current-year member, Join Only Once can license data for commercial use from the closed Membership Years (1993-2007) at the Reduced Licensing Fee. Join Only Once, Ltd. retains ongoing commercial rights to data it licenses as a For-Profit member. Now fast forward a few years: Join Only Once has not renewed its LDC membership, but it would like to obtain some additional data from outside its Membership Year. Because it has not renewed, Join Only Once will not have a commercial license to any new data obtained after its Membership Year has ended.

This leads us to our final scenario. A third company, Best LDC Member Ever! Corporation, has been a For-Profit LDC member since our inception in 1992. Does this company have commercial rights to all LDC data? No; there are a few caveats to note. All members are reminded to consult corpus-specific license agreements for limitations, including commercial restrictions, on the use of certain corpora. For a small group of corpora, including the American National Corpus (ANC) Second Release (LDC2005T35), the Buckwalter Arabic Morphological Analyzer Version 2.0 (LDC2004L02), and all CSLU corpora, commercial licenses must be obtained separately from the owners of the data. A full list of corpus-specific user licenses can be found on our License Agreements page.

Got a question about LDC data? Forward it to . The answer may appear in a future Membership Mailbag article.