August 2010 Newsletter

Wednesday, August 18, 2010

New Corpora

Asian Elephant Vocalizations

NIST 2005 Open Machine Translation (OpenMT) Evaluation

TRECVID 2006 Keyframes


Fall 2010 LDC Data Scholarship Program

Applications are now being accepted through September 15, 2010 for the Fall 2010 LDC Data Scholarship program! The LDC Data Scholarship program provides university students with access to LDC data at no cost. Data scholarships are offered twice a year, corresponding to the Fall and Spring semesters, beginning with the Fall 2010 semester (September - December 2010). Several students may receive scholarships in each program cycle. The program is open to both undergraduate and graduate students at an accredited college or university. LDC Data Scholarships are not restricted to any particular field of study; however, students must demonstrate a well-developed research agenda and a bona fide inability to pay.

The application consists of two parts:

(1) Data Use Proposal. Applicants must submit a proposal describing their intended use of the data. The proposal must include the applicant's name, university, and field of study. It should also state which data the student plans to use and describe the research project. Students are advised to consult the LDC Corpus Catalog for a complete list of data distributed by LDC. Note that a handful of LDC corpora are available only to members of the Consortium.

(2) Letter of Support. Applicants must submit one letter of support from their thesis advisor or department chair. The letter must verify the student's need for data and confirm that the department or university lacks the funding to pay the full Non-member Fee for the data.

For further information on application materials and program rules, please visit the LDC Data Scholarship page.

Students can email their applications to the LDC Data Scholarship program. Decisions will be sent by email from the same address.

The deadline for the Fall 2010 program cycle is September 15, 2010.

New Providing Guidelines

LDC is pleased to announce that our Providing page has been updated and expanded to include detailed guidelines for submitting corpora and other resources for publication by LDC. The new Providing page describes the entire process of sharing data through LDC, from the initial publication inquiry to delivery of the data for publication. It covers in depth LDC's preferred submission formats for video, audio, and text data, its preferred directory structure, and best practices for file naming. The page also explains how to provide adequate metadata and documentation for your data set.

Researchers interested in publishing data through LDC are invited to use the Publication Inquiry Form.  The inquiry form will prompt you for basic information about your data including title, author, language, details on corpus size and format, as well as a description.  Once your inquiry has been received, our External Relations staff can assist you through each step of the publication process.

Why share your data through LDC? Resources distributed by LDC reach a global audience. All published resources appear in LDC's online Catalog, which is accessed daily by users worldwide. LDC's monthly newsletter keeps the community abreast of all new publications, bringing them to the attention of interested researchers. LDC members receive copies of the corpora as part of their membership benefits, so LDC's membership structure gives your data greater exposure to major organizations working in human language technologies and related fields.

The LDC Corpus Catalog contains a variety of resources in many languages, including written, spoken, and video data. Speech and video data may derive from broadcast collections, interviews, and recordings of telephone conversations. Text data comes from a variety of sources, including newswire, document archives, anthologies, and the World Wide Web. LDC also publishes dictionaries and lexicons in a variety of languages.