Recent Announcements from LDC
LDC will be at ICASSP 2013, the world's largest and most comprehensive technical conference focused on signal processing and its applications. The event will be held May 26-31, and we look forward to interacting with members of this community at our exhibit table and during our poster and paper presentations:
Tuesday, May 28, 15:30 - 17:30, Poster Area D
ARTICULATORY TRAJECTORIES FOR LARGE-VOCABULARY SPEECH RECOGNITION
Authors: Vikramjit Mitra, Wen Wang, Andreas Stolcke, Hosung Nam, Colleen Richey, Jiahong Yuan (LDC), Mark Liberman (LDC)
Tuesday, May 28, 16:30 - 16:50, Room 2011
Wednesday, May 29, 15:20 - 17:20, Poster Area D
Please look for LDC's exhibition at Booth #53 in the Vancouver Convention Centre. We hope to see you there!
To date just over 100 organizations have joined for Membership Year (MY) 2013. For the sixth straight year, LDC's early renewal discount program has resulted in significant savings for our members. Organizations that renewed membership or joined early for MY2013 saved over US$50,000! MY 2012 members are still eligible for a 5% discount when renewing for MY2013. This discount will apply throughout 2013.
Organizations joining LDC can take advantage of membership benefits including free membership year data as well as discounts on older LDC corpora. For-profit members can use most LDC data for commercial applications. Please visit our Members FAQ for further information.
Has your company obtained an LDC database as a non-member? For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. In the case of a small group of corpora such as American National Corpus (ANC) Second Release (LDC2005T35), Buckwalter Arabic Morphological Analyzer Version 2.0 (LDC2004L02), CELEX2 (LDC96L14) and all CSLU corpora, commercial licenses must be obtained separately from the owners of the data even if an organization is a for-profit member.
The LDC Data Scholarship program provides college and university students with access to LDC data at no-cost. Students are asked to complete an application which consists of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. LDC introduced the Data Scholarship program during the Fall 2010 semester. Since that time, more than thirty individual students and student research groups have been awarded no-cost copies of LDC data for their research endeavors. These are student reports on their progress using LDC data:
We’ve enjoyed celebrating our 20th Anniversary this last year (April 2012 - March 2013) and would like to review some highlights before its close.
Our 2012 User Survey, circulated early in 2012, included a special Anniversary section in which respondents were asked to reflect on their opinions of, and dealings with, LDC over the years. We were humbled by the response. Multiple users mentioned that they would not be able to conduct their research without LDC and its data. For a full list of survey testimonials, please click here.
LDC also developed its first-ever timeline (initially published in the April 2012 Newsletter) marking significant milestones in the consortium’s founding and growth.
In September, we hosted a 20th Anniversary Workshop that brought together many friends and collaborators to discuss the present and future of language resources.
Throughout the year, we conducted several interviews of long-time LDC staff members to document their unique recollections of LDC history and to solicit their opinions on the future of the Consortium. These interviews are available as podcasts on the LDC Blog.
As our Anniversary year draws to a close, one task remains – to thank all of LDC’s past, present and future members and other friends of the Consortium for their loyalty and for their contributions to the community. LDC would not exist if not for its supporters. The variety of relationships that LDC has built over the years is a direct reflection of the vitality, strength and diversity of the community. We thank you all and hope that we continue to serve your needs in our third decade and beyond.
For a last treat, please visit LDC’s newly-launched YouTube channel to enjoy this video montage of the LDC staff interviews featured in the podcast series.
Thank you again for your continued support!
LDC is pleased to announce the student recipients of the Spring 2013 LDC Data Scholarship program! This program provides university students with access to LDC data at no-cost. Students were asked to complete an application which consisted of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. We received many solid applications and have chosen three proposals to support. The following students will receive no-cost copies of LDC data:
Please join us in congratulating our student recipients! The next LDC Data Scholarship program is scheduled for the Fall 2013 semester.
Time is quickly running out to save on membership fees for Membership Year 2013 (MY2013)! Any organization which joins or renews membership for 2013 through Friday, March 1, 2013, is entitled to a 5% discount on membership fees. Organizations which held membership for MY2012 can receive a 10% discount on fees provided they renew prior to March 1, 2013.
Many publications for MY2013 are still in development. The planned publications for the upcoming months include:
2013 Subscription Members are automatically sent all MY2013 data as it is released. 2013 Standard Members are entitled to request 16 corpora for free from MY2013. Non-members may license most data for research use. See below for further information on pricing.
Membership Year (MY) 2013 is open for joining! We would like to invite all current and previous members of LDC to renew their membership as well as welcome new organizations to join the consortium. For MY2013, LDC is pleased to maintain membership fees at last year’s rates – membership fees will not increase. Additionally, LDC will extend discounts on membership fees to members who keep their membership current and who join early in the year.
The details of our early renewal discounts for MY2013 are as follows:
The following table provides exact pricing information:
Publications for MY2013 are still being planned; here are the working titles of data sets we intend to provide:
In addition to receiving new publications, current year members of the LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications.
This past year, LDC members who joined early or kept their membership current saved almost US$70,000 collectively on membership fees. Be sure to keep an eye on your mail: all previous and current LDC members will be sent an invitation-to-join letter and a renewal invoice for MY2013. Renew early for MY2013 to save today!
LDC is offering early renewal discounts on membership fees for Membership Year 2013, making now a good time to consider joining or renewing membership. LDC membership has the following advantages:
For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations, including commercial restrictions, on the use of certain corpora. In the case of a small group of corpora, commercial licenses must be obtained separately from the owners of the data.
Earlier this year, LDC sent a survey to its user communities. Like previous iterations in 2006 and 2007, the survey solicited community input and suggestions on key LDC-related topics, including:
Survey respondents were generally satisfied with LDC’s data, membership options, homepage and Catalog, though there were requests for additional data options and data acquisition methods. Some of the data respondents requested are already in our pipeline for the end of 2012 or for Membership Year (MY) 2013, so please be on the lookout for Publications updates. Respondents were also very supportive of LDC’s 20th Anniversary, posting testimonials and well-wishes in the 20th Anniversary section.
LDC would like to thank all survey participants. Survey participants will receive access to full survey results shortly.
LDC is pleased to announce the student recipients of the Fall 2012 LDC Data Scholarship program! This program provides university and college students with access to LDC data at no-cost. Students were asked to complete an application which consisted of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. We received many solid applications and have chosen six proposals to support. The following students will receive no-cost copies of LDC data:
LDC will be exhibiting at the 41st New Ways of Analyzing Variation Conference (NWAV 41) in late October. This marks the fifth time that LDC has been an NWAV exhibitor and we are proud to show our continued support of the sociolinguistic research community.
The conference runs from October 25-28 and the exhibition hall will be open from October 26-28, 2012. Please stop by to say hello!
In early September, LDC hosted a workshop entitled The Future of Language Resources in celebration of our 20th anniversary.
Visit the Program page to browse speaker abstracts and to access PDFs of the presentations. Thanks to the speakers and attendees for making the workshop a success!
To further celebrate our 20th Anniversary, LDC is conducting interviews of long-time staff members for their unique perspectives on the Consortium’s growth and evolution over the past two decades. The first interview podcast debuts this month and features Dave Graff, LDC’s Lead Programmer. Visit the LDC blog to access the podcast.
Other podcasts will be published via the LDC blog, so stay tuned to that space.
The Language Resource Wiki catalogs data, software, descriptive grammars and other resources for a variety of languages but especially those with a paucity of generally available resources for research. LDC is actively seeking editors knowledgeable in these and other languages to develop and maintain the pages, which are readable by anyone but writable only by editors. The wiki currently has resource listings for: Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian, Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian, Tagalog, Tamil, and Urdu, and for the following Sign Languages: American, British, Catalan, Dutch, Flemish, German, Japanese, New Zealand, Polish, Spanish, and Swiss German.
Google Inc. and the Linguistic Data Consortium (LDC) have collaborated to develop new syntactically-annotated language resources that enable computers to better understand human language. The project, funded through a gift from Google in 2010, has resulted in the development of the English Web Treebank (LDC2012T13), containing over 250,000 words of weblogs, newsgroups, email, reviews and question-answers manually annotated for syntactic structure. This resource will allow language technology researchers to develop and evaluate the robustness of parsing methods in various new web domains. It was used in the 2012 shared task on parsing English web text for the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), which took place at NAACL-HLT in Montreal on June 8, 2012. The English Web Treebank is available to the research community through LDC’s Catalog.
Natural language processing (NLP) is a field of computational linguistic research concerned with the interactions between human language and computers. Parsing is a discipline within NLP in which computers analyze text and determine its syntactic structure. While syntactic parsing is already practically useful, Google funded this effort to help the research community develop better parsers for web text. The web texts collected and annotated by LDC provide new, diverse data for training parsing systems.
Google chose LDC for this work based on the Consortium’s experience in developing and creating syntactic annotations, also known as treebanks. Treebanks are critically important to parsing research since they provide human-analyzed sentence structures that facilitate training and testing scenarios in NLP research. This work extends the existing relationship between LDC and Google. LDC has published four other Google-developed data sets in the past six years: English, Chinese, Japanese and European language n-grams used principally for language modeling.
LDC announces its 20th Anniversary Workshop on Language Resources, to be held in Philadelphia on September 6-7, 2012. The event will commemorate our anniversary, reflect on the beginning of language data centers and address the future of language resources.
Workshop themes will include: the developments in human language technologies and associated resources that have brought us to our current state; the language resources required by the technical approaches taken and the impact of these resources on HLT progress; the applications of HLT and resources to other disciplines including law, medicine, economics, the political sciences and psychology; the impact of HLTs and related technologies on linguistic analysis and novel approaches in fields as widespread as phonetics, semantics, language documentation, sociolinguistics and dialect geography; and finally, the impact of any of these developments on the ways in which language resources are created, shared and exploited and on the specific resources required.
Visit the LDC 20th Anniversary Workshop page for further details.
LDC attended the 8th Language Resources and Evaluation Conference (LREC 2012), hosted by ELRA, the European Language Resources Association. The conference was held in Istanbul, Turkey and featured a broad range of sessions on language resource and human language technologies research. Fourteen LDC staff members presented current work on a wide range of topics, including handwriting recognition, word alignment, treebanks, machine translation and information retrieval as well as initiatives for synchronizing metadata practices in sociolinguistic data collection.
The LDC Papers page now includes research papers presented at LREC 2012. Most papers are available for download in PDF format; presentation slides and posters are available for several papers as well. On the Papers page, you can read about LDC's role in resource creation to support handwriting recognition and translation technology (Song et al., 2012). LDC is developing resources to support two research programs: Multilingual Automatic Document Classification, Analysis and Translation (MADCAT) and Open Handwriting Recognition and Translation (OpenHaRT). To support these programs, LDC is collecting handwritten samples of pre-processed Arabic and Chinese data that had previously been translated into English. To date, LDC has collected and annotated over 225,000 handwriting images.
Additionally, you can learn about LDC's efforts to collect and annotate very large corpora of user-contributed content in multiple languages (Garland et al., 2012). For the Broad Operational Language Translation (BOLT) program, LDC is developing resources to support genre-independent machine translation and information retrieval systems. In the current phase of BOLT, LDC is collecting and annotating threaded posts from online discussion forums, targeting at least 500 million words each in three languages: English, Chinese, and Egyptian Arabic. A portion of the data undergoes manual, multi-layered linguistic annotation.
As we mark LDC's 20th anniversary, we will feature the work behind these LREC papers as well as other ongoing research in upcoming newsletters.