Recent Announcements from LDC
New LDC website coming soon
Look for LDC's new website in the coming weeks. We've revamped the design and site plan to make it easier than ever to find what you're looking for. The features you use the most -- the catalog, new corpus releases and user login -- will be a short click away. We expect the LDC website to be occasionally unavailable for a few days at the end of September as we make the switch and thank you in advance for your understanding.

Mixer 6 now available
The release of Mixer 6 Speech this month marks the first time in close to a decade that LDC has made available a large-scale collection for training speaker recognition. Representing more than 15,000 hours of speech from over 500 speakers, Mixer 6 follows in the footsteps of the Switchboard and Fisher studies by providing a large database of telephone conversation and also adds interview and transcript reading sessions. Participants were native American English speakers local to the Philadelphia area. Mixer 6 Speech is a members-only release and a great reason to join the consortium. In addition to this substantial resource, members enjoy rights to other data released in 2013 and can license older publications at reduced fees. Please see the catalog page for Mixer 6 Speech.

LDC at Interspeech 2013
LDC will once again be exhibiting at Interspeech, held this year August 25-29 in Lyon. Please stop by LDC’s booth to learn about recent developments at the Consortium, including new publications. Also, be on the lookout for the following presentations:
Monday 26 August, Poster 6, 16.00 – 18.00 Room: Forum 6
Tuesday 27 August, Oral 17, 14.00 – 16.00 Room: Gratte-Ciel 3
Wednesday 28 August, Oral 32, 14.00 – 16.00 Room: Gratte-Ciel 3

LDC will continue to post conference updates via our Facebook page. We hope to see you there!

High School students use LDC data
A team of students at Thomas Jefferson High School for Science and Technology in Alexandria, VA, USA, has used an LDC database for the development of a device to help autistic children recognize emotions. This team was funded by a grant from the Lemelson-MIT InvenTeam Initiative Program. InvenTeams are groups of high school students, teachers, and mentors that receive grants up to US$10,000 each to invent technological solutions to real-world problems. The team set out to invent an emotive aid in the form of a bracelet that uses a computational algorithm to extract emotional signatures from speech and display expressed emotions in real-time during a conversation. Potential beneficiaries include children with autism, Asperger’s syndrome, or similar diseases that impair the ability to detect emotion. The algorithm employed machine learning and neural network-based techniques to improve accuracy and efficiency relative to current methods. The students used speech samples from the LDC database, Emotional Prosody Speech and Transcripts (LDC2002S28), as well as the Berlin Database of Emotional Speech for training and testing their algorithm. Although the samples proved to be too small to produce an algorithm with a high degree of accuracy, the team's algorithm did demonstrate some degree of success. The students will present their results at Eurekafest at MIT in June. LDC thanks the InvenTeam’s teacher, Mark Hannum, and group leader, Suhas Gondi, for contributing to this article.

LDC at ICASSP 2013
LDC will be at ICASSP 2013, the world's largest and most comprehensive technical conference focused on signal processing and its applications.
The event will be held over May 26-31 and we look forward to interacting with members of this community at our exhibit table and during our poster and paper presentations:
Tuesday, May 28, 15:30 - 17:30, Poster Area D
ARTICULATORY TRAJECTORIES FOR LARGE-VOCABULARY SPEECH RECOGNITION
Authors: Vikramjit Mitra, Wen Wang, Andreas Stolcke, Hosung Nam, Colleen Richey, Jiahong Yuan (LDC), Mark Liberman (LDC)
Tuesday, May 28, 16:30 - 16:50, Room 2011
Wednesday, May 29, 15:20 - 17:20, Poster Area D
Please look for LDC's exhibition at Booth #53 in the Vancouver Convention Centre. We hope to see you there!

Early renewing members save on fees
To date, just over 100 organizations have joined for Membership Year (MY) 2013. For the sixth straight year, LDC's early renewal discount program has resulted in significant savings for our members. Organizations that renewed membership or joined early for MY2013 saved over US$50,000! MY2012 members are still eligible for a 5% discount when renewing for MY2013. This discount will apply throughout 2013. Organizations joining LDC can take advantage of membership benefits including free membership year data as well as discounts on older LDC corpora. For-profit members can use most LDC data for commercial applications. Please visit our Members FAQ for further information.

Commercial use and LDC data
Has your company obtained an LDC database as a non-member? For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. In the case of a small group of corpora such as American National Corpus (ANC) Second Release (LDC2005T35), Buckwalter Arabic Morphological Analyzer Version 2.0 (LDC2004L02), CELEX2 (LDC96L14) and all CSLU corpora, commercial licenses must be obtained separately from the owners of the data even if an organization is a for-profit member.

Checking in with LDC Data Scholarship Recipients
The LDC Data Scholarship program provides college and university students with access to LDC data at no cost.
Students are asked to complete an application which consists of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. LDC introduced the Data Scholarship program during the Fall 2010 semester. Since that time, more than thirty individual students and student research groups have been awarded no-cost copies of LDC data for their research endeavors. These are student reports on their progress using LDC data:
LDC’s 20th Anniversary: Concluding a Year of Celebration
We’ve enjoyed celebrating our 20th Anniversary this last year (April 2012 - March 2013) and would like to review some highlights before its close. Our 2012 User Survey, circulated early in 2012, included a special Anniversary section in which respondents were asked to reflect on their opinions of, and dealings with, LDC over the years. We were humbled by the response. Multiple users mentioned that they would not be able to conduct their research without LDC and its data. For a full list of survey testimonials, please click here. LDC also developed its first-ever timeline (initially published in the April 2012 Newsletter) marking significant milestones in the consortium’s founding and growth. In September, we hosted a 20th Anniversary Workshop that brought together many friends and collaborators to discuss the present and future of language resources. Throughout the year, we conducted several interviews of long-time LDC staff members to document their unique recollections of LDC history and to solicit their opinions on the future of the Consortium. These interviews are available as podcasts on the LDC Blog.
As our Anniversary year draws to a close, one task remains – to thank all of LDC’s past, present and future members and other friends of the Consortium for their loyalty and for their contributions to the community. LDC would not exist if not for its supporters. The variety of relationships that LDC has built over the years is a direct reflection of the vitality, strength and diversity of the community. We thank you all and hope that we continue to serve your needs in our third decade and beyond. For a last treat, please visit LDC’s newly-launched YouTube channel to enjoy this video montage of the LDC staff interviews featured in the podcast series. Thank you again for your continued support!
Spring 2013 LDC Data Scholarship Recipients
LDC is pleased to announce the student recipients of the Spring 2013 LDC Data Scholarship program! This program provides university students with access to LDC data at no cost. Students were asked to complete an application which consisted of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. We received many solid applications and have chosen three proposals to support. The following students will receive no-cost copies of LDC data:
Please join us in congratulating our student recipients! The next LDC Data Scholarship program is scheduled for the Fall 2013 semester.

Membership Fee Savings and Publications Pipeline
Time is quickly running out to save on membership fees for Membership Year 2013 (MY2013)! Any organization which joins or renews membership for 2013 through Friday, March 1, 2013, is entitled to a 5% discount on membership fees. Organizations which held membership for MY2012 can receive a 10% discount on fees provided they renew prior to March 1, 2013. Many publications for MY2013 are still in development. The planned publications for the upcoming months include:
2013 Subscription Members are automatically sent all MY2013 data as it is released. 2013 Standard Members are entitled to request 16 corpora for free from MY2013. Non-members may license most data for research use. See below for further information on pricing.

Invitation to Join for Membership Year 2013
Membership Year (MY) 2013 is open for joining! We would like to invite all current and previous members of LDC to renew their membership as well as welcome new organizations to join the consortium. For MY2013, LDC is pleased to maintain membership fees at last year’s rates – membership fees will not increase. Additionally, LDC will extend discounts on membership fees to members who keep their membership current and who join early in the year. The details of our early renewal discounts for MY2013 are as follows:
The following table provides exact pricing information
Publications for MY2013 are still being planned; here are the working titles of data sets we intend to provide:
In addition to receiving new publications, current year members of the LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications. This past year, LDC members who joined early or kept their membership current saved almost US$70,000 collectively on membership fees. Be sure to keep an eye on your mail: all previous and current LDC members will be sent an invitation-to-join letter and renewal invoice for MY2013. Renew early for MY2013 to save today!

Why become an LDC Member?
LDC is offering early renewal discounts on membership fees for Membership Year 2013, making now a good time to consider joining or renewing membership. LDC membership has the following advantages:
For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations, including commercial restrictions, on the use of certain corpora. In the case of a small group of corpora, commercial licenses must be obtained separately from the owners of the data.

2012 User Survey Results
Earlier this year, LDC sent a survey to its user communities. Like previous iterations in 2006 and 2007, the survey solicited community input and suggestions on key LDC-related topics, including:
Survey respondents were generally satisfied with LDC’s data, membership options, homepage and Catalog, though there were requests for additional data options and data acquisition methods. Some of the data respondents requested are already in our pipeline for the end of 2012 or for Membership Year (MY) 2013, so please be on the lookout for Publications updates. Respondents were also very supportive of LDC’s 20th Anniversary, posting testimonials and well-wishes in the 20th Anniversary section. LDC would like to thank all survey participants. Survey participants will receive access to full survey results shortly.

Fall 2012 LDC Data Scholarship Recipients
LDC is pleased to announce the student recipients of the Fall 2012 LDC Data Scholarship program! This program provides university and college students with access to LDC data at no cost. Students were asked to complete an application which consisted of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. We received many solid applications and have chosen six proposals to support. The following students will receive no-cost copies of LDC data:
LDC Exhibiting at NWAV 41
LDC will be exhibiting at the 41st New Ways of Analyzing Variation Conference (NWAV 41) in late October. This marks the fifth time that LDC has been an NWAV exhibitor and we are proud to show our continued support of the sociolinguistic research community. The conference runs from October 25-28 and the exhibition hall will be open from October 26-28, 2012. Please stop by to say hello!

LDC 20th Anniversary Workshop Wrap-up
In early September, LDC hosted a workshop entitled The Future of Language Resources in celebration of our 20th anniversary. Visit the Program page to browse speaker abstracts and to access PDFs of the presentations. Thanks to the speakers and attendees for making the workshop a success!

LDC 20th Anniversary Podcasts
To further celebrate our 20th Anniversary, LDC is conducting interviews of long-time staff members for their unique perspectives on the Consortium’s growth and evolution over the past two decades. The first interview podcast debuts this month and features Dave Graff, LDC’s Lead Programmer. Visit the LDC blog to access the podcast. Other podcasts will be published via the LDC blog, so stay tuned to that space.

Language Resource Wiki
The Language Resource Wiki catalogs data, software, descriptive grammars and other resources for a variety of languages but especially those with a paucity of generally available resources for research. LDC is actively seeking editors knowledgeable in these and other languages to develop and maintain the pages, which are readable by anyone but writable only by editors. The wiki currently has resource listings for: Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian, Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian, Tagalog, Tamil, and Urdu, and for the following Sign Languages: American, British, Catalan, Dutch, Flemish, German, Japanese, New Zealand, Polish, Spanish, and Swiss German.
LDC and Google Collaboration Results in New Syntactically-Annotated Language Resources
Google Inc. and the Linguistic Data Consortium (LDC) have collaborated to develop new syntactically-annotated language resources that enable computers to better understand human language. The project, funded through a gift from Google in 2010, has resulted in the development of the English Web Treebank (LDC2012T13) containing over 250,000 words of weblogs, newsgroups, email, reviews and question-answers manually annotated for syntactic structure. This resource will allow language technology researchers to develop and evaluate the robustness of parsing methods in various new web domains. It was used in the 2012 shared task on parsing English web text for the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), which took place at NAACL-HLT in Montreal on June 8, 2012. The English Web Treebank is available to the research community through LDC’s Catalog.
Natural language processing (NLP) is a field of computational linguistic research concerned with the interactions between human language and computers. Parsing is a discipline within NLP in which computers analyze text and determine its syntactic structure. While syntactic parsing is already practically useful, Google funded this effort to help the research community develop better parsers for web text. The web texts collected and annotated by LDC provide new, diverse data for training parsing systems. Google chose LDC for this work based on the Consortium’s experience in developing and creating syntactic annotations, also known as treebanks. Treebanks are critically important to parsing research since they provide human-analyzed sentence structures that facilitate training and testing scenarios in NLP research. This work extends the existing relationship between LDC and Google.
LDC has published four other Google-developed data sets in the past six years: English, Chinese, Japanese and European language n-grams used principally for language modeling.
Contact: ldc@ldc.upenn.edu