Recent Announcements from LDC
LDC Celebrates its 20th Anniversary!
2012 marks LDC’s 20th Anniversary year – officially on April 15 – but this is cause for a yearlong celebration! From our founding in 1992 as a data repository and language resource distribution center, our online catalog has grown to include over 500 databases in 60 languages that have been licensed by over 3000 organizations from 80 different nations. This data has been made available through donations, funded projects at LDC or elsewhere, community initiatives, and from LDC resources, an indication of the collective strength of this consortium. And, LDC has evolved from an organization that shares language resources to one that also is at the forefront of language technology research that includes the development of new data resources, software tools, and standards and best practices. As we celebrate throughout the year, look for announcements and special features in our newsletter and on our Facebook page.
2012 LDC Survey – Be on the Lookout!
It’s been four years since our last survey of LDC members and data licensees and we would like to again ask you to share your views on LDC and its language resources as well as your thoughts about data distribution in general and the impact of social media on language-related research and technology development. These topics are particularly timely as LDC enters its 20th anniversary year. The 2012 LDC Survey will be sent to every person and organization that licensed LDC data and/or joined LDC as a Member during the period from 2009 through 2011. Those who complete the survey on or before February 7, 2012 will make their organization eligible for a $500 benefit to be applied to any corpus or membership purchase in 2012. LDC will conduct a blind drawing and one lucky winner will be chosen from the pool of respondents.
Many thanks for your continued support and for your participation in the 2012 Survey!
LDC Exhibiting at LSA 2012 Annual Meeting
LDC looks forward to mingling with linguists and language specialists when we exhibit at the 86th Annual Meeting of the Linguistic Society of America (LSA). The main conference will be held over January 5-8, 2012 at the Portland, OR Hilton and Executive Tower and the exhibit hall will be open from January 6-8th (limited hours on Sunday the 8th). Please stop by our display for news on what 2012 will hold for LDC and to receive some of our conference giveaways. LSA 2012 will feature plenary talks on the following topics:
For further information visit the LSA Annual Meeting website. If you would like to learn more about LDC’s conference preparations, please ‘like’ our Facebook page. We hope to see you there!
LDC Hosts Satellite Workshop at LSA 2012
LDC will co-host a satellite workshop entitled Sociolinguistic Archival Preparation on January 4-5, 2012 in conjunction with the LSA 2012 Annual Meeting in Portland, OR. This two-day workshop will focus on techniques to permit the archiving of data, for cross-community sharing of corpora as well as for subsequent 'panel' studies. Recent discussions within the field have concluded that present protocols need to be expanded to permit adequate archiving. Specifically:
The sooner IRB forms and research protocols are aligned with each other, the sooner sharable, archivable corpora will become available, permitting intergroup comparison and interdisciplinary collaboration. LDC's Executive Director, Christopher Cieri, and LDC consultant and University of Arizona scholar, Malcah Yaeger-Dror, are the workshop organizers. This workshop is funded in part by the National Science Foundation (BCS#1144480). Further information about the workshop is available on the LSA Annual Meeting website.
Invitation to join for Membership Year (MY) 2012
Membership Year (MY) 2012, our 20th Anniversary Year, is open for joining! We would like to invite all current and previous members of LDC to renew their membership as well as welcome new organizations to join the consortium. For MY2012, LDC is pleased to maintain membership fees at last year’s rates – membership fees will not increase. Additionally, LDC will extend discounts on membership fees to members who keep their membership current and who join early in the year. The details of our early renewal discounts for MY2012 are as follows:
The following table provides exact pricing information.
* For new members, MY2011 Members renewing for MY2012, and any previous year Member who renews before March 1, 2012
** For MY2011 Members renewing before March 1, 2012
Publications for MY2012 are still being planned; here are the working titles of data sets we intend to provide:
In addition to receiving new publications, current year members of the LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications. This past year, LDC members who joined early or kept their membership current saved almost US$70,000 collectively on membership fees. Be sure to keep an eye on your mail - all previous and current LDC members will be sent an invitation-to-join letter and renewal invoice for MY2012. Renew early for MY2012 to save today!
Why become an LDC member?
LDC is offering early renewal discounts on membership fees for Membership Year 2012, making now a good time to consider joining or renewing membership. LDC membership has the following advantages:
For-profit organizations are reminded that an LDC membership is a prerequisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations, including commercial restrictions, on the use of certain corpora. In the case of a small group of corpora, commercial licenses must be obtained separately from the owners of the data.
Fall 2011 LDC Data Scholarships recipients
LDC is pleased to announce the student recipients of the Fall 2011 LDC Data Scholarship program! The LDC Data Scholarship program provides university students with access to LDC data at no cost.
Data scholarships are offered twice a year to correspond to the Fall and Spring semesters. Students are asked to complete an application which consists of a data use proposal and letter of support from their academic adviser. LDC received many strong applications from students attending universities across the globe. We've reviewed all the applications, and after careful consideration, we have selected four scholarship recipients! These students will receive no-cost copies of LDC data:
Please join us in congratulating our student recipients! Look for our upcoming announcements about the submission deadlines for the Spring 2012 LDC Data Scholarship program.
LDC data - now on Blu-ray!
LDC is pleased to announce that the Blu-ray revolution has come to linguistic data! We are now offering select databases on Blu-ray Disc (BD). With BD, we'll be able to distribute some of our larger databases using a smaller number of discs. BDs also have the potential to be read more quickly than DVDs, which means faster access to data. To introduce our Blu-ray option, we would like to announce that the following databases will now be distributed on BD in addition to DVD-ROM:
Organizations with licenses to English Gigaword Fifth Edition will be given the opportunity to 'swap' their DVDs for BDs. New licensees for Web 1T 5-gram have the option to select BD or DVD media. We expect to extend the BD option over time to other corpora in the catalog and to new releases.
LDC at NWAV 2011
NWAV’s 40th Anniversary Conference will be hosted by Georgetown University from October 27-30 and LDC will be on hand to celebrate! Please stop by the LDC exhibition at any point during the main conference and be sure to attend LDC’s pre-conference workshop on “Demographic Coding for Sociolinguistic Corpus Archive Preparation” from 4.00 – 6.00 pm on Thursday, October 27. This workshop will be hosted by LDC Executive Director Christopher Cieri and Malcah Yaeger-Dror of the University of Arizona. It has two stated goals:
NWAV registration options can be found here. We hope to see you there! Please visit LDC’s Facebook page to follow our conference activities.
Cataloging the communication of Asian Elephants
LDC distributes a broad selection of databases, the majority of which are used for human language research and technology development. Our corpus catalog also includes the vocalizations of other animal species. We'd like to highlight the intriguing work behind one such animal communication corpus, Asian Elephant Vocalizations LDC2010S05.
Asian Elephant Vocalizations contains audio recordings of vocalizations by Asian Elephants (Elephas maximus) in Uda Walawe National Park, Sri Lanka. The data was collected by Shermin de Silva as part of her doctoral thesis at the University of Pennsylvania. Recordings were made using a Fostex field recorder with a Sennheiser 'shot-gun' microphone. In addition, de Silva utilized a second dictation microphone that allows observers to narrate what's happening without talking over the elephant recording. The digital files were then downloaded and visualized using the Praat TextGrid Editor, a tool originally developed for studying human speech which has since been adopted by elephant researchers. With Praat, trained annotators are able to characterize call types and extract particular segments for later analysis.
Until recently, the majority of research on the behavior of wild elephants focused on one species - the African savannah elephant. There has been comparatively less study of communication in Asian elephants, primarily because the habitat in which Asian elephants typically live makes them more difficult to study than African savannah elephants. Asian and African elephants diverged from one another approximately six million years ago and evolved separately in very distinct environments. de Silva's work has shown that Asian elephants have highly dynamic social lives that are markedly different from those of African elephants.
Asian elephants tend to form smaller, fragmented groups on a day-to-day basis but maintain long-term pools of companions over many years. Because communication in elephants appears to be largely socially-motivated, differences in social behavior and ecology may also be a source of differences in their vocal behavior and repertoire. de Silva and her colleagues study elephant communication as an opportunity to understand the evolution of social behavior and communication in a system that is very different from our own primate experience.
Human language is only one manifestation of communication in the natural world. Perhaps this is why it is fitting to place animal vocalizations side-by-side with human speech in LDC's catalog. In this way, we can better understand how human language relates to the communicative capabilities of other species. For further information on Shermin de Silva's current research at the Elephant Forest and Environment Conservation Trust visit: Web | Blog
Checking in with previous LDC Data Scholarship recipients
LDC introduced the Data Scholarship program during the Fall 2010 semester. Since that time, more than fifteen individual students and student research groups have been awarded no-cost copies of LDC data for their research endeavors. Here is an update on the work of a few of our student recipients:
We'd like to thank these students for providing an update on their research. Stay tuned for further reports from other data scholarship recipients.
Weizmann Institute students are introduced to LDC data
LDC data was featured in an introductory speech recognition course at the Weizmann Institute of Science in Rehovot, Israel.
Visiting professor Karen Livescu, of Toyota Technological Institute at Chicago and the University of Chicago Department of Computer Science, used several LDC corpora, including CSR-I (WSJ0) Complete (LDC93S6A), Switchboard-1 Release 2 (LDC97S62), TIDIGITS (LDC93S10), and TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1), for homework and term projects, with a few examples shown during in-class demonstrations. The students enrolled in the course were computer science and mathematics graduate students and all were new to automatic speech recognition (ASR). They had backgrounds in probability, but no significant experience with the probabilistic models used in ASR, such as hidden Markov models and Gaussian mixtures. Livescu provided baseline recognizers that the students could modify, so that even beginning students could focus on specific components, while using real data with results in the literature to compare against. Since the students were provided with real data that the research community actively uses, they were motivated by the potential for 'real' results if their projects went as planned. As Livescu noted, 'while starting out in ASR from scratch is very difficult, the availability of toolkits and LDC data makes it possible for students in an introductory class to do productive research quite quickly'.
Many thanks to Karen Livescu for sharing an example of how LDC data can be used for teaching purposes.
LDC Exhibiting at Interspeech 2011, Florence, Italy
LDC is returning to Europe to participate in Interspeech 2011. The conference will be held from August 27-31 at the Firenze Fiera, conveniently located near the Stazione di Santa Maria Novella. Please stop by LDC’s exhibition booth to say hello and learn more about current happenings at the Consortium. Interspeech 2011’s theme is ‘Speech Science and Technology for Real Life’.
The main conference will feature keynotes on the following topics:
Conference organizers have also scheduled a roundtable discussion for August 31st on ‘Future and Applications of Speech and Language Technologies for the Good Health of Society’ which will be led by Profs. Gabriele Miceli, Björn Granström and Hiroshi Ishiguro. You are encouraged to keep track of LDC’s Interspeech preparations on our Facebook page. We hope to see you there!
LDC Sponsors Student Team at IOL 2011
LDC is happy to support the 2011 International Linguistics Olympiad by sponsoring a student team. The IOL is one of the twelve International Science Olympiads and is an annual event that brings together students from around the world to compete in linguistically-based challenges. This year’s competition takes place from July 24-30 at Carnegie Mellon University, Pittsburgh, PA USA. Students do not need to have a background in linguistics in order to participate since they typically use analysis and deductive reasoning to solve the competition problems. Please visit the IOL 2011 website for additional details. We wish good luck to all of the participants!
LDC Receives META Prize from META-NET
LDC was awarded a ‘2nd META Prize’ from META-NET ‘for outstanding long term commitment to the preparation and distribution of language resources and technologies’.
The META Prize is awarded by META-NET to those who provide outstanding products or services that support the European Multilingual Information Society. META-NET is a Network of Excellence dedicated to fostering the technological foundations of a multilingual European information society. Several organizations were honored at this year’s META Forum in Budapest; LDC and ELRA were both honored for supporting and developing language resources.
LDC at ACL 2011
ACL has returned to North America and LDC is taking this opportunity to interact with top HLT researchers in beautiful Portland, OR. LDC’s exhibition table will feature information on new developments at the consortium and will also be the go-to point for exciting new, green giveaways. LDC’s Seth Kulick will be presenting research on ‘Using Derivation Trees for Treebank Error Detection’ (S-66) during Monday’s evening poster session (20 June, 6.00 – 8.30 pm). The abstract for this paper, coauthored by LDCers Ann Bies and Justin Mott, is as follows:
This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.
We hope to see you there.
LDC and Social Networks
Over the past few months, LDC has responded to requests from the community to increase our online presence. We are happy to announce that LDC now has its very own Facebook page, LinkedIn profile (independent of the University of Pennsylvania) and Blog, which provides an RSS feed for LDC newsletters. Please visit LDC on our various profiles and let us know what you think!
Contact ldc@ldc.upenn.edu