Recent Announcements from the LDC
2007 Member Survey Responses
Please click here to access a summary of the responses to Questions 1-15 of the 2007 Member Survey. These questions were sent to all survey recipients. We also received many suggestions for future releases, among them:
Some of those requests represent data in our 2008 publications pipeline. The winner of the blind drawing for the $500 benefit for survey responses received by January 14, 2008 is Richard Rose of McGill University. Congratulations! To all survey respondents: As promised, a more detailed analysis of the survey will be arriving within the next few weeks. Stay tuned!
2008 Publications Pipeline
Membership Year (MY) 2008 is shaping up to be another productive one for the LDC. We anticipate releasing a balanced and exciting selection of publications. Here is a glimpse of what is in the pipeline for MY2008. (Disclaimer: unforeseen circumstances may lead to modifications of our plans. Please regard this list as tentative.)
• BLLIP 1994-1997 News Text Release 1 - automatic parses for the North American News Text Corpus - NANT (LDC95T21). The parses were generated by the Charniak and Johnson Reranking Parser, which was trained on Wall Street Journal (WSJ) data from Treebank 3 (LDC99T42). Each file is a sequence of n-best lists containing the top n parses of each sentence with the corresponding parser probability and reranker score. The parses may be used in systems that are trained on labeled parse trees but require more data than is found in WSJ. Two versions will be released: a complete 'Members-Only' version which contains parses for the entire NANT Corpus and a 'Non Member' version for general licensing which includes all news text except data from the Wall Street Journal.
• Chinese Proposition Bank - the goal of this project is to create a corpus of text annotated with information about basic semantic propositions. Predicate-argument relations are being added to the syntactic trees of the Chinese Treebank Data. This release contains the predicate-argument annotation of 81,009 verb instances (11,171 unique verbs) and 14,525 noun instances (1,421 unique nouns). The annotation of nouns is limited to nominalizations that have a corresponding verb.
• GALE Phase 1 Arabic Newsgroup Parallel Text - contains a total of 178K words (264 files) of Arabic newsgroup text selected from 35 sources. Newsgroups consist of posts to electronic bulletin boards, Usenet newsgroups, discussion groups and similar forums. Manual sentence units/segments (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Files were translated according to LDC's GALE Translation guidelines.
• GALE Phase 1 Chinese Newsgroup Parallel Text - contains a total of 240K characters (112 files) of Chinese newsgroup text selected from 25 sources. Newsgroups consist of posts to electronic bulletin boards, Usenet newsgroups, discussion groups and similar forums. Manual sentence units/segments (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Files were translated according to LDC's GALE Translation guidelines.
• Hindi WordNet - the first wordnet for an Indian language. Similar in design to the Princeton WordNet for English, it incorporates additional semantic relations to capture the complexities of Hindi. The WordNet contains 28,604 synsets and 63,436 unique words. Created by the NLP group at the Indian Institute of Technology Bombay, it is inspiring the construction of wordnets for many other Indian languages, notably Marathi.
• LCTL Bengali Language Pack - a set of linguistic resources to support technological improvement and development of new technology for the Bengali language, created in the Less Commonly Taught Languages (LCTL) project, which covered a total of 13 languages. Package components are: 2.6 million tokens of monolingual text, 500,000 tokens of parallel text, a bilingual lexicon with 48,000 entries, sentence and word segmenting tools, an encoding converter, a part of speech tagger, a morphological analyzer, a named entity tagger and 136,000 tokens of named entity tagged text, a Bengali-to-English name transliterator, and a descriptive grammar created by a PhD research linguist. About 30,000 tokens of the parallel text are English-to-LCTL translations of a "Common Subset" corpus, which will be included in all additional LCTL Language Packs.
• North American News Text Corpus (NANT) Reissue - as a companion to BLLIP 1994-1997 News Text Release 1, LDC will reissue the North American News Text Corpus (LDC95T21). Data includes news text articles from several sources (L.A. Times/Washington Post, Reuters General News, Reuters Financial News, Wall Street Journal, New York Times) that have been formatted with TIPSTER-style SGML tags to indicate article boundaries and organization of information within each article. Two versions will be released: a complete 'Members-Only' version which contains all previously released NANT articles and a 'Non Member' version for general licensing which includes all news text except data from the Wall Street Journal.
As a reminder, MY2007 will remain open for joining through December 31, 2008 and MY2008 through December 31, 2009. Take note that some of our current discounts on Membership Fees will no longer be effective after March 1, 2008.
50,000th LDC Corpus Distributed!
Last year marked the LDC's 15th Anniversary Year and it proved to be an exciting one for the LDC. We commemorated this anniversary with a Fidelity Celebration which rewarded our loyal members who continually support the consortium through membership. Additionally, we provided our list serve readers with a glimpse into the research activities at the LDC through each of our monthly Spotlights.
At the very end of our anniversary year, the LDC observed another significant milestone: the distribution of our 50,000th publication! This corpus was licensed by Helsinki University of Technology, Adaptive Informatics Research Centre (AIRC). AIRC's research includes basic algorithmic analysis, multimodal interfaces (speech, vision and language), bioinformatics, neuroinformatics and computational cognitive systems. In appreciation, the LDC is offering Helsinki University of Technology a US$2000 benefit to be used towards membership or data licensing fees.
We would like to thank both members and nonmembers for helping the LDC reach this landmark distribution. Your persistent demand for LDC data supports our mission to develop and share resources for research in human language technologies.
Membership Fee Increases and Discounts
Effective January 1, 2008, the LDC will be raising membership fees for the first time in fifteen years. Note that the new fee structure rewards those members who keep their membership current and members who join early in the year through discounts on membership. The details are as follows:
Additional points:
Should you require any additional information, please contact our Membership group by e-mail at ldc@ldc.upenn.edu or by phone at +1 (215) 573-1275.
15th Anniversary Fidelity Celebration
As promised in our June 2007 newsletter, the LDC is holding a Fidelity Celebration in honor of our 15th Anniversary. We want to thank our members for their commitment in establishing and supporting our consortium and we have decided that rewarding your loyalty is the best way to do it!
Eligibility - Organizations that have been consecutive members of the LDC since at least 2005 (through 2007 inclusive) are eligible for benefits that can be used for corpora purchases, reduced license fees, extra copy fees or membership discounts; it is entirely up to you! Here’s how it works:
* Any organization that has been a member for 3-4 consecutive years (2004, 2005 through 2007) is eligible to receive a $250 benefit
* Non-Profit organizations that have been members for 5-9 consecutive years (1999 - 2003 through 2007) are eligible to receive a $500 benefit while For-Profit organizations are eligible to receive a $1500 benefit
* Non-Profit organizations that have been members for 10-15 consecutive years (1993 – 1998 through 2007) are eligible to receive a $3500 benefit while For-Profit organizations that have been members for 10-15 consecutive years (1993 – 1998 through 2007) are eligible to receive a $7500 benefit
Notification and Terms – The primary contacts at each qualifying organization were notified on June 20, 2007 via email. One benefit award will be made for every 20 organizations in each group. Therefore, if there are 23 members who have been consecutive members for 3-4 years, 1 prize will be awarded. If 49 members are eligible, 2 prizes will be awarded, etc. The blind drawing will be held on July 2, 2007, and winners will be immediately notified. The benefits are awarded to the member organization as a whole and must be used during calendar year 2007 on purchases made after notification. All benefits expire on December 31, 2007.
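The award rule in the Notification and Terms paragraph above (one award for every 20 eligible organizations in a group) can be read as integer division. The sketch below is only an illustration of that reading: the function name `awards_for_group` is hypothetical, and the announcement does not say what happens for groups with fewer than 20 members, so returning 0 in that case is an assumption.

```python
def awards_for_group(eligible_members: int, group_size: int = 20) -> int:
    """Return the number of benefit awards for one eligibility group.

    Assumes one award per full block of `group_size` eligible organizations,
    which matches the two examples in the announcement (23 and 49 members).
    Groups smaller than `group_size` yield 0 here; that case is an assumption,
    since the announcement does not address it.
    """
    if eligible_members < 0:
        raise ValueError("eligible_members must be non-negative")
    return eligible_members // group_size


if __name__ == "__main__":
    # The two cases stated in the announcement.
    for members in (23, 49):
        print(members, "eligible members:", awards_for_group(members), "award(s)")
```

Under this reading, 23 eligible members give 1 award and 49 give 2, as stated in the announcement.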
Redemption - In order to redeem your benefit, please notify the Membership Coordinator, Ilya Ahtaridis, at the time of your order at ldc@ldc.upenn.edu. All other Fidelity Celebration concerns may be directed to Marian Reed at mreed@ldc.upenn.edu. Thank you for your continued support from all of us at the LDC!
Celebrating 15 Years of Supporting the Language Technology Community
April 15, 2007 marks the start of the LDC's 15th Anniversary year! We have many milestones to celebrate, including the growth of our staff to over 40 full-time employees and an online catalog that includes over 350 linguistic databases. Since 1992, no fewer than 2,300 organizations from over 80 different nations have licensed LDC data.
Numbers aside, it is essential to note how greatly the LDC has evolved while still adhering to our goal to share language-technology resources. Our mission has grown to include linguistic data collection and annotation for an increasing number of areas of language research and engineering, as well as the development of language-related standards and tools. By collecting and creating data that we distribute, the LDC remains responsive to the changing needs of the research community that it has supported for fifteen years.
In each of our monthly newsletters, we will highlight one aspect of the LDC - from our work in human subject collections, to our progress in Arabic treebanking, to the technical challenges of collecting and storing high volumes of broadcast news. As we celebrate throughout the year, look for new membership offerings and announcements. And be sure to join us as we count down to the much anticipated distribution of our 50,000th publication.
LDC Membership Options
LDC Online Membership
The Linguistic Data Consortium is pleased to announce the LDC Online Membership, which is available for the 2007 Membership year. LDC Online contains a continuously growing, indexed collection of Arabic, Chinese and English newswire text, millions of words of English telephone speech from the Switchboard and Fisher collections and the American English Spoken Lexicon, as well as the full text of the Brown corpus. With LDC Online, users can search textual data and play audio extracts for transcribed utterances on standard web browsers. LDC will continue to add new material to LDC Online.
The LDC Online Membership is a reduced cost alternative providing interactive access to a growing subset of LDC data to users who do not have a need for linguistic data on media. Current LDC members already have access to all LDC Online resources. The LDC Online Membership is available to Non-Profit and U.S. government organizations for $1,000 (USD) per calendar year (January to December).
Should you require any additional information, please contact our Membership department by e-mail at ldc@ldc.upenn.edu or by phone at (215) 573-1275.
LDC Standard and Subscription Memberships
Since LDC began operations in 1992, we have addressed the growing needs of our members by expanding our mission of sharing and archiving linguistic resources to include data collection and annotation and the development of tools and best practices. We have also carefully refined our distribution processes to allow us to release greater numbers of larger and more varied corpora for a membership fee that has not changed in 12 years. In order to meet the needs of our various members and to continue to provide the quality and quantity of corpora releases, we now offer two membership options for both our Non-Profit and Commercial organizations: the Standard Membership and the new Subscription Membership. The Subscription Membership offers the following benefits:
* Subscription Members automatically receive each and every corpus released in their year of membership. Members do not need to request individual corpora as they are released.
The Subscription Membership fees are $3,500 (USD) for Non-Profit and Government members and $25,000 (USD) for Commercial Members. The additional fees will cover the cost of providing additional services as well as the cost of increasing corpus production. All other membership requirements and regulations remain the same. Please see our Members FAQ page for more information.
The Standard Membership fee will remain unchanged: $2,000 (USD) for Non-Profit and Government members and $20,000 (USD) for Commercial members. However, we will impose a quota on the number of corpora an organization may receive from current membership years. For any 2004 and 2005 Standard Memberships executed after October 1, 2004, the quota will be 16 corpora. Any Standard Membership executed before October 1, 2004 will not be subject to the quota. A Standard Member organization that reaches its quota may license additional corpora by paying the individual corpus license fee.
Should you require any additional information, please contact our Membership department by e-mail at ldc@ldc.upenn.edu or by phone at (215) 573-1275.