Recent Announcements from LDC

New LDC website coming soon

Look for LDC's new website in the coming weeks. We've revamped the design and site plan to make it easier than ever to find what you're looking for. The features you use the most -- the catalog, new corpus releases and user login -- will be a short click away. We expect the LDC website to be occasionally unavailable for a few days at the end of September as we make the switch and thank you in advance for your understanding.

Mixer 6 now available

The release of Mixer 6 Speech this month marks the first time in close to a decade that LDC has made available a large-scale collection for training speaker recognition. Representing more than 15,000 hours of speech from over 500 speakers, Mixer 6 follows in the footsteps of the Switchboard and Fisher studies by providing a large database of telephone conversation and also adds interview and transcript reading sessions. Participants were native American English speakers local to the Philadelphia area. Mixer 6 Speech is a members-only release and a great reason to join the consortium. In addition to this substantial resource, members enjoy rights to other data released in 2013 and can license older publications at reduced fees. Please see the catalog page for Mixer 6 Speech.

LDC at Interspeech 2013

LDC will once again be exhibiting at Interspeech, held this year August 25-29 in Lyon. Please stop by LDC’s booth to learn about recent developments at the Consortium, including new publications.

Also, be on the lookout for the following presentations:

  • Speech Activity Detection on YouTube Using Deep Neural Networks
    Neville Ryant, Mark Liberman, Jiahong Yuan (all LDC)
    Monday 26 August, Poster 6, 16.00 – 18.00
    Room: Forum 6

  • The Spectral Dynamics of Vowels in Mandarin Chinese
    Jiahong Yuan
    Tuesday 27 August, Oral 17, 14.00 – 16.00
    Room: Gratte-Ciel 3

  • Automatic Phonetic Segmentation using Boundary Models
    Jiahong Yuan (LDC), Neville Ryant (LDC), Mark Liberman (LDC), Andreas Stolcke, Vikramjit Mitra, Wen Wang
    Wednesday 28 August, Oral 32, 14.00 – 16.00
    Room: Gratte-Ciel 3

LDC will continue to post conference updates via our Facebook page. We hope to see you there!

High School students use LDC data

A team of students at Thomas Jefferson High School for Science and Technology in Alexandria, VA, USA, have used an LDC database for the development of a device to help autistic children recognize emotions. This team was funded by a grant from the Lemelson-MIT InvenTeam Initiative Program. InvenTeams are groups of high school students, teachers, and mentors that receive grants up to US$10,000 each to invent technological solutions to real-world problems.

The team set out to invent an emotive aid in the form of a bracelet that uses a computational algorithm to extract emotional signatures from speech and display expressed emotions in real time during a conversation. Potential beneficiaries include children with autism, Asperger’s syndrome, or similar conditions that impair the ability to detect emotion. The algorithm employed machine learning and neural network-based techniques to improve accuracy and efficiency relative to current methods.

The students used speech samples from the LDC database Emotional Prosody Speech and Transcripts (LDC2002S28), as well as the Berlin Database of Emotional Speech, for training and testing their algorithm. Although the samples proved to be too small to produce an algorithm with a high degree of accuracy, the team's algorithm did demonstrate some degree of success. The students will present their results at Eurekafest at MIT in June.

LDC thanks the InvenTeam’s teacher, Mark Hannum, and group leader, Suhas Gondi, for contributing to this article.

LDC at ICASSP 2013

LDC will be at ICASSP 2013, the world's largest and most comprehensive technical conference focused on signal processing and its applications. The event will be held over May 26-31 and we look forward to interacting with members of this community at our exhibit table and during our poster and paper presentations:

Tuesday, May 28, 15:30 - 17:30, Poster Area D
ARTICULATORY TRAJECTORIES FOR LARGE-VOCABULARY SPEECH RECOGNITION
Authors: Vikramjit Mitra, Wen Wang, Andreas Stolcke, Hosung Nam, Colleen Richey, Jiahong Yuan (LDC), Mark Liberman (LDC)

Tuesday, May 28, 16:30 - 16:50, Room 2011
SCALE-SPACE EXPANSION OF ACOUSTIC FEATURES IMPROVES SPEECH EVENT DETECTION
Authors: Neville Ryant, Jiahong Yuan, Mark Liberman (all LDC)

Wednesday, May 29, 15:20 - 17:20, Poster Area D
USING MULTIPLE VERSIONS OF SPEECH INPUT IN PHONE RECOGNITION
Authors: Mark Liberman (LDC), Jiahong Yuan (LDC), Andreas Stolcke, Wen Wang, Vikramjit Mitra

Please look for LDC's exhibition at Booth #53 in the Vancouver Convention Centre. We hope to see you there!

Early renewing members save on fees

To date, just over 100 organizations have joined for Membership Year (MY) 2013. For the sixth straight year, LDC's early renewal discount program has resulted in significant savings for our members. Organizations that renewed membership or joined early for MY2013 saved over US$50,000! MY2012 members are still eligible for a 5% discount when renewing for MY2013. This discount will apply throughout 2013.

Organizations joining LDC can take advantage of membership benefits including free membership year data as well as discounts on older LDC corpora. For-profit members can use most LDC data for commercial applications. Please visit our Members FAQ for further information.

Commercial use and LDC data

Has your company obtained an LDC database as a non-member? For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. In the case of a small group of corpora such as American National Corpus (ANC) Second Release (LDC2005T35), Buckwalter Arabic Morphological Analyzer Version 2.0 (LDC2004L02), CELEX2 (LDC96L14) and all CSLU corpora, commercial licenses must be obtained separately from the owners of the data even if an organization is a for-profit member.

Checking in with LDC Data Scholarship Recipients

The LDC Data Scholarship program provides college and university students with access to LDC data at no cost. Students are asked to complete an application which consists of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. LDC introduced the Data Scholarship program during the Fall 2010 semester. Since that time, more than thirty individual students and student research groups have been awarded no-cost copies of LDC data for their research endeavors. These are student reports on their progress using LDC data:

  • Leili Javadpour - Louisiana State University (USA), Engineering Science. Leili was awarded a copy of BBN Pronoun Coreference and Entity Type Corpus (LDC2005T33) and Message Understanding Conference (MUC) 7 (LDC2001T02) for her work in pronominal anaphora resolution. Leili's research involves a learning approach for pronominal anaphora resolution in unstructured text. She evaluated her approach on the BBN Pronoun Coreference and Entity Type Corpus and obtained encouraging results of 89%. In this approach machine learning is applied to a set of new features selected from other computational linguistic research. Leili's future plans involve evaluating the approach on Message Understanding Conference (MUC) 7 as well as on other genres of annotated text such as stories and conversation transcripts.

  • Olga Nickolaevna Ladoshko - National Technical University of Ukraine "KPI" (Ukraine), graduate student, Acoustics and Acoustoelectronics. Olga was awarded copies of NTIMIT (LDC93S2) and STC-TIMIT 1.0 (LDC2008S03) for her research in automatic speech recognition for Ukrainian. Olga used NTIMIT in the first phase of her research; one problem she investigated was the influence of telephone communication channels on the reliability of phoneme recognition across different parametrizations and configurations of speech recognition systems built with HTK tools. The second phase involves using NTIMIT to test an algorithm for detecting voice in non-stationary noise. Her future work with STC-TIMIT 1.0 will include an experiment to develop an improved speech recognition algorithm, allowing for increased accuracy under noisy conditions.

  • Genevieve Sapijaszko - University of Central Florida (USA), PhD candidate, Electrical and Computer Engineering. Genevieve was awarded a copy of TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) and YOHO Speaker Verification (LDC94S16) for her work in digital signal processing. Her experiment used VQ and Euclidean distance to recognize a speaker's identity by extracting features of the speech signal with the following methods: RCC, MFCC, MFCC + ΔMFCC, LPC, LPCC, PLPCC and RASTA PLPCC. Based on the results, in a noise-free environment MFCC (at an average of 94%) is the best feature extraction method when used in conjunction with the VQ model. The addition of ΔMFCC showed no significant improvement to the recognition rate. When comparing three phrases of differing length, the longer two phrases had very similar recognition rates, but the shorter phrase, at 0.5 seconds, had a noticeably lower recognition rate across methods. When comparing recognition time, MFCC was also faster than other methods. Genevieve and her research team concluded that MFCC in a noise-free environment was the best method in terms of both recognition rate and recognition time.

  • John Steinberg - Temple University (USA), MS candidate, Electrical and Computer Engineering. John was awarded a copy of CALLHOME Mandarin Chinese Lexicon (LDC96L15) and CALLHOME Mandarin Chinese Transcripts (LDC96T16) for his work in speech recognition. John used the CALLHOME Mandarin Lexicon and Transcripts to investigate the integration of Bayesian nonparametric techniques into speech recognition systems. These techniques are able to detect the underlying structure of the data and theoretically generate better acoustic models than typical parametric approaches such as HMM. His work investigated using one such model, Dirichlet process mixtures, in conjunction with three variational Bayesian inference algorithms for acoustic modeling. The scope of his work was limited to a phoneme classification problem since John's goal was to determine the viability of these algorithms for acoustic modeling.

    One goal of his research group is to develop a speech recognition system that is robust to variations in the acoustic channel. The group is also interested in building acoustic models that generalize well across languages. For these reasons, both CALLHOME English and CALLHOME Mandarin data were used to help determine whether these new Bayesian nonparametric models were prone to any language-specific artifacts. These two languages, though phonetically very different, did not yield significantly different performance. Furthermore, one variational inference algorithm, accelerated variational Dirichlet process mixtures (AVDPM), was found to perform well on extremely large data sets.
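
As a rough illustration of the VQ matching step mentioned in Genevieve Sapijaszko's report above, the sketch below scores an utterance against one codebook per speaker and picks the speaker with the lowest average Euclidean distortion. It is not her implementation: the function names, the two-dimensional toy vectors and the speaker labels are invented for illustration, and codebook training and feature extraction (for example MFCC) are assumed to have happened elsewhere.

```python
import math


def avg_distortion(features, codebook):
    """Mean Euclidean distance from each feature vector to its nearest codeword."""
    total = sum(min(math.dist(vec, code) for code in codebook) for vec in features)
    return total / len(features)


def identify_speaker(features, codebooks):
    """Return the speaker whose codebook yields the lowest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(features, codebooks[name]))


# Toy data (assumption): real systems would use frames of MFCC or similar features,
# and codebooks trained per speaker with a clustering algorithm.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
utterance = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]

best = identify_speaker(utterance, codebooks)
print(best)
```

The same scoring function works for any feature type, which is what makes it possible to compare extraction methods under one classifier, as the report describes.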

LDC’s 20th Anniversary: Concluding a Year of Celebration

We’ve enjoyed celebrating our 20th Anniversary this last year (April 2012 - March 2013) and would like to review some highlights before its close.

Our 2012 User Survey, circulated early in 2012, included a special Anniversary section in which respondents were asked to reflect on their opinions of, and dealings with, LDC over the years. We were humbled by the response. Multiple users mentioned that they would not be able to conduct their research without LDC and its data. For a full list of survey testimonials, please click here.

LDC also developed its first-ever timeline (initially published in the April 2012 Newsletter) marking significant milestones in the consortium’s founding and growth.

In September, we hosted a 20th Anniversary Workshop that brought together many friends and collaborators to discuss the present and future of language resources.

Throughout the year, we conducted several interviews of long-time LDC staff members to document their unique recollections of LDC history and to solicit their opinions on the future of the Consortium. These interviews are available as podcasts on the LDC Blog.

As our Anniversary year draws to a close, one task remains – to thank all of LDC’s past, present and future members and other friends of the Consortium for their loyalty and for their contributions to the community. LDC would not exist if not for its supporters. The variety of relationships that LDC has built over the years is a direct reflection of the vitality, strength and diversity of the community. We thank you all and hope that we continue to serve your needs in our third decade and beyond.

For a last treat, please visit LDC’s newly-launched YouTube channel to enjoy this video montage of the LDC staff interviews featured in the podcast series.

Thank you again for your continued support!

Spring 2013 LDC Data Scholarship Recipients

LDC is pleased to announce the student recipients of the Spring 2013 LDC Data Scholarship program! This program provides university students with access to LDC data at no cost. Students were asked to complete an application which consisted of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. We received many solid applications and have chosen three proposals to support. The following students will receive no-cost copies of LDC data:

  • Salima Harrat - Ecole Supérieure d’informatique (ESI) (Algeria). Salima has been awarded a copy of Arabic Treebank: Part 3 for her work in diacritization restoration.

  • Maulik C. Madhavi - Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar (India). Maulik has been awarded a copy of Switchboard Cellular Part 1 Transcribed Audio and Transcripts and 1997 HUB4 English Evaluation Speech and Transcripts for his work in spoken term detection.

  • Shereen M. Oraby - Arab Academy for Science, Technology, and Maritime Transport (Egypt). Shereen has been awarded a copy of Arabic Treebank: Part 1 for her work in subjectivity and sentiment analysis.

Please join us in congratulating our student recipients! The next LDC Data Scholarship program is scheduled for the Fall 2013 semester.

Membership Fee Savings and Publications Pipeline

Time is quickly running out to save on membership fees for Membership Year 2013 (MY2013)! Any organization which joins or renews membership for 2013 through Friday, March 1, 2013, is entitled to a 5% discount on membership fees. Organizations which held membership for MY2012 can receive a 10% discount on fees provided they renew prior to March 1, 2013.

Many publications for MY2013 are still in development. The planned publications for the upcoming months include:

  • GALE data ~ continuing releases of all languages (Arabic, Chinese, English), genres (Broadcast News, Broadcast Conversation, Newswire and Web Data) and tasks (Parallel Text, Word Alignment, Parallel Aligned Treebanks, Parallel Sentences, Audio and Transcripts).

  • Hispanic Accented English Database ~ 30 hours of conversational speech data from non-native speakers of English with approximately 24 hours or 80% of the data closely transcribed. The speech in this release was collected from 22 non-native, Hispanic speakers of English and consists of spontaneous speech and read utterances. The read speech is divided equally between English and Spanish.

  • NIST 2012 Open Machine Translation Progress Tests ~ contains the evaluation sets (source data and human reference translations), DTD, scoring software, and evaluation plan for the OpenMT12 test for Arabic, Chinese, Dari, Farsi, and Korean to English on a parallel data set. This set is based on a subset of the Arabic-to-English and Chinese-to-English Progress tests from the NIST Open Machine Translation 2008, 2009, and 2012 evaluations, with new source data created from the English human reference translation. The original data consists of newswire and web data.

  • NIST Open Machine Translation 2008 to 2012 Progress Test Sets ~ contains the evaluation sets (source data and human reference translations), DTD, scoring software, and evaluation plans for the Arabic-to-English and Chinese-to-English Progress tests of the NIST Open Machine Translation 2008, 2009, and 2012 Evaluations. The test sets consist of newswire and web data.

  • OntoNotes 5.0 ~ multiple genres of English, Chinese, and Arabic text annotated for syntax, predicate argument structure and shallow semantics.

  • UN Parallel Text ~ contains the text of United Nations parliamentary documents in Arabic, Chinese, English, French, Russian, and Spanish from 1993 through 2007. The data is provided in two formats: (1) raw text: the raw text is very close to what was extracted from the word processing documents, converted to UTF-8 encoding, and (2) word-aligned text: the word-aligned text has been normalized, tokenized, aligned at the sentence-level, further broken into sub-sentential "chunk-pairs", and then aligned at the word-level.

2013 Subscription Members are automatically sent all MY2013 data as it is released. 2013 Standard Members are entitled to request 16 corpora for free from MY2013. Non-members may license most data for research use. See below for further information on pricing.

Invitation to Join for Membership Year 2013

Membership Year (MY) 2013 is open for joining! We would like to invite all current and previous members of LDC to renew their membership as well as welcome new organizations to join the consortium. For MY2013, LDC is pleased to maintain membership fees at last year’s rates – membership fees will not increase. Additionally, LDC will extend discounts on membership fees to members who keep their membership current and who join early in the year.

The details of our early renewal discounts for MY2013 are as follows:

  • Organizations who joined for MY2012 will receive a 5% discount when renewing. This discount will apply throughout 2013, regardless of time of renewal. MY2012 members renewing before March 1, 2013 will receive an additional 5% discount, for a total 10% discount off the membership fee.

  • New members as well as organizations who did not join for MY2012, but who held membership in any of the previous MYs (1993-2011), will also be eligible for a 5% discount provided that they join/renew before March 1, 2013.

The following table provides exact pricing information:

                     MY2013 Fee    MY2013 Fee            MY2013 Fee
                                   with 5% Discount *    with 10% Discount **
Not-for-Profit
  Standard           US$2400       US$2280               US$2160
  Subscription       US$3850       US$3657.50            US$3465
For-Profit
  Standard           US$24000      US$22800              US$21600
  Subscription       US$27500      US$26125              US$24750

  • * For new members, MY2012 Members renewing for MY2013, and any previous year Member who renews before March 1, 2013

  • ** For MY2012 Members renewing before March 1, 2013
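
The discount rules and table above can be checked with a little arithmetic. The following is a minimal sketch, not an official LDC calculator: the function and parameter names are hypothetical, the base fees are copied from the table, and the March 1, 2013 deadline is reduced to a simple yes/no flag.

```python
from decimal import Decimal

# List prices copied from the MY2013 fee table (US$).
BASE_FEES = {
    ("Not-for-Profit", "Standard"): Decimal("2400"),
    ("Not-for-Profit", "Subscription"): Decimal("3850"),
    ("For-Profit", "Standard"): Decimal("24000"),
    ("For-Profit", "Subscription"): Decimal("27500"),
}


def discount_percent(was_my2012_member: bool, before_march_1: bool) -> int:
    """Hypothetical helper: discount percentage under the rules stated above."""
    if was_my2012_member:
        # 5% all year for MY2012 members, plus 5% more for renewing before March 1.
        return 10 if before_march_1 else 5
    # New members and earlier (1993-2011) members get 5% only before March 1.
    return 5 if before_march_1 else 0


def discounted_fee(category: str, plan: str,
                   was_my2012_member: bool, before_march_1: bool) -> Decimal:
    """Apply the applicable discount to the list price for a membership type."""
    pct = discount_percent(was_my2012_member, before_march_1)
    return BASE_FEES[(category, plan)] * (100 - pct) / 100


# A new not-for-profit subscription member joining before March 1, 2013.
print(discounted_fee("Not-for-Profit", "Subscription", False, True))
```

Decimal arithmetic is used here so that amounts such as US$3657.50 are represented exactly rather than as binary floating-point approximations.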

Publications for MY2013 are still being planned; here are the working titles of data sets we intend to provide:


  • Arabic Treebank - Weblog
  • Hispanic-English Speech
  • Chinese-English Biomedical Parallel Text
  • Maninkakan Lexicon
  • GALE data – all phases and tasks
  • OpenMT 2008-2012 Progress Set

In addition to receiving new publications, current year members of the LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications.

This past year, LDC members who joined early or kept their membership current saved almost US$70,000 collectively on membership fees. Be sure to keep an eye on your mail - all previous and current LDC members will be sent an invitation to join letter and renewal invoice for MY2013. Renew early for MY2013 to save today!

Why become an LDC Member?

LDC is offering early renewal discounts on membership fees for Membership Year 2013 making now a good time to consider joining or renewing membership. LDC membership has the following advantages:

  • LDC membership provides cost-effective access to an extensive and growing catalog that spans 20 years and includes over 500 multilingual speech, text, and video resources. Even if your organization only needs a few datasets from a given membership year, membership is often the most economical way to obtain current corpora. Additionally, the generous discounts that member organizations receive on older corpora reduce the cost of acquiring such datasets.
  • All members enjoy unlimited use of LDC data within their organizations. For universities, there is no difference in cost between a departmental membership and one that is university-wide. Departments can therefore combine resources and establish one LDC membership for use by the entire university community. Likewise, for-profit members with multiple branches can maintain one membership for use by their entire organizations.

For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations, including commercial restrictions, on the use of certain corpora. In the case of a small group of corpora, commercial licenses must be obtained separately from the owners of the data.

2012 User Survey Results

Earlier this year, LDC sent a survey to its user communities. Like previous iterations in 2006 and 2007, the survey solicited community input and suggestions on key LDC-related topics, including:

  • Satisfaction levels with LDC’s data, homepage and Catalog
  • Reflections on LDC’s 20th Anniversary year
  • Suggestions for future publications
  • Speculations on the future of HLT-related fields, specifically on mobile technologies, cloud computing, social networking and open data

Survey respondents were generally satisfied with LDC’s data, membership options, homepage and Catalog, though there were requests for additional data options and data acquisition methods. Some of the data respondents requested are already in our pipeline for the end of 2012 or for Membership Year (MY) 2013, so please be on the lookout for Publications updates. Respondents were also very supportive of LDC’s 20th Anniversary, posting testimonials and well-wishes in the 20th Anniversary section.

LDC would like to thank all survey participants. Survey participants will receive access to full survey results shortly.

Fall 2012 LDC Data Scholarship Recipients

LDC is pleased to announce the student recipients of the Fall 2012 LDC Data Scholarship program! This program provides university and college students with access to LDC data at no cost. Students were asked to complete an application which consisted of a proposal describing their intended use of the data, as well as a letter of support from their thesis adviser. We received many solid applications and have chosen six proposals to support. The following students will receive no-cost copies of LDC data:

  • Jaffar Atwan - National University of Malaysia (Malaysia), PhD candidate, Information Science and Technology. Jaffar has been awarded a copy of Arabic Newswire Part 1 (LDC2001T55) for his work in information retrieval.

  • Sarath Chandar - Indian Institute of Technology, Madras (India), MS candidate, Computer Science and Engineering. Sarath has been awarded a copy of Treebank-3 (LDC99T42) for his work in grammar induction.

  • Kuruvachan K. George - Amrita Vishwa Vidyapeetham (India), PhD candidate, Electrical and Computer Engineering. Kuruvachan has been awarded a copy of Fisher English Part 2 (LDC2005S13/T19) and 2008 NIST Speaker Recognition Evaluation data (LDC2011S05/07/08/11) for his work in speaker recognition.

  • Eduardo Motta - Pontifícia Universidade Católica do Rio de Janeiro (Brazil), PhD candidate, Information Sciences. Eduardo has been awarded a copy of English Web Treebank (LDC2012T13) for his work in machine learning.

  • Genevieve Sapijaszko - University of Central Florida (USA), PhD candidate, Electrical and Computer Engineering. Genevieve has been awarded a copy of TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) and YOHO Speaker Verification (LDC94S16) for her work in digital signal processing.

  • John Steinberg - Temple University (USA), MS candidate, Electrical and Computer Engineering. John has been awarded a copy of CALLHOME Mandarin Chinese Lexicon (LDC96L15) and CALLHOME Mandarin Chinese Transcripts (LDC96T16) for his work in speech recognition.

LDC Exhibiting at NWAV 41

LDC will be exhibiting at the 41st New Ways of Analyzing Variation Conference (NWAV 41) in late October. This marks the fifth time that LDC has been an NWAV exhibitor and we are proud to show our continued support of the sociolinguistic research community.

The conference runs from October 25-28 and the exhibition hall will be open from October 26-28, 2012. Please stop by to say hello!

LDC 20th Anniversary Workshop Wrap-up

In early September, LDC hosted a workshop entitled The Future of Language Resources in celebration of our 20th anniversary.

Visit the Program page to browse speaker abstracts and to access pdfs of the presentations. Thanks to the speakers and attendees for making the workshop a success!

LDC 20th Anniversary Podcasts

To further celebrate our 20th Anniversary, LDC is conducting interviews of long-time staff members for their unique perspectives on the Consortium’s growth and evolution over the past two decades. The first interview podcast debuts this month and features Dave Graff, LDC’s Lead Programmer. Visit the LDC blog to access the podcast.

Other podcasts will be published via the LDC blog, so stay tuned to that space.

Language Resource Wiki

The Language Resource Wiki catalogs data, software, descriptive grammars and other resources for a variety of languages but especially those with a paucity of generally available resources for research. LDC is actively seeking editors knowledgeable in these and other languages to develop and maintain the pages, which are readable by anyone but writable only by editors. The wiki currently has resource listings for: Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian, Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian, Tagalog, Tamil, and Urdu, and for the following Sign Languages: American, British, Catalan, Dutch, Flemish, German, Japanese, New Zealand, Polish, Spanish, and Swiss German.

LDC and Google Collaboration Results in New Syntactically-Annotated Language Resources

Google Inc. and the Linguistic Data Consortium (LDC) have collaborated to develop new syntactically-annotated language resources that enable computers to better understand human language. The project, funded through a gift from Google in 2010, has resulted in the development of the English Web Treebank LDC2012T13 containing over 250,000 words of weblogs, newsgroups, email, reviews and question-answers manually annotated for syntactic structure. This resource will allow language technology researchers to develop and evaluate the robustness of parsing methods in various new web domains. It was used in the 2012 shared task on parsing English web text for the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL) which took place at NAACL-HLT in Montreal on June 8, 2012. The English Web Treebank is available to the research community through LDC’s Catalog.

Natural language processing (NLP) is a field of computational linguistic research concerned with the interactions between human language and computers. Parsing is a discipline within NLP in which computers analyze text and determine its syntactic structure. While syntactic parsing is already practically useful, Google funded this effort to help the research community develop better parsers for web text. The web texts collected and annotated by LDC provide new, diverse data for training parsing systems.

Google chose LDC for this work based on the Consortium’s experience in developing and creating syntactic annotations, also known as treebanks. Treebanks are critically important to parsing research since they provide human-analyzed sentence structures that facilitate training and testing scenarios in NLP research. This work extends the existing relationship between LDC and Google. LDC has published four other Google-developed data sets in the past six years: English, Chinese, Japanese and European language n-grams used principally for language modeling.

Contact ldc@ldc.upenn.edu
Last modified: Wednesday, 18-Sep-2013 12:39:43 EDT
© 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.