October 2012 Newsletter

Monday, October 15, 2012

New Corpora

GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire

GALE Phase 2 Arabic Broadcast News Parallel Text

Announcements

Fall 2012 LDC Data Scholarship Recipients
LDC is pleased to announce the student recipients of the Fall 2012 LDC Data Scholarship program! This program provides university and college students with access to LDC data at no cost. Students were asked to complete an application consisting of a proposal describing their intended use of the data and a letter of support from their thesis adviser. We received many solid applications and have chosen six proposals to support. The following students will receive no-cost copies of LDC data:

Jaffar Atwan - National University of Malaysia (Malaysia), PhD candidate, Information Science and Technology. Jaffar has been awarded a copy of Arabic Newswire Part 1 (LDC2001T55) for his work in information retrieval.

Sarath Chandar - Indian Institute of Technology, Madras (India), MS candidate, Computer Science and Engineering.  Sarath has been awarded a copy of Treebank-3 (LDC99T42) for his work in grammar induction.

Kuruvachan K. George - Amrita Vishwa Vidyapeetham (India), PhD candidate, Electrical and Computer Engineering. Kuruvachan has been awarded a copy of Fisher English Part 2 (LDC2005S13/T19) and 2008 NIST Speaker Recognition Evaluation data (LDC2011S05/07/08/11) for his work in speaker recognition.

Eduardo Motta - Pontifícia Universidade Católica do Rio de Janeiro (Brazil), PhD candidate, Information Sciences. Eduardo has been awarded a copy of English Web Treebank (LDC2012T13) for his work in machine learning.

Genevieve Sapijaszko - University of Central Florida (USA), PhD candidate, Electrical and Computer Engineering. Genevieve has been awarded a copy of TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) and YOHO Speaker Verification (LDC94S16) for her work in digital signal processing.

John Steinberg - Temple University (USA), MS candidate, Electrical and Computer Engineering.  John has been awarded a copy of CALLHOME Mandarin Chinese Lexicon (LDC96L15) and CALLHOME Mandarin Chinese Transcripts (LDC96T16) for his work in speech recognition.

LDC Exhibiting at NWAV 41
LDC will be exhibiting at the 41st New Ways of Analyzing Variation Conference (NWAV 41) in late October. This marks the fifth time that LDC has been an NWAV exhibitor, and we are proud to show our continued support of the sociolinguistic research community.

The conference runs October 25-28, 2012, and the exhibition hall will be open October 26-28. Please stop by to say hello!

LDC 20th Anniversary Workshop Wrap-up
In early September, LDC hosted a workshop entitled “The Future of Language Resources” in celebration of our 20th anniversary. Visit the Program page to browse speaker abstracts and to access PDFs of the presentations. Thanks to the speakers and attendees for making the workshop a success!

LDC 20th Anniversary Podcasts
To further celebrate our 20th anniversary, LDC is conducting interviews with long-time staff members for their unique perspectives on the Consortium’s growth and evolution over the past two decades. The first interview podcast debuts this month and features Dave Graff, LDC’s Lead Programmer. Visit the LDC blog to access the podcast.

Other podcasts will be published via the LDC blog, so stay tuned to that space.

Language Resource Wiki
The Language Resource Wiki catalogs data, software, descriptive grammars and other resources for a variety of languages, especially those with a paucity of generally available resources for research. LDC is actively seeking editors knowledgeable in these and other languages to develop and maintain the pages, which are readable by anyone but writable only by editors. The wiki currently has resource listings for Bengali, Berber, Breton, Ewe, Greek (Ancient), Indonesian, Hindi, Latin, Panjabi, Pashto, Sorani (Central Kurdish), Russian, Tagalog, Tamil, and Urdu, and for the following sign languages: American, British, Catalan, Dutch, Flemish, German, Japanese, New Zealand, Polish, Spanish, and Swiss German.