RATS Speech Activity Detection is a unique data set developed by LDC for the DARPA RATS program to address the problem of speech detection on severely degraded audio signals typical of radio communications channels. LDC assembled a specialized system for the transmission, reception and digital capture of audio data that allowed a single source audio signal to be distributed and recorded over eight distinct transceiver configurations simultaneously. The source audio was conversational telephone speech in five languages (Levantine Arabic, English, Farsi, Pashto and Urdu), drawn from previous LDC corpora or collected specifically for the task.
The resulting data set of approximately 3,000 hours of audio was then annotated for speech segments in a three-step process. First, LDC ran its in-house automatic speech activity detector to produce a speech segmentation for each file. Second, annotators performed a manual first pass as a quick correction of the automatic output. Third, in a manual second pass, annotators reviewed the first-pass output and adjusted segments as needed.
The RATS Speech Activity Detection corpus is now available in the Catalog. Special thanks to Dan Ellis at Columbia University and John Hansen at the University of Texas at Dallas for their substantial technical assistance during the creation of this resource.
Congratulations to the recipients of LDC's Spring 2015 data scholarships:
Christopher Kotfila ~ State University of New York, Albany (USA), PhD candidate, Informatics. Christopher has been awarded copies of Message Understanding Conference and ACE 2005 SpatialML for his work in named entity extraction.
Ilia Markov ~ National Polytechnic University (Mexico), PhD candidate, Computer Science. Ilia has been awarded a copy of the ETS Corpus of Non-Native Written English for his work in native language identification.
Matthew Nelson ~ Georgia State University (USA), MA candidate, Applied Linguistics. Matthew has been awarded copies of TIMIT and Nationwide Speech for his work in speaker perception.
Meladianos Polykarpos ~ Athens University of Economics and Business (Greece), PhD candidate, Informatics. Meladianos has been awarded copies of TDT5 Text and TDT5 Topics and Annotations for his work in information retrieval.
Benjamin Schloss ~ Pennsylvania State University (USA), PhD candidate, Psychology. Benjamin has been awarded a copy of the ETS Corpus of Non-Native Written English for his work in semantics.
For program information visit the Data Scholarship page.
We've revamped our user services to make it easier than ever to access LDC data. Now you can become an LDC member, request corpora, sign agreements and submit payment online directly from your LDC user account.
You'll receive email notifications at key points in each transaction, for instance when an order is created, agreements are signed, payment is received and data is shipped. You can also track the status of a transaction from your user account.
Visit the new Managing Your LDC Account page to learn more about user accounts, their privileges and the steps for completing transactions online.
As always, thanks to our members, sponsors, collaborators and licensees for your continued support.
Podcasts from the complete set of staff interviews conducted as part of LDC's 20th Anniversary can be accessed from the LDC Blog. Hear what long-time staffers had to say about their experiences at LDC.
Christopher Cieri, Executive Director -- Chris reflects on the road that brought him to LDC, some of his early responsibilities and Consortium activities.
David Graff, Lead Programmer -- Dave was one of LDC's first staff members and offers some insights on LDC's early days.
Yiwola Awoyale, Moussa Bamba, Researchers -- Yiwola and Moussa discuss how they came to LDC, their work on West African languages and how it benefits multiple communities.
Natalia Bragilveskaya, Business Manager; Ilya Ahtaridis, Membership Coordinator; Marian Reed, Marketing Coordinator -- Natalia, Ilya and Marian recall the early days of LDC and the development of its interactions with the University of Pennsylvania, sponsors, members, licensees and collaborators.