


RM Isolated and Spelled Word Data

Item Name: RM Isolated and Spelled Word Data
Authors: .
LDC Catalog No.: LDC96S39
NIST Catalog No.: 2-5.1
ISBN: 1-58563-106-X
Data Type: speech
Sample Rate: 16000 Hz
Sampling Format: 1-channel pcm
Data Source(s): microphone speech
Project(s): RM
Application(s): speech recognition
Language(s): English
Language ID(s): eng
Distribution: 1 CD
Member fee: $0 for 1996 members
Non-member Fee: US $250.00
Reduced-License Fee: US $250.00
Extra-Copy Fee: US $250.00
Non-member License: yes
Readme File: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: 1996. RM Isolated and Spelled Word Data. Linguistic Data Consortium, Philadelphia.

Introduction

This CD-ROM contains previously unreleased isolated-word and spell-mode (spelled-out word) speech data from the (D)ARPA Resource Management (RM1) Corpus. The data is based on a 600-word subset of the 991-word RM1 vocabulary and contains spoken and spelled words pertaining to the RM1 naval resource management task. It was collected at the same time as, and as part of, the RM1 Continuous Speech Corpus (NIST Speech Discs 2-1 through 2-4) and contains speech from the same sets of subjects used in RM1.

Data

The speech data has been segmented into separate spelled and spoken-word waveform files for each subject-word utterance. Time-aligned word and phonetic transcriptions have been generated automatically using forced recognition and are included. The time-aligned transcriptions employ the same format and phone set as the TIMIT Acoustic-Phonetic Continuous Speech Corpus (NIST Speech Disc 1-1). See the TIMIT CD-ROM companion booklet, NISTIR 4930, pp. 29-31, for a description of the phone set.
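As a rough illustration of working with such transcriptions, the sketch below parses TIMIT-style alignment text, in which each line holds a begin sample, an end sample and a label, and converts the sample offsets to seconds using the 16000 Hz sample rate listed above. The function and class names are hypothetical, and the sample lines are made up for illustration rather than taken from the disc; check the "readme.doc" file for the actual file layout before relying on this.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE_HZ = 16000  # sample rate listed in this catalog entry


@dataclass
class Segment:
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    label: str      # word or phone label


def parse_alignment(text: str, sample_rate: int = SAMPLE_RATE_HZ) -> List[Segment]:
    """Parse TIMIT-style lines of the form '<begin_sample> <end_sample> <label>'."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, label = line.split(maxsplit=2)
        segments.append(
            Segment(int(start) / sample_rate, int(end) / sample_rate, label)
        )
    return segments


if __name__ == "__main__":
    # Made-up alignment text, not taken from the corpus.
    example = "0 16000 h#\n16000 24000 s\n"
    for seg in parse_alignment(example):
        print(f"{seg.label}: {seg.start_s:.3f}s to {seg.end_s:.3f}s")
```

Keeping the sample rate as a parameter makes the same helper usable for other TIMIT-format corpora recorded at a different rate.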

As with the continuous speech portion of RM1, this data is divided into speaker-independent and speaker-dependent partitions. These data sets are further partitioned into training, development-test and evaluation-test subsets. See the "readme.doc" file in the top-level directory of the disc for more information about the data.

Texas Instruments recruited the subjects and collected the speech. The National Institute of Standards and Technology (NIST) segmented the waveforms, generated the time-aligned transcriptions and produced this CD-ROM.

Updates

RM Isolated and Spelled Word Data is no longer available as catalog number LDC96S39. It has been incorporated into Resource Management RM1 2.0 and is currently available in both Resource Management RM1 2.0 (LDC93S3B) and Resource Management Complete Set 2.0 (LDC93S3A).

Copyright



Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.