Introduction
This CD-ROM contains previously unreleased isolated-word and spell-mode
(spelled-out words) speech data from the (D)ARPA Resource Management
(RM1) Corpus. This data is based on a 600-word subset of the 991-word
RM1 vocabulary and contains spoken and spelled words pertaining to the
RM1 naval resource management task. This corpus was collected at the
same time as, and as part of, the RM1 Continuous Speech Corpus (NIST
Speech Discs 2-1 through 2-4), and it contains speech from the same sets
of subjects used in RM1.
Data
The speech data has been segmented into separate spelled and spoken-word
waveform files for each subject-word utterance. Time-aligned word and
phonetic transcriptions have been generated automatically using forced
recognition and are included. The time-aligned transcriptions employ the
same format and phone set as the TIMIT Acoustic-Phonetic Continuous
Speech Corpus (NIST Speech Disc 1-1). See the TIMIT CD-ROM companion
booklet, NISTIR 4930, pp. 29-31, for a description of the phone set.
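In the TIMIT convention, each line of a word or phone transcription file gives a begin sample number, an end sample number and a label, separated by whitespace. The sketch below reads such a file into segments and converts sample offsets to seconds. It assumes the RM files follow that three-field layout exactly and that the waveforms are sampled at 16 kHz. Check the sample rate against each waveform's header before relying on the conversion. The names `Segment`, `parse_timit_transcription` and `to_seconds` are hypothetical and do not come from the disc.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: int  # begin sample number
    end: int    # end sample number
    label: str  # word or phone label (TIMIT phone set for phonetic files)


def parse_timit_transcription(text: str) -> List[Segment]:
    """Parse TIMIT-style 'begin end label' lines into Segment objects."""
    segments = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        if end < start:
            raise ValueError(f"line {line_no}: end sample precedes begin sample")
        segments.append(Segment(start, end, label))
    return segments


def to_seconds(seg: Segment, sample_rate: int = 16000) -> Tuple[float, float]:
    """Convert a segment's sample offsets to seconds (16 kHz is an assumption)."""
    return seg.start / sample_rate, seg.end / sample_rate
```

A usage example with made-up contents: `parse_timit_transcription("0 2400 h#\n2400 3200 s\n3200 4800 iy\n")` returns three segments, and `to_seconds` maps the first one, the `h#` silence, to the span from 0.0 to 0.15 seconds.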
As with the continuous speech portion of RM1, this data is divided into
speaker-independent and speaker-dependent partitions. These data sets
are further partitioned into training, development-test and
evaluation-test subsets. See the "readme.doc" file in the top-level
directory of the disc for more information about the data.
Texas Instruments recruited the subjects and collected the speech. The
National Institute of Standards and Technology (NIST) segmented the
waveforms, generated the time-aligned transcriptions and produced this
CD-ROM.
Updates
RM Isolated and Spelled Word Data is no longer available as
catalog number LDC97S39. It has been incorporated into Resource
Management RM1 2.0 and is currently available in both Resource
Management RM1 2.0 (LDC93S3B) and Resource Management Complete
Set 2.0 (LDC93S3A).