This file contains documentation for STC-TIMIT 1.0, Linguistic Data Consortium
(LDC) catalog number LDC2008S03 and ISBN 1-58563-468-9.
STC-TIMIT 1.0 is a telephone version of the TIMIT
Acoustic-Phonetic Continuous Speech Corpus, LDC93S1 (TIMIT). TIMIT contains
broadband recordings of 630 speakers of eight major dialects of American English
reading ten phonetically rich sentences. Created in 1993, TIMIT was designed
to provide speech data for acoustic-phonetic studies and for the development
and evaluation of automatic speech recognition systems. Since that time, several
corpora have been developed using the TIMIT database: NTIMIT, LDC93S2
(transmitting TIMIT recordings through a telephone handset and over various
channels in the NYNEX telephone network and redigitizing them); CTIMIT,
LDC96S30 (passing TIMIT files through cellular telephone circuits); FFMTIMIT,
LDC96S32 (re-recording TIMIT files with a free-field microphone); and HTIMIT,
LDC98S67 (re-recording a subset of TIMIT files through different telephone
handsets).
What differentiates STC-TIMIT 1.0 from other TIMIT-derived corpora is that
the entire TIMIT database was passed through an actual telephone channel in
a single call. Thus, a single type of channel distortion and noise affects the
whole database.
The process was managed using a Dialogic switchboard for the calling and receiving
ends. No transducer (microphone) was employed; the original digital signal was
converted to analog using the switchboard's D/A converter, transmitted through
a telephone channel and converted back to digital format before recording. As
a result, the only distortion introduced is that of the telephone channel itself.
The STC-TIMIT 1.0 database is organized in the same manner as the original
TIMIT corpus: 4620 files belonging to the training partition and 1680 files
belonging to the test partition. Files were recorded using an 8 kHz sampling
frequency and mu-law encoding. Additionally, four sets of two calibration tones
were generated. These were passed through the telephone line at approximately
the start of each quarter of the database (both the source and recorded
calibration tones in each set are provided). The calibration tones are:
- 2-second 1 kHz tone
- 2-second sweep tone from 10 Hz to 4000 Hz
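For readers who want comparable reference signals, the two tones listed above can be approximated as in the following sketch. It is not the procedure used to build the corpus: the amplitude, the linear shape of the sweep, and the function names sine_tone and linear_sweep are all assumptions made for illustration. Only the durations, frequencies, and the 8 kHz sampling rate come from this documentation.

```python
import math

SAMPLE_RATE = 8000  # Hz, the sampling rate stated for STC-TIMIT 1.0


def sine_tone(freq_hz, duration_s, rate=SAMPLE_RATE, amplitude=0.5):
    """Return a constant-frequency sine tone as floats in [-1, 1].

    The amplitude of 0.5 is an assumption; the documentation gives no level.
    """
    n = int(round(duration_s * rate))
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate)
            for i in range(n)]


def linear_sweep(f_start_hz, f_end_hz, duration_s, rate=SAMPLE_RATE,
                 amplitude=0.5):
    """Return a sweep whose frequency rises linearly from f_start to f_end.

    A linear sweep is assumed; the documentation does not say whether the
    original sweep was linear or logarithmic.
    """
    n = int(round(duration_s * rate))
    k = (f_end_hz - f_start_hz) / duration_s  # sweep rate in Hz per second
    samples = []
    for i in range(n):
        t = i / rate
        # Phase is the integral of the instantaneous frequency f_start + k*t.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * k * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples


tone = sine_tone(1000.0, 2.0)            # 2-second 1 kHz tone
sweep = linear_sweep(10.0, 4000.0, 2.0)  # 2-second sweep, 10 Hz to 4000 Hz
```

Comparing a source tone with its recorded counterpart gives a rough estimate of channel gain at 1 kHz, and the sweep pair gives a rough view of the channel response up to the 4000 Hz Nyquist limit.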
Utterances in STC-TIMIT 1.0 are time-aligned with those of TIMIT with an average
precision of 0.125 ms (1 sample), by maximizing the cross-correlation between
pairs of files from each corpus. Thus, labels from TIMIT may be used for STC-TIMIT
1.0, and the effects of telephone channels may be studied on a frame-by-frame
basis.
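Because the utterances are time-aligned, a TIMIT label file can in principle be mapped onto the 8 kHz STC-TIMIT audio by rescaling its sample indices. The sketch below assumes labels in the TIMIT .phn layout (start sample, end sample, and label on each line, indexed at TIMIT's 16 kHz rate). The helper names parse_phn and rescale_segments and the example label values are hypothetical and are not taken from either corpus.

```python
TIMIT_RATE = 16000  # Hz, sampling rate of the original TIMIT recordings
STC_RATE = 8000     # Hz, sampling rate stated for STC-TIMIT 1.0


def parse_phn(text):
    """Parse 'start end label' lines into (start, end, label) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        segments.append((int(parts[0]), int(parts[1]), parts[2]))
    return segments


def rescale_segments(segments, src_rate=TIMIT_RATE, dst_rate=STC_RATE):
    """Convert sample indices from src_rate to dst_rate, rounding down."""
    return [(start * dst_rate // src_rate, end * dst_rate // src_rate, label)
            for start, end, label in segments]


# Made-up label lines, used only to show the conversion.
example = "0 3200 h#\n3200 4801 sh\n"
stc_segments = rescale_segments(parse_phn(example))
```

Rounding down keeps every boundary inside the file. With 16 kHz source labels the resulting error is at most half of one 8 kHz sample, which is smaller than the alignment precision quoted above.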
Data
Originally, a single WAV file was created by concatenating all files in the
TIMIT database. This file was downsampled to 8 kHz and compressed using mu-law
encoding.
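As a reminder of what mu-law encoding does to a signal, the sketch below implements the continuous mu-law companding curve with mu = 255. It is an illustration only: telephony codecs such as G.711 use a piecewise-linear approximation of this curve, so the code is not bit-exact with the encoder used for the corpus, and the function names are hypothetical.

```python
import math

MU = 255.0  # companding parameter used for 8-bit mu-law


def mulaw_compress(x):
    """Map a sample in [-1, 1] onto the mu-law curve (result in [-1, 1])."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Invert mulaw_compress for a compressed value in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y):
    """Map a compressed value in [-1, 1] to an integer code in 0..255."""
    return int(round((y + 1.0) / 2.0 * 255.0))


# Quiet samples receive proportionally more of the code range than loud ones.
quiet = mulaw_compress(0.01)
loud = mulaw_compress(0.5)
```

The logarithmic curve spends more quantization levels on low-amplitude samples, which is why 8-bit mu-law audio retains usable quality for speech despite its small word size.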
Two telephone lines within the same building were connected to a Dialogic(R)
card. One of the lines was used as the calling end and played the speech file,
while the other line was used as the receiving end and recorded the new signal.
The whole recording process was conducted in a single call. Incoming speech
was recorded using an 8 kHz sampling frequency and mu-law encoding.
After recording, the recorded file was pre-cut into segments according to the
length of each corresponding TIMIT database file. Each resulting segment was
then aligned to its corresponding file in TIMIT using the xcorr routine in
Matlab(R). Based on these results, each segment was sliced again from the
original recording using the newly generated alignments. Thus, each file in
STC-TIMIT 1.0 is aligned to its equivalent in TIMIT and has the same length.
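The alignment idea described above, choosing the shift that maximizes the cross-correlation and then re-slicing the recording at that shift, can be sketched as follows. This is a brute-force pure-Python illustration, not the Matlab xcorr procedure used for the corpus; the names best_lag and slice_aligned, the max_lag search window, and the toy signals are assumptions.

```python
def best_lag(reference, recorded, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A lag of d means recorded[i + d] lines up with reference[i].
    Only lags in [-max_lag, max_lag] are searched.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(recorded):
                score += ref_sample * recorded[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def slice_aligned(recording, nominal_start, length, lag):
    """Cut `length` samples from the recording, shifted by the found lag."""
    start = nominal_start + lag
    return recording[start:start + length]


# Toy signals: the "recording" is the reference delayed by three samples.
reference = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5, 0.0, 2.0]
recorded = [0.0, 0.0, 0.0] + reference + [0.0, 0.0, 0.0]

lag = best_lag(reference, recorded, max_lag=5)
aligned = slice_aligned(recorded, 0, len(reference), lag)
```

The nested loop costs on the order of the file length times the search window, so for full-length utterances an FFT-based correlation would be the usual choice. The brute-force form is kept here because it makes the definition of the lag explicit.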
Sample
For an example of the data contained in this corpus, please listen to this audio sample.
Content Copyright
Portions © 2007, 2008 Nicolás Morales, © 1993, 2008 Trustees
of the University of Pennsylvania