Introduction
The LLHDB corpus consists of recordings of people speaking into
ten different telephone handsets. The aim was to create a corpus
for the study of telephone transducer effects on speech which
minimized confounding factors, such as variable telephone channels
and background noise. LLHDB was created by having volunteers speak
prompted and extemporaneous speech into different transducers in a
sound-proof room and directly digitizing the output from the
transducers on a SunSparc A/D at an 8 kHz sampling rate and 16-bit resolution.
Data
Three types of speech were recorded for each
handset. First, the speaker read the "rainbow passage" [Nolan 83],
a 97-word passage sometimes used in phonetic
research. Second, the speaker read ten sentences extracted from the
TIMIT corpus. Finally, the speaker was asked to describe a photograph for
approximately 40 seconds (a different photograph was used for each
handset). LLHDB contains speech from 53 speakers (24 males and 29
females) recruited from the laboratory.
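As a rough guide to the size of the extemporaneous recordings, the sampling parameters given in the Introduction (8 kHz, 16-bit) can be turned into a byte count. The sketch below assumes single-channel, uncompressed PCM, which this description does not state explicitly; the function name is illustrative only.

```python
SAMPLE_RATE_HZ = 8000      # from the corpus description
BYTES_PER_SAMPLE = 2       # 16-bit resolution
CHANNELS = 1               # assumption: one channel per recording


def pcm_bytes(seconds):
    """Return the uncompressed PCM payload size, in bytes, for a duration."""
    samples = int(round(seconds * SAMPLE_RATE_HZ))
    return samples * BYTES_PER_SAMPLE * CHANNELS


# An approximately 40-second photograph description:
print(pcm_bytes(40))  # 640000
```

Under these assumptions, each photograph description occupies roughly 640,000 bytes of sample data, excluding any file header.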
Because the same handsets are used in both HTIMIT and LLHDB, it
is possible to compare the effects of the two different recording
methods.
Updates
Relative to the original CD-ROMs produced in 1998 by the Linguistic Data
Consortium, the extension of the audio files was changed from
".wav" to ".sph".
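The ".sph" extension is conventionally used for NIST SPHERE audio files, which begin with a plain-text header. Assuming the LLHDB files follow that convention (this description does not say so explicitly), the header can be inspected with a few lines of Python. The sketch below is illustrative, not an official tool: the function names are hypothetical, and the demonstration header is synthetic rather than copied from a corpus file.

```python
import io


def parse_sphere_header(stream):
    """Parse a NIST SPHERE header from a binary stream.

    Returns (header_size, fields), where fields maps names to values.
    """
    # The first 16 bytes hold the magic string and the header size.
    preamble = stream.read(16)
    if len(preamble) < 16 or preamble[:8] != b"NIST_1A\n":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(preamble[8:16].decode("ascii").strip())
    rest = stream.read(header_size - 16).decode("ascii", errors="replace")

    fields = {}
    for line in rest.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip malformed lines
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return header_size, fields


def make_demo_header():
    """Build a synthetic 1024-byte header (hypothetical field values)."""
    text = (
        "NIST_1A\n   1024\n"
        "channel_count -i 1\n"
        "sample_rate -i 8000\n"
        "sample_n_bytes -i 2\n"
        "sample_coding -s3 pcm\n"
        "end_head\n"
    )
    return text.ljust(1024).encode("ascii")


size, fields = parse_sphere_header(io.BytesIO(make_demo_header()))
print(size, fields["sample_rate"], fields["sample_n_bytes"])
```

For a real file, the same function could be called on a file object opened in binary mode; the sample data would then start at the byte offset given by the returned header size.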
Content Copyright
Portions © 1998 MIT Lincoln Laboratory