


LLHDB

Item Name: LLHDB
Authors: Douglas Reynolds
LDC Catalog No.: LDC98S68
ISBN: 1-58563-136-1
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: 1-channel pcm
Data Source(s): telephone speech
Application(s): speaker identification, speech recognition
Language(s): English
Language ID(s): eng
Distribution: 2 CD
Member fee: $0 for 1998 members
Non-member Fee: US$300.00
Reduced-License Fee: US$300.00
Extra-Copy Fee: US$300.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Douglas Reynolds. 1998. LLHDB. Linguistic Data Consortium, Philadelphia.

Introduction

The LLHDB corpus consists of recordings of people speaking into ten different telephone handsets. The aim was to create a corpus for studying the effects of telephone transducers on speech while minimizing confounding factors such as variable telephone channels and background noise. LLHDB was created by having volunteers speak prompted and extemporaneous speech into different transducers in a sound-proof room, and by digitizing the transducer output directly on a SunSparc A/D at an 8 kHz sampling rate and 16-bit resolution.
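
The sampling parameters above fix the raw data rate of the recordings. The following is a minimal sketch of that arithmetic, assuming uncompressed single-channel PCM and ignoring any file header; the function name `pcm_bytes` is hypothetical and is not part of the corpus documentation.

```python
# Recording parameters as stated in the catalog entry.
SAMPLE_RATE_HZ = 8000   # 8 kHz sampling rate
BITS_PER_SAMPLE = 16    # 16-bit resolution
CHANNELS = 1            # 1-channel PCM


def pcm_bytes(duration_s: float) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration.

    Assumption: no header and no compression are counted here.
    """
    bytes_per_sample = BITS_PER_SAMPLE // 8
    n_samples = int(round(duration_s * SAMPLE_RATE_HZ))
    return n_samples * bytes_per_sample * CHANNELS


# One second of audio, and a 40-second stretch of speech.
one_second = pcm_bytes(1)
forty_seconds = pcm_bytes(40)
```

Under these assumptions one second of speech occupies 16,000 bytes. Actual file sizes on the discs may differ if the files carry headers or are stored compressed.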

Data

Three types of speech were recorded for each handset. First, the speaker read the "rainbow passage" [Nolan 83], a 97-word passage sometimes used in phonetic research. Second, the speaker read ten sentences extracted from the TIMIT corpus. Finally, the speaker was asked to describe a photograph for approximately 40 seconds (a different photograph was used for each handset). LLHDB contains speech from 53 speakers (24 males and 29 females) recruited from the laboratory.

Because the same handsets are used in both HTIMIT and LLHDB, it is possible to compare the effects of the two different recording methods.

Updates

Relative to the original CD-ROMs produced in 1998 by the Linguistic Data Consortium, the extension of the audio files was changed from ".wav" to ".sph".
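
The ".sph" extension is conventionally used for NIST SPHERE files, which begin with a plain-text header. The sketch below shows how such a header could be read, assuming the LLHDB files follow that layout (this entry does not state it explicitly). The function name `parse_sphere_header` is hypothetical, and the sample header is synthetic rather than copied from the corpus.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the text header at the start of a NIST SPHERE file.

    Layout assumed: line 1 is "NIST_1A", line 2 is the header size in
    bytes, then "name type value" lines, terminated by "end_head".
    """
    first_lines = data.split(b"\n", 2)
    if len(first_lines) < 3 or first_lines[0] != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(first_lines[1].decode("ascii").strip())
    text = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(" ", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        name, type_code, value = parts
        if type_code == "-i":        # integer field
            fields[name] = int(value)
        elif type_code == "-r":      # real-valued field
            fields[name] = float(value)
        else:                        # "-sN": string of length N
            fields[name] = value
    return fields


# Synthetic header for illustration only (not taken from LLHDB).
example = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 8000\n"
    b"channel_count -i 1\n"
    b"sample_n_bytes -i 2\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
).ljust(1024, b" ")

fields = parse_sphere_header(example)
```

For a real file, the same function could be applied to the first bytes read from disk, for example `parse_sphere_header(open(path, "rb").read(1024))`, where `path` is a placeholder for a file from the corpus and a header of at most 1024 bytes is assumed.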

Content Copyright

Portions © 1998 MIT Lincoln Laboratory



Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.