



YOHO Speaker Verification

Item Name: YOHO Speaker Verification
Authors: Joseph Campbell and Alan Higgins
LDC Catalog No.: LDC94S16
ISBN: 1-58563-042-X
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: 1-channel pcm compressed
Data Source(s): microphone speech
Application(s): speaker verification
Language(s): English
Language ID(s): eng
Distribution: 1 CD
Member fee: $0 for 1994, 1998 members
Non-member Fee: US $1000.00
Reduced-License Fee: US $500.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Joseph Campbell and Alan Higgins. 1994. YOHO Speaker Verification. Linguistic Data Consortium, Philadelphia.

The YOHO database contains a large-scale, high-quality speech corpus to support text-dependent speaker authentication research, such as is used in "secure access" technology. The data was collected in 1989 by ITT under a US Government contract but has not previously been available for public use. Note that certain changes have been made to the corpus, mainly to ensure the privacy of the speakers, and some data has been withheld by the government for future use in testing.

YOHO contains:

  • "Combination lock" phrases (e.g. 36-24-36)
  • Collected over a three-month period in a real-world office environment
  • Four enrollment sessions per subject with 24 phrases per session
  • Ten test sessions per subject with four phrases per session
  • 8 kHz sampling with 3.8 kHz analog bandwidth
  • 1.5 gigabytes of data
The number of trials is thus sufficient to permit evaluation testing at high confidence levels. In each session, a speaker was prompted with a series of phrases to be read aloud; each phrase was a sequence of three two-digit numbers (e.g. "35 - 72 - 41", pronounced "thirty-five seventy-two forty-one"). The first four sessions for a given speaker were enrollment sessions of 24 phrases and all additional sessions were verification trials of four phrases each. In all there are 552 enrollment sessions and 1,380 trial sessions, with a nominal time interval of three days between sessions.
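
The session figures above can be cross-checked with a little arithmetic, and the prompt format can be illustrated in code. The following is a minimal sketch and is not part of the corpus distribution. The speaker and phrase totals are derived from the session counts rather than stated in this entry, and the helper names `two_digit_words` and `spoken_form` are hypothetical. The sketch accepts any two-digit number from 10 to 99, which may be broader than the set of numbers actually used in the prompts.

```python
# Session structure as stated in this catalog entry.
ENROLL_SESSIONS_PER_SPEAKER = 4
PHRASES_PER_ENROLL_SESSION = 24
TEST_SESSIONS_PER_SPEAKER = 10
PHRASES_PER_TEST_SESSION = 4
TOTAL_ENROLL_SESSIONS = 552
TOTAL_TEST_SESSIONS = 1380

# Derived values (assumption: every speaker completed all sessions).
speakers_from_enroll = TOTAL_ENROLL_SESSIONS // ENROLL_SESSIONS_PER_SPEAKER
speakers_from_test = TOTAL_TEST_SESSIONS // TEST_SESSIONS_PER_SPEAKER
enroll_phrases = TOTAL_ENROLL_SESSIONS * PHRASES_PER_ENROLL_SESSION
test_phrases = TOTAL_TEST_SESSIONS * PHRASES_PER_TEST_SESSION

ONES = ["", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def two_digit_words(n: int) -> str:
    """Spell out a two-digit number (10 to 99) in English."""
    if not 10 <= n <= 99:
        raise ValueError(f"expected a two-digit number, got {n}")
    tens, ones = divmod(n, 10)
    if tens == 1:
        return TEENS[ones]
    if ones == 0:
        return TENS[tens]
    return f"{TENS[tens]}-{ONES[ones]}"


def spoken_form(phrase: str) -> str:
    """Convert a prompt such as '35 - 72 - 41' to its spoken form."""
    parts = [p.strip() for p in phrase.split("-")]
    if len(parts) != 3:
        raise ValueError("a phrase must contain exactly three numbers")
    return " ".join(two_digit_words(int(p)) for p in parts)


if __name__ == "__main__":
    print("speakers implied:", speakers_from_enroll, speakers_from_test)
    print("enrollment phrases:", enroll_phrases)
    print("verification phrases:", test_phrases)
    print(spoken_form("35 - 72 - 41"))
```

Both session totals imply the same number of speakers, which is consistent with every subject contributing four enrollment sessions and ten test sessions.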

Updates

An update is available that corrects a bug in the original release.




Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.