


Switchboard-2 Phase II

Item Name: Switchboard-2 Phase II
Authors: David Graff, Kevin Walker, and Alexandra Canavan
LDC Catalog No.: LDC99S79
ISBN: 1-58563-144-2
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: 2-channel ulaw
Data Source(s): telephone conversations
Project(s): EARS, GALE, SID
Application(s): speaker identification
Language(s): English
Language ID(s): eng
Distribution: 6 DVD
Member fee: $0 for 1999 members
Non-member Fee: US $7500.00
Reduced-License Fee: US $3750.00
Extra-Copy Fee: US $1200.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: David Graff, Kevin Walker, and Alexandra Canavan. 1999. Switchboard-2 Phase II. Linguistic Data Consortium, Philadelphia.
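
The metadata above lists the audio as 8000 Hz, 2-channel ulaw. As a rough illustration of what that implies for anyone reading the samples, the sketch below expands 8-bit mu-law bytes to 16-bit linear PCM using the standard G.711 expansion and splits the two conversation sides. It assumes headerless data with the two channels interleaved sample by sample; the actual files on the discs may carry a header or use a different layout, so check the corpus documentation first. The function names are hypothetical.

```python
def ulaw_byte_to_pcm16(byte: int) -> int:
    """Expand one 8-bit mu-law code to a signed 16-bit linear sample (G.711)."""
    u = ~byte & 0xFF                 # mu-law codes are stored bit-inverted
    sign = u & 0x80                  # top bit: sign
    exponent = (u >> 4) & 0x07       # next three bits: segment number
    mantissa = u & 0x0F              # low four bits: position within segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def split_channels(raw: bytes) -> tuple[list[int], list[int]]:
    """Decode interleaved 2-channel mu-law bytes into two lists of samples.

    ASSUMPTION: no file header, and bytes alternate side A, side B, side A, ...
    """
    side_a = [ulaw_byte_to_pcm16(b) for b in raw[0::2]]
    side_b = [ulaw_byte_to_pcm16(b) for b in raw[1::2]]
    return side_a, side_b


if __name__ == "__main__":
    # Four bytes = two sample frames of (side A, side B).
    a, b = split_channels(bytes([0xFF, 0x00, 0x80, 0x7F]))
    print(a, b)
```

Keeping the two sides separate matters for speaker identification work, since each channel carries a single speaker.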

Introduction

Switchboard-2 (SWB-2) Phase II consists of 4,472 five-minute telephone conversations involving 679 participants. The corpus was collected by the Linguistic Data Consortium (LDC) in support of a speaker recognition project sponsored by the U.S. Department of Defense.
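
For a sense of scale, the figures above can be turned into a nominal corpus size. The sketch below is only back-of-the-envelope arithmetic: it assumes every conversation runs exactly five minutes (real call lengths vary), uses the 8000 Hz, 2-channel, 8-bit ulaw format from the catalog metadata, and ignores any file headers. The constant and function names are illustrative.

```python
CONVERSATIONS = 4472        # from the introduction
PARTICIPANTS = 679          # from the introduction
NOMINAL_MINUTES = 5         # ASSUMPTION: every call lasts exactly five minutes
SAMPLE_RATE_HZ = 8000       # from the catalog metadata
CHANNELS = 2                # one channel per conversation side
BYTES_PER_SAMPLE = 1        # mu-law stores one byte per sample


def nominal_minutes() -> int:
    """Total conversation time in minutes under the five-minute assumption."""
    return CONVERSATIONS * NOMINAL_MINUTES


def nominal_raw_bytes() -> int:
    """Raw audio payload in bytes, excluding headers and documentation."""
    seconds = nominal_minutes() * 60
    return seconds * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE


def sides_per_participant() -> float:
    """Average number of conversation sides per participant (two sides per call)."""
    return CONVERSATIONS * 2 / PARTICIPANTS


if __name__ == "__main__":
    print(f"{nominal_minutes() / 60:.1f} hours of conversation")
    print(f"{nominal_raw_bytes() / 1e9:.2f} GB of raw audio")
    print(f"{sides_per_participant():.2f} sides per participant")
```

Under these assumptions the corpus comes to a little over 370 hours of conversation and about 21 GB of raw audio.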

Data

Speakers were solicited by the LDC to participate in this telephone speech collection effort via the Internet, newspaper advertisements, and personal contacts. The majority of participants resided in the following states:

     State    Number of Speakers
     ---------------------------
     MN       156
     WI       105
     OH        70
     IA        64
     MI        41
     IL        37

Participants in SWB-2 Phase II were recruited from the following midwestern college campuses: Iowa State University, Michigan State University, University of Michigan, University of Minnesota, University of Wisconsin at Madison, Northwestern University, and Ohio State University.
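
The state table above can be checked against the total of 679 participants to see how large a share the six listed states cover. This is a small sketch using only the numbers given in this entry; the variable and function names are illustrative.

```python
TOTAL_PARTICIPANTS = 679

# Speaker counts per state, copied from the table above.
SPEAKERS_BY_STATE = {
    "MN": 156,
    "WI": 105,
    "OH": 70,
    "IA": 64,
    "MI": 41,
    "IL": 37,
}


def listed_total() -> int:
    """Number of speakers residing in the six listed states."""
    return sum(SPEAKERS_BY_STATE.values())


def listed_share() -> float:
    """Fraction of all participants who reside in the listed states."""
    return listed_total() / TOTAL_PARTICIPANTS


if __name__ == "__main__":
    print(f"{listed_total()} of {TOTAL_PARTICIPANTS} speakers "
          f"({listed_share():.1%}) reside in the listed states")
```

The six states account for 473 speakers, roughly 70% of the participants, which is consistent with the statement that the majority resided there.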

Each recruit was asked to participate in at least ten five-minute phone calls. Ideally, each participant would receive five calls at a designated number and make five calls from phones with different Automatic Number Identification (ANI) codes. Participants were asked to discuss a specific topic (read by the automated operator) and not to provide personal information during their calls.

Each of the 679 participants placed their calls via a toll-free robot operator maintained by the LDC. Access to the robot operator was possible via a unique Personal Identification Number (PIN) issued by the recruiting staff at the LDC when the caller enrolled in the project.

Upon conclusion of the study, all calls were audited by LDC staff members, with particular attention to PIN verification (matching speaker with PIN), call duration, and call quality. Upon completion of this process, checks were issued and mailed to participants. The conversations have not been transcribed.

Updates

09/29/2011: Updated the file table to accurately reflect the files now that they are on DVDs. Also, updated the readme to indicate these changes.

Copyright

Portions © 1999 Trustees of the University of Pennsylvania



Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.