


CHAracterizing INdividual Speakers (CHAINS)

Item Name: CHAracterizing INdividual Speakers (CHAINS)
Authors: Fred Cummins, Marco Grimaldi, Thomas Leonard, Juraj Simko
LDC Catalog No.: LDC2008S09
ISBN: 1-58563-497-2
Release Date: Nov 18, 2008
Data Type: Speech
Sampling Format: 16 bit linear PCM
Data Source(s): microphone speech
Application(s): speech recognition
Language(s): English
Language ID(s): ENG
Distribution: 1 DVD
Member fee: $0 for 2008 members
Non-member Fee: US$50.00
Reduced-License Fee: N/A
Extra-Copy Fee: US$50.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Fred Cummins, et al. 2008. CHAracterizing INdividual Speakers (CHAINS). Linguistic Data Consortium, Philadelphia.


Introduction

CHAINS was created by researchers at University College Dublin and contains recordings of thirty-six English speakers reading fables and selected sentences in different speaking styles. The data was obtained in two different sessions with a time separation of about two months. The goal of the corpus is to provide a range of speaking styles and voice modifications for speakers sharing the same accent. Other existing corpora, in particular CSLU Speaker Recognition Version 1.1, TIMIT and the IViE corpus (English Intonation in the British Isles), served as referents in the selection of material. This design decision was made to ensure that methods designed and evaluated on the CHAINS corpus might be directly testable on these other corpora, which were recorded using quite different dialects and channel characteristics.

Additional documentation about the corpus and its methodology is available at the CHAINS website.

Data

The data was collected in two recording sessions covering a total of six different speaking styles. The first recording session was carried out in a professional recording studio in December 2005. Speakers were recorded in a sound-attenuated booth reading text in the solo, synchronous and retell styles using a Neumann U87 condenser microphone. Additional tracks using other microphones (near- and far-field) were also recorded and may be made available upon request to the authors. The second recording session took place from March 2006 to May 2006 in a quiet office environment, using an AKG C420 headset condenser microphone. Speakers read text in the rsi, whisper and fast modes. The six different speaking styles were:

  • solo reading
  • synchronous reading
  • spontaneous speech ("retell")
  • repetitive synchronous imitation ("rsi")
  • whispered reading
  • fast speech reading

In two of the speaking conditions adopted, speakers modified their speech in a constrained fashion towards a known target: in the synchronous condition, the speech of the co-speaker served as a target, while in rsi, there was an explicit known static target. The presence of a known target which speakers aim to copy raises the bar in the discovery and design of procedures for automatic speaker identification, as the target speech provides a potentially highly confusing foil. The whisper and fast speech conditions are also well-defined speaking styles which require substantial voice modification by the speaker.
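
The session and style layout described above can be summarized as a small lookup table, which may help when planning cross-session or cross-style speaker identification experiments. The following is a minimal sketch only: the `Style` class, the short keys and the helper functions are hypothetical names chosen for illustration and do not reflect the corpus's actual file or directory naming, which is described in the documentation released with the corpus.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Style:
    key: str               # hypothetical short label, not the corpus's file naming
    description: str       # speaking style as described in the catalog entry
    session: int           # 1 = studio (December 2005), 2 = office (March to May 2006)
    target: Optional[str]  # what the speaker aims to copy, if anything


# The six speaking styles and the session in which each was recorded.
STYLES = [
    Style("solo", "solo reading", 1, None),
    Style("sync", "synchronous reading", 1, "co-speaker"),
    Style("retell", "spontaneous speech", 1, None),
    Style("rsi", "repetitive synchronous imitation", 2, "static target"),
    Style("whisper", "whispered reading", 2, None),
    Style("fast", "fast speech reading", 2, None),
]


def styles_in_session(session: int) -> List[str]:
    """Return the keys of the styles recorded in the given session."""
    return [s.key for s in STYLES if s.session == session]


def imitation_styles() -> List[str]:
    """Return the keys of the styles in which speakers copy a known target."""
    return [s.key for s in STYLES if s.target is not None]
```

For instance, `styles_in_session(1)` lists the studio styles, and `imitation_styles()` picks out the two conditions in which the target speech acts as a foil.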

Participants were recruited through University College Dublin and were paid for their participation. No participant had any known speech or hearing deficit. The speakers were from the United Kingdom, the eastern part of Ireland (Dublin and adjacent counties) and the United States. Further information about the speakers, their gender and dialect is available in the documentation released with this corpus.

Samples

For an example of the data in this corpus, please examine this sound file of the fast reading type.

Content Copyright

Portions © 2005, 2006 University College Dublin, © 2008 Trustees of the University of Pennsylvania



Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.