Introduction
CHAINS was created by researchers at University College Dublin and contains
recordings of thirty-six English speakers reading fables and selected sentences
in different speaking styles. The data was obtained in two
different sessions with a time separation of about two months. The goal of the
corpus is to provide a range of speaking styles and voice modifications for
speakers sharing the same accent. Other existing corpora, in particular CSLU
Speaker Recognition Version 1.1, TIMIT and the IViE corpus (English Intonation
in the British Isles), served as referents in the selection of material. This
design decision was made to ensure that methods designed and evaluated on the
CHAINS corpus might be directly testable on these other corpora, which were
recorded using quite different dialects and channel characteristics.
Additional documentation about the corpus and its methodology is available at
the CHAINS website.
Data
The data was collected in two recording sessions in a total of six different
speaking styles. The first recording session was carried out in a professional
recording studio in December 2005. Speakers were recorded in a sound-attenuated
booth reading text in the solo, synchronous and retell styles using a Neumann
U87 condenser microphone. Additional tracks using other microphones (near and
far-field) were also recorded and may be made available upon request to the
authors. The second recording session took place from March 2006 to May 2006
in a quiet office environment, using an AKG C420 headset condenser microphone.
Speakers read text in the rsi, whisper and fast modes. The six different speaking
styles were:
- solo reading
- synchronous reading
- spontaneous speech ("retell")
- repetitive synchronous imitation ("rsi")
- whispered reading
- fast speech reading
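For readers who want to work with the corpus programmatically, the session and style layout described above can be sketched as a small lookup table. This is only an illustrative summary of the text: the names SESSIONS, ALL_STYLES and session_for_style are hypothetical and are not part of the corpus or its documentation, and the short style labels are taken from the wording used in this description, not from the released file names.

```python
# Session metadata transcribed from the corpus description.
# The structure and field names are illustrative assumptions,
# not an official CHAINS format.
SESSIONS = {
    1: {
        "period": "December 2005",
        "environment": "sound-attenuated booth in a professional recording studio",
        "microphone": "Neumann U87 condenser",
        "styles": ("solo", "synchronous", "retell"),
    },
    2: {
        "period": "March 2006 to May 2006",
        "environment": "quiet office",
        "microphone": "AKG C420 headset condenser",
        "styles": ("rsi", "whisper", "fast"),
    },
}

# Flat list of the six speaking styles, in session order.
ALL_STYLES = [style for info in SESSIONS.values() for style in info["styles"]]


def session_for_style(style: str) -> int:
    """Return the number of the session in which a style was recorded."""
    for number, info in SESSIONS.items():
        if style in info["styles"]:
            return number
    raise ValueError(f"unknown speaking style: {style!r}")
```

A call such as session_for_style("whisper") looks up which recording environment and microphone apply to a given style, which matters when comparing conditions recorded over different channels.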
In two of the speaking conditions adopted, speakers modified their speech
in a constrained fashion towards a known target; in the synchronous condition,
the speech of the co-speaker served as a target, while in rsi, there was an
explicit known static target. The presence of a known target which speakers
aim to copy raises the bar in the discovery and design of procedures for automatic
speaker identification, as the target speech provides a potentially highly confusing
foil. The whisper and fast speech conditions are also well-defined speaking
styles which require substantial voice modification by the speaker.
Participants were recruited through University College Dublin and were
paid for their participation. No participant had any known speech or hearing
deficit. The speakers were from the United Kingdom, the eastern part of Ireland
(Dublin and adjacent counties) and the United States. Further information about
the speakers, their gender and dialect is available in the documentation released with this corpus.
Samples
For an example of the data in this corpus, please examine this sound file of the fast reading type.
Content Copyright
Portions © 2005, 2006 University College Dublin, © 2008 Trustees of the University of Pennsylvania