Introduction
This corpus represents the second half of a collection of
conversational telephone speech (CTS) that was created at the LDC
during 2003. It contains 5,849 audio files, each one containing a full
conversation of up to ten minutes. Additional information regarding
the speakers involved and types of telephones used can be found in
the companion text corpus of transcripts (Fisher English Training Text
Data, Part 2 -- LDC2005T19).
Data
The first half of the collection (Fisher English Training Speech Data,
Part 1) was released by the LDC in 2004 (LDC2004S13 for speech data,
LDC2004T19 for transcripts). Taken as a whole, the two parts comprise
11,699 recorded telephone conversations.
The individual audio files are in NIST SPHERE format and contain
two-channel mu-law sample data; "shorten" compression has been
applied to all files.
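As a rough illustration of the file format described above, the sketch below reads the plain-text header found at the start of a NIST SPHERE file. It assumes the usual SPHERE layout: a "NIST_1A" magic line, a line giving the header size in bytes, then "name type value" fields terminated by "end_head". The function name read_sphere_header and the sample field values are hypothetical and are not taken from the corpus. Decoding the shorten-compressed sample data itself is not attempted here.

```python
import io


def read_sphere_header(stream):
    """Parse the text header of a NIST SPHERE file from a binary stream.

    Hypothetical helper for illustration; returns a dict of header fields.
    """
    preamble = stream.read(16)  # "NIST_1A\n" plus an 8-byte header-size line
    if len(preamble) < 16 or preamble[:8] != b"NIST_1A\n":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(preamble[8:16].decode("ascii").strip())

    # The remainder of the header holds one "name type value" field per line.
    text = stream.read(header_size - 16).decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or unrecognized lines
        name, ftype, value = parts
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # "-sN": string field of length N
            fields[name] = value
    return fields


# Synthetic header for demonstration only; the values are assumptions,
# not copied from any file in this corpus.
sample = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 2\n"
    b"sample_rate -i 8000\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
).ljust(1024, b" ")

header = read_sphere_header(io.BytesIO(sample))
print(header)
```

With a real file, the same function could be called on a file object opened in binary mode, for example open(path, "rb"), where path is whatever location the audio file has on the local system.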
Data collection and transcription were sponsored by DARPA and the
U.S. Department of Defense, as part of the EARS project for research
and development in automatic speech recognition.
Samples
To see an example of this corpus, please examine this sample.
Copyright
Portions © 2005 Trustees of the University of Pennsylvania