Introduction
This page contains information on the Fisher English Corpus Part 1 Speech, LDC catalog ID LDC2004S13, ISBN 1-58563-313-5.
This corpus represents the first half of a collection of
conversational telephone speech (CTS) that was created at the LDC
during 2003. It contains 5,850 audio files, each one containing a full
conversation of up to 10 minutes. Additional information regarding
the speakers involved and types of telephones used can be found in
the companion text corpus of transcripts (Fisher English Training Text
Data, Part 1, LDC2004T19).
Data
The individual audio files are presented in NIST SPHERE format, and
contain two-channel mu-law sample data; "shorten" compression has been
applied to all files.
Data collection and transcription were sponsored by DARPA and the
U.S. Department of Defense, as part of the EARS project for research
and development in automatic speech recognition.
Samples
Please examine this sample for an example of the data in this corpus.