Introduction
This file contains documentation for CSLU: Yes/No Version 1.2, Linguistic Data
Consortium (LDC) catalog number LDC2007S05 and ISBN 1-58563-445-X.
CSLU: Yes/No Version 1.2 is a collection of answers to yes/no questions from
various telephone speech corpora created by the Center for Spoken Language Understanding,
Oregon Health and Science University (CSLU). The corpus contains approximately
20,000 examples of roughly 18,000 speakers saying "yes" or "no" in response
to various questions.
Each speech file in the corpus has a corresponding orthographic transcription
following the CSLU Labeling Conventions. In cases where a transcription did
not already exist, the utterance was run through a speech recognizer to automatically
obtain the transcription.
The data were collected from both analog and digital phone lines. The analog
data were recorded using a Gradient Technologies analog-to-digital conversion
box. These files were recorded as 16-bit, 8 kHz audio and stored in a linear format.
The digital data were recorded with the CSLU T1 digital data collection system.
These files were sampled at 8 kHz, 8-bit, and stored as mu-law (ulaw) files. All of the
data use the RIFF standard file format. This file format is 16-bit linearly
encoded.
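As an illustration of the format described above, the following minimal sketch uses Python's standard `wave` module to read the header of a RIFF/WAVE file and check it against the documented 16-bit, 8 kHz parameters. The function names (`describe_wav`, `matches_documented_format`, `make_silent_wav`) are hypothetical helpers and are not part of the corpus distribution. The demo file is synthetic and mono by assumption, since this documentation does not state a channel count.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_width_bytes, sample_rate_hz, n_frames).

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
        )


def matches_documented_format(source):
    """Check for 16-bit (2 bytes per sample) audio at 8000 Hz."""
    _, width, rate, _ = describe_wav(source)
    return width == 2 and rate == 8000


def make_silent_wav(n_frames=8000, rate=8000, width=2):
    """Build a synthetic, silent, mono WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (n_frames * width))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = make_silent_wav()
    print(describe_wav(demo))
```

To check an actual corpus file, pass its path to `matches_documented_format` instead of the in-memory buffer.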
Samples
For a sample of the audio in this corpus, please listen to this sample.
Content Copyright
Portions © 1996, 1998, 2000, 2002 Center for Spoken Language Understanding,
Oregon Health and Science University, © 2007 Trustees of the University
of Pennsylvania