CSLU: Multilanguage Telephone Speech Version 1.2
| Field | Value |
|---|---|
| Item Name: | CSLU: Multilanguage Telephone Speech Version 1.2 |
| Authors: | Yeshwant Muthusamy, Ron Cole, and Beatrice Oshika |
| LDC Catalog No.: | LDC2006S35 |
| ISBN: | 1-58563-390-9 |
| Release Date: | Jun 15, 2006 |
| Data Type: | speech |
| Sample Rate: | 8000 Hz |
| Sampling Format: | PCM |
| Data Source(s): | telephone speech |
| Application(s): | language identification, machine translation |
| Language(s): | English, French, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil, Vietnamese, Western Farsi |
| Language ID(s): | cmn, deu, eng, fra, hin, jpn, kor, pes, spa, tam, vie |
| Distribution: | 1 DVD |
| Member fee: | $0 for 2006 members |
| Non-member Fee: | US $150.00 |
| Reduced-License Fee: | US $150.00 |
| Extra-Copy Fee: | US $150.00 |
| Non-member License: | yes |
| Member License: | yes |
| Online documentation: | yes |
| Licensing Instructions: | Subscription Members, Standard Members, Non-Members |
| Citation: | Yeshwant Muthusamy, Ron Cole, and Beatrice Oshika. 2006. CSLU: Multilanguage Telephone Speech Version 1.2. Linguistic Data Consortium, Philadelphia. |
Introduction
The Multilanguage Telephone Speech corpus consists of
telephone speech in 11 languages: English, Farsi,
French, German, Hindi, Japanese, Korean, Mandarin, Spanish,
Tamil, and Vietnamese. The corpus contains fixed-vocabulary
utterances (e.g., the days of the week) as well as fluent
continuous speech. The current release includes recorded
utterances from about 2,052 speakers, totaling about
38.5 hours of speech. Time-aligned phonetic transcriptions
are also included for 619 of the utterances.
Data
Each subject called the CSLU data collection system by
dialing a toll-free number. The analog telephone line was
connected to a Gradient Technologies box, which recorded
the data from incoming calls. The sampling rate was 8 kHz,
and the files were stored as 16-bit linear samples on a
UNIX file system. Each utterance was recorded as a
separate file.
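The recording parameters above fix the data rate: 8,000 samples per second at 2 bytes per sample gives 16,000 bytes per second of audio. The sketch below turns that into helpers for estimating duration and storage and for decoding raw sample bytes into integers. It assumes headerless, single-channel 16-bit samples. This page states neither the file container nor the byte order, so the `little_endian` default and the function names are hypothetical. If the distributed files carry a header, it would need to be stripped before decoding.

```python
import array
import sys
from typing import List

SAMPLE_RATE_HZ = 8000   # 8 kHz, as stated in the corpus description
BYTES_PER_SAMPLE = 2    # 16-bit linear samples


def bytes_per_second(sample_rate: int = SAMPLE_RATE_HZ,
                     width: int = BYTES_PER_SAMPLE) -> int:
    """Raw data rate of mono PCM audio."""
    return sample_rate * width


def duration_seconds(num_bytes: int) -> float:
    """Duration of headerless mono 16-bit audio (assumption: no header)."""
    return num_bytes / bytes_per_second()


def storage_bytes(hours: float) -> int:
    """Raw sample bytes needed for the given hours of audio (no headers, no compression)."""
    return int(hours * 3600 * bytes_per_second())


def decode_pcm16(data: bytes, little_endian: bool = True) -> List[int]:
    """Decode raw signed 16-bit samples; the byte order is an assumption."""
    if len(data) % BYTES_PER_SAMPLE:
        raise ValueError("byte length is not a multiple of 2")
    samples = array.array("h")  # signed short
    if samples.itemsize != BYTES_PER_SAMPLE:
        raise RuntimeError("platform 'h' type is not 16 bits")
    samples.frombytes(data)     # interpreted in the machine's native byte order
    if (sys.byteorder == "little") != little_endian:
        samples.byteswap()      # convert from the assumed file byte order
    return samples.tolist()


if __name__ == "__main__":
    print(storage_bytes(38.5))                  # raw bytes for 38.5 hours
    print(duration_seconds(16000))              # one second of audio
    print(decode_pcm16(b"\x01\x00\xff\xff"))    # two samples, little-endian
```

Under these assumptions, the 38.5 hours of speech correspond to 2,217,600,000 bytes (roughly 2.2 GB) of raw sample data.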
Samples
For an example of the data in this corpus, please listen to these audio samples in Tamil and English.
Content Copyright
Portions © 1992, 2000, 2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2006 Trustees of the University of Pennsylvania