OGI Multilanguage Corpus
| Item Name: | OGI Multilanguage Corpus |
| Authors: | Ron Cole and Yeshwant Muthusamy |
| LDC Catalog No.: | LDC94S17 |
| ISBN: | 1-58563-035-7 |
| Data Type: | speech |
| Sample Rate: | 8000 Hz |
| Sampling Format: | 1-channel pcm compressed |
| Data Source(s): | telephone speech |
| Application(s): | speech recognition |
| Language(s): | English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil, Vietnamese |
| Language ID(s): | DEU, ENG, FRA, HIN, JPN, KOR, TAM, VIE |
| Distribution: | 1 CD |
| Member fee: | $0 for 1994 members |
| Non-member Fee: | US$500.00 |
| Reduced-License Fee: | US$250.00 |
| Extra-Copy Fee: | US$150.00 |
| Non-member License: | yes |
| Online documentation: | yes |
| Licensing Instructions: | Subscription Members, Standard Members, Non-Members |
| Citation: | Ron Cole and Yeshwant Muthusamy. 1994. OGI Multilanguage Corpus. Linguistic Data Consortium, Philadelphia. |

The corpus consists of responses to prompts spoken over commercial
telephone lines by speakers of English, Farsi (Persian), French,
German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil and
Vietnamese. It contains a total of 1,927 calls, an average of about
175 calls per language.

Speech was collected using an automated system that answered the
telephone, played digitized prompts in the appropriate language to
request the speech samples, and digitized the callers' responses for a
designated period of time.

Log files are included that provide a set of automatic measurements
made on each utterance. In addition, some utterances were
automatically segmented into broad phonetic categories. The speech
data are compressed, with NIST SPHERE headers.
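
Because the speech files carry NIST SPHERE headers, basic properties such as the sample rate can be read without decoding the audio. The following is a minimal Python sketch, assuming the usual SPHERE layout: a `NIST_1A` line, a header-size line, then `name -type value` fields terminated by `end_head`. The function name `parse_sphere_header` and the sample header bytes are hypothetical illustrations, not taken from the corpus documentation, and the sketch does not decompress the speech data.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    Assumes the conventional layout: 'NIST_1A', then the header size
    in bytes, then one 'name -type value' field per line, ending with
    'end_head'.
    """
    # The first two lines ("NIST_1A\n" and e.g. "   1024\n") occupy 16 bytes.
    preamble = data[:16].decode("ascii", errors="replace")
    magic, size_field = preamble.split("\n")[:2]
    if magic.strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(size_field.strip())

    text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip lines that do not look like fields
        name, ftype, value = parts
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # string field, e.g. "-s3"
            fields[name] = value
    return fields


# Hypothetical header for illustration only (not copied from the corpus),
# padded with spaces to the declared 1024-byte header size.
sample = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 8000\n"
    b"channel_count -i 1\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
).ljust(1024)

fields = parse_sphere_header(sample)
print(fields)
```

For a real file, the same function could be applied to the first bytes read from disk, for example `parse_sphere_header(open(path, "rb").read(1024))`, where `path` is a placeholder and the 1024-byte read assumes the header size declared in the file is no larger than that.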