Multilanguage Telephone Corpus
Release Version 1.2
Center for Spoken Language Understanding

UPDATED: 3 June 2002

Distribution Directory Structure
--------------------------------

This is the directory structure for this distribution of the
Multilanguage Telephone Corpus. This corpus is distributed by the
Center for Spoken Language Understanding of the Oregon Graduate
Institute.

The following is a description of the directory structure in this
release:

readme.txt      This file.

docs/           The documentation directory. This directory contains
                further documentation for the Multilanguage corpus.

labels/         The labels directory. This directory contains the
                .ptlola phonetic labels for 619 of the speech files.

speech/         The speech directory. This directory contains the
                actual .wav files. There are many further
                subdirectories within the speech directory.

misc/           Miscellaneous directory, possibly containing software
                tools and scripts.

trans/          Phonetic labeling directory. This directory is empty
                for this corpus.

This corpus requires approximately 2.2GB of disk space.

Visually, the directory structure looks something like this:

                      multilang
                          |
  --------------------------------------------------
  |           |         |        |        |        |
readme.txt  /docs    /labels   /misc   /speech  /trans

The /speech directory contains the speech data. The files are divided
into subdirectories based on the language of the speaker, and then
into further subdirectories based on the speaker's ID number. The
directory names are the speaker's ID number divided by 10, with the
remainder dropped. So, utterances from speakers 0-9 are in the
subdirectory /0, utterances from speakers 10-19 are in the
subdirectory /1, and so on.

The /labels directory contains time-aligned phonetic transcriptions
for 643 of the speech files. As with the speech files, the label files
are divided into subdirectories based on the speaker's language and
ID number.
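
The speaker-ID rule described above (ID number divided by 10, so
speakers 0-9 map to /0 and speakers 10-19 map to /1) can be sketched
in Python as follows. This is only an illustration and is not part of
the distribution. The function names speaker_subdir and speech_dir are
hypothetical, and the language directory name passed in ("english"
in the example) is an assumption, since this file does not list the
actual language directory names or the naming of individual files.

```python
from pathlib import PurePosixPath


def speaker_subdir(speaker_id: int) -> str:
    """Return the subdirectory name for a speaker ID.

    Follows the rule stated in the readme: the ID divided by 10,
    remainder dropped (speakers 0-9 -> "0", 10-19 -> "1", ...).
    """
    if speaker_id < 0:
        raise ValueError("speaker_id must be non-negative")
    return str(speaker_id // 10)


def speech_dir(root: str, language: str, speaker_id: int) -> PurePosixPath:
    """Build the directory expected to hold a speaker's utterances.

    `language` is an assumed directory name; check the speech/
    directory for the names actually used in the release.
    """
    return PurePosixPath(root) / "speech" / language / speaker_subdir(speaker_id)


if __name__ == "__main__":
    # "english" is a placeholder language directory name.
    print(speech_dir("multilang", "english", 42))
```

Because the label files follow the same language and ID layout, the
same helper could presumably be reused with "labels" in place of
"speech", though that is likewise an assumption to verify against the
release itself.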