Multilanguage Telephone Corpus
Release Version 1.1
Center for Spoken Language Understanding

UPDATED: 29 March 2002

This document describes the file naming conventions used for this
distribution and gives a brief description of the file formats used.

File Naming Convention
----------------------

File names follow this convention:

    KOcall-70-G.nlang.wav

The first field ("KOcall") is the prefix indicating the language in which
the utterance was made, the second field ("70") is a unique ID number for
the speaker, and the third field ("nlang") indicates the type of utterance.
Note that the "G" has no bearing on the contents of the file.

The .wav and .ptlola files are divided into sub-directories based on the
speaker's language, and further divided into sub-directories based on the
speaker's unique ID number. These directory names are the ID number divided
by ten, with the remainder discarded. So utterances from speakers 0-9 are in
the subdirectory /0, utterances from speakers 10-19 are in the subdirectory
/1, and so on.

The /labels directory has a file structure exactly parallel to that of the
/speech directory.

File Formats
------------

The .wav files in this corpus use the standard RIFF file format. The audio
is 16-bit linearly encoded.

The .ptlola files contain time-aligned phonetic transcriptions. They can be
viewed with any standard text editor.
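
The naming and directory conventions described above can be sketched in a
few lines of Python. This is only an illustration, not a tool shipped with
the corpus. The function names, the regular expression, and the "LANG"
directory name in the usage example are assumptions made for the sketch;
the actual names of the language sub-directories are not given in this
document, and the sketch also assumes that label files share the same base
name with a .ptlola extension.

```python
import re
from pathlib import PurePosixPath

# Pattern for names such as "KOcall-70-G.nlang.wav".
# Assumptions (not stated in the corpus notes): the prefix is letters only,
# the speaker ID is decimal digits, and no field contains a dot.
NAME_RE = re.compile(
    r"^(?P<prefix>[A-Za-z]+)-(?P<speaker>\d+)-(?P<tag>[^.]+)"
    r"\.(?P<utt>[^.]+)\.(?P<ext>wav|ptlola)$"
)


def parse_name(filename):
    """Split a corpus file name into its documented fields."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError("unrecognised file name: %r" % filename)
    return {
        "prefix": m.group("prefix"),        # language prefix, e.g. "KOcall"
        "speaker_id": int(m.group("speaker")),
        "utterance_type": m.group("utt"),   # e.g. "nlang"
        "extension": m.group("ext"),
        # The "G"-style tag is matched but ignored: it has no bearing
        # on the contents of the file.
    }


def speaker_subdir(speaker_id):
    """Directory name for a speaker: the ID divided by ten, remainder dropped."""
    return str(speaker_id // 10)


def relative_path(top, language_dir, filename):
    """Build top/language_dir/<id // 10>/filename.

    `top` is "speech" or "labels"; `language_dir` is a hypothetical
    placeholder, since the language directory names are not listed here.
    """
    info = parse_name(filename)
    return PurePosixPath(
        top, language_dir, speaker_subdir(info["speaker_id"]), filename
    )
```

For example, relative_path("speech", "LANG", "KOcall-70-G.nlang.wav") places
the file under the /7 sub-directory, because speaker 70 falls in the 70-79
range.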