* Note that the directory naming conventions are different for the CD-ROM distribution. However, the filenaming conventions have not changed.
Directory and Filename Structures
All MADCOW data should be organized into the prescribed directory and filename structures as follows:
/<CORPUS>/doc/<DOCFILES>

where,

    DOCFILES ::= readme.doc |    (optional general information file)
                 spkrinfo.log    (mandatory speaker information formatted
                                  according to "atis-spkr-info.log")

- OR -

/<CORPUS>/<SPEAKING-MODE>/<SPEAKER>/<SESSION>/<DATA-FILES>

where,

    CORPUS        ::= atis3
    SPEAKING-MODE ::= spon | vspn | read
    SPEAKER       ::= 001 | ... | zzz  (3-character base-36 speaker ID)
    SESSION       ::= 1 | ... | z      (1-character base-36 scenario-session ID)
    DATA-FILES    ::= <XXX><UU><S><M><P>.<TYPE>

    where,

        XXX ::= 001 | ... | zzz  (3-character base-36 speaker ID)
        UU  ::= 01 | ... | zz    (2-character base-36 within-scenario-session
                                  query ID)
        S   ::= 1 | ... | z      (1-character base-36 scenario-session ID)
        M   ::= s | r | c | v    (speaking mode:
                                  "s" - spontaneous,
                                  "r" - read version of spontaneous,
                                  "c" - read common,
                                  "v" - voice-only spontaneous)
        P   ::= s | c | x        (microphone:
                                  "s" - Sennheiser,
                                  "c" - Crown,
                                  "x" - pertains to all microphones recorded)

    and,

        TYPE ::= log |  (session log file; the special within-scenario-session
                         query ID of "00" is used in all log files)
                 wav |  (SPHERE-headered speech waveform file)
                 sro |  ("speech recognizer output" transcription)
                 lsn |  (lexical SNOR transcription derived from .sro)
                 cat |  (query categorization)
                 win |  (wizard input to NLParse)
                 sql |  (SQL query from NLParse to create min (.ref) answer)
                 sq2 |  (SQL query from NLParse to create max (.rf2) answer)
                 ref |  (min reference answer from (.sql) SQL query)
                 rf2 |  (max reference answer from (.sq2) SQL query)
                 squ |  (subject questionnaire)
                 com    (session comment file; the special within-scenario-session
                         query ID of "00" is used in all comment files)

Note: Although other ATIS file types do exist, only three of the file types listed above (.log, .wav, .sro) are required as input from sites contributing initial (unannotated) data. Also note that some of the file types above (.cat, .win, .sql, .sq2, .ref, and .rf2) are added by the annotation process. The .lsn files are added at NIST and are used as input to NL-only systems and for scoring SPREC results.
Example:

    b000e1ss.wav
    (speaker b00, query 0e, scenario-session 1, spontaneous speaking mode, Sennheiser mic., waveform file)

Note: The MADCOW ATIS3 corpus will be identified by the database ID (corpus ID) "atis3". This ID should appear in the directory structure and in the waveform file headers.
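
The filename convention can be checked mechanically. The following is a minimal Python sketch, not part of the MADCOW tooling, that splits a data filename into its fields according to the <XXX><UU><S><M><P>.<TYPE> layout. The names parse_atis_filename, AtisName, FILENAME_RE and FILE_TYPES are hypothetical and introduced only for illustration. The sketch assumes lowercase filenames and accepts "v" as a speaking-mode code because the description of M mentions it.

```python
import re
from typing import NamedTuple

# File types listed under TYPE in the specification.
FILE_TYPES = {
    "log", "wav", "sro", "lsn", "cat", "win",
    "sql", "sq2", "ref", "rf2", "squ", "com",
}

# One named group per field of <XXX><UU><S><M><P>.<TYPE>.
# Assumption: base-36 digits are written as 0-9 and lowercase a-z.
FILENAME_RE = re.compile(
    r"^(?P<speaker>[0-9a-z]{3})"   # XXX: 3-character speaker ID
    r"(?P<query>[0-9a-z]{2})"      # UU: 2-character query ID ("00" for log/com)
    r"(?P<session>[0-9a-z])"       # S: 1-character scenario-session ID
    r"(?P<mode>[srcv])"            # M: speaking mode
    r"(?P<mic>[scx])"              # P: microphone
    r"\.(?P<ftype>[0-9a-z]{3})$"   # TYPE: file extension
)


class AtisName(NamedTuple):
    """Fields of an ATIS3 data filename (hypothetical helper type)."""
    speaker: str
    query: str
    session: str
    mode: str
    mic: str
    ftype: str


def parse_atis_filename(name: str) -> AtisName:
    """Split a data filename into its fields, or raise ValueError."""
    match = FILENAME_RE.match(name)
    if match is None or match.group("ftype") not in FILE_TYPES:
        raise ValueError(f"not a valid ATIS3 data filename: {name!r}")
    return AtisName(**match.groupdict())


info = parse_atis_filename("b000e1ss.wav")
print(info.speaker, info.query, info.session, info.mode, info.mic, info.ftype)
# The base-36 IDs can be converted to integers with int(..., 36).
print(int(info.query, 36))  # 14
```

The pattern deliberately does not reject all-zero IDs, since the query ID "00" is reserved for log and comment files; stricter range checks could be added if a site needs them.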