File: fil_spec.doc, updated 07/20/94

* Note: The directory naming conventions differ for the CD-ROM distribution. The file naming conventions, however, are unchanged.

MADCOW File and Directory Format Specifications for ATIS3

Directory and Filename Structures

All MADCOW data should be organized into the prescribed directory and filename structures as follows:

    /<CORPUS>/doc/<DOCFILES> 

    where,

          DOCFILES ::= readme.doc | (optional general information file)
                       spkrinfo.log | (mandatory speaker information formatted
                                       according to "atis-spkr-info.log")

    - OR -

    /<CORPUS>/<SPEAKING-MODE>/<SPEAKER>/<SESSION>/<DATA-FILES> 


    where,

          CORPUS ::= atis3
          SPEAKING-MODE ::= spon | vspn | read 
          SPEAKER ::= 001 | ... | zzz (3-character base-36 speaker ID)
          SESSION ::= 1 | ... | z (1-character base-36 scenario session ID)
          DATA-FILES ::= <XXX><UU><S><M><P>.<TYPE>

          where,
             
                XXX ::= 001 | ... | zzz (3-character base-36 speaker ID)
                UU ::= 01 | ... | zz (2-char. base-36 within-scenario-session
                                      query ID)
                S ::= 1 | ... | z (1-char. base-36 scenario-session ID)
                M ::= s | r | c | v (speaking mode:
                                  "s" - spontaneous or 
                                  "r" - read version of spontaneous or 
                                  "c" - read common or
                                  "v" - voice-only spontaneous)
                P ::= s | c | x (microphone:
                                  "s" - Sennheiser, 
                                  "c" - Crown, 
                                  "x" - pertains to all microphones recorded)

         and,

                TYPE ::= log | (session log file - special within-scenario-
                                session query ID of "00" is used in all log 
                                files)
                         wav | (SPHERE-headered speech waveform file)
                         sro | ("speech recognizer output" transcription)
                         lsn | (lexical SNOR transcription derived from .sro)
                         cat | (query categorization)
                         win | (wizard input to NLParse)
                         sql | (SQL query from NLParse to create min (.ref) 
                                answer)
                         sq2 | (SQL query from NLParse to create max (.rf2) 
                                answer)
                         ref | (min reference answer from (.sql) SQL query)
                         rf2 | (max reference answer from (.sq2) SQL query)
                         squ | (subject questionnaire)
                         com | (session comment file - special within-scenario-
                                session query ID of "00" is used in all comment
                                files)
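
The DATA-FILES pattern above can be checked mechanically. The following Python sketch is illustrative only and is not part of the MADCOW specification. The names parse_atis_name and AtisName are hypothetical, and the sketch assumes lowercase base-36 characters. It does not enforce the lower bounds given above (for example, that speaker IDs start at 001), and it accepts query ID "00" for every file type, although the specification reserves "00" for log and comment files.

```python
import re
from typing import NamedTuple

# Pattern for <XXX><UU><S><M><P>.<TYPE>.
# Assumption: base-36 characters are the digits 0-9 and lowercase a-z.
_NAME_RE = re.compile(
    r"^(?P<speaker>[0-9a-z]{3})"   # XXX: speaker ID
    r"(?P<query>[0-9a-z]{2})"      # UU: within-scenario-session query ID
    r"(?P<session>[0-9a-z])"       # S: scenario-session ID
    r"(?P<mode>[srcv])"            # M: speaking mode
    r"(?P<mic>[scx])"              # P: microphone
    r"\.(?P<ext>log|wav|sro|lsn|cat|win|sql|sq2|ref|rf2|squ|com)$"
)


class AtisName(NamedTuple):
    """Fields of one data filename (hypothetical helper type)."""
    speaker: str
    query: str
    session: str
    mode: str
    mic: str
    ext: str


def parse_atis_name(filename: str) -> AtisName:
    """Split a data filename into its fields, or raise ValueError."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a valid data filename: {filename!r}")
    return AtisName(**match.groupdict())


def query_number(name: AtisName) -> int:
    """Decode the 2-character base-36 query ID to an integer."""
    return int(name.query, 36)
```

A name that does not fit the pattern, such as one with an unlisted file type, raises ValueError.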

 
Note: Although other ATIS file types do exist, only three of the file types listed above (.log, .wav, .sro) are required as input from sites contributing initial (unannotated) data. Also note that some of the file types above (.cat, .win, .sql, .sq2, .ref, and .rf2) are added by the annotation process. The .lsn files are added at NIST and are used as input to NL-only systems and for scoring SPREC results.
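
The grouping of file types described in the note above can be written down as data. The Python sketch below is illustrative only. The set names and the function missing_required_types are hypothetical, and the check is deliberately coarse: it reports only which required types do not occur at all in a list of filenames. The specification does not state here whether each type is required per query or per session, so no such rule is assumed.

```python
# File types grouped as in the note above (names are hypothetical).
REQUIRED_INITIAL = frozenset({"log", "wav", "sro"})
ADDED_BY_ANNOTATION = frozenset({"cat", "win", "sql", "sq2", "ref", "rf2"})
ADDED_AT_NIST = frozenset({"lsn"})


def missing_required_types(filenames):
    """Return the required initial file types absent from `filenames`."""
    # Collect the extension (text after the last dot) of each filename.
    present = {name.rsplit(".", 1)[1] for name in filenames if "." in name}
    return set(REQUIRED_INITIAL - present)
```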

Example:

b000e1ss.wav
(speaker b00, query 0e, scenario-session 1, spontaneous speaking mode, Sennheiser mic., waveform file)
Note: The MADCOW ATIS3 corpus will be identified by the database ID (corpus ID) "atis3". This ID should appear in the directory structure and in the waveform file headers.
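
The directory structure and the filename fields can be combined to locate a data file. The Python sketch below is illustrative only. The function data_file_path is hypothetical, and it assumes that the SPEAKER and SESSION directory names are the same IDs as the XXX and S fields of the filename. The speaking-mode directory is passed in explicitly because this specification does not give a mapping from the M field to a directory name. The sketch does not cover the CD-ROM distribution, whose directory naming differs.

```python
from pathlib import PurePosixPath

CORPUS = "atis3"                           # corpus ID from the specification
SPEAKING_MODES = ("spon", "vspn", "read")  # SPEAKING-MODE directory names


def data_file_path(speaking_mode: str, filename: str) -> str:
    """Build /<CORPUS>/<SPEAKING-MODE>/<SPEAKER>/<SESSION>/<DATA-FILES>."""
    if speaking_mode not in SPEAKING_MODES:
        raise ValueError(f"unknown speaking mode: {speaking_mode!r}")
    if len(filename) < 8:
        raise ValueError(f"filename too short: {filename!r}")
    speaker = filename[:3]   # XXX field (assumed equal to SPEAKER directory)
    session = filename[5]    # S field (assumed equal to SESSION directory)
    return str(PurePosixPath("/", CORPUS, speaking_mode, speaker, session, filename))
```

Assuming the example file above is stored under the "spon" directory, data_file_path("spon", "b000e1ss.wav") returns "/atis3/spon/b00/1/b000e1ss.wav".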