LDC93S4A - Complete ATIS0 corpus
LDC93S4B - ATIS0 Pilot
LDC93S4B-2 - ATIS0 Read
LDC93S4B-3 - ATIS0 SD-Read
The ATIS0 corpus comprises six CD-ROMs: one with spontaneous data from 36
speakers; one with read versions of the data from 20 of those
speakers, along with some adaptation material; and four with extensive
speaker-dependent material from the ATIS domain, read by ten of the
same speakers.
All ATIS speech data was recorded at a 16 kHz sample rate with 16-bit
quantization, using two different microphones: a close-talking
(Sennheiser HMD414) and a desk-top (Crown PCC-160) model.
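As a rough illustration of what the stated format implies, 16 kHz sampling at 16 bits (2 bytes) per sample gives 32,000 bytes of raw sample data per second for each microphone channel. The sketch below assumes uncompressed, single-channel 16-bit samples and ignores any file header; the function and constant names are hypothetical, not part of the corpus distribution.

```python
# Hypothetical helpers: assume uncompressed 16-bit PCM, header bytes excluded.
SAMPLE_RATE_HZ = 16_000  # samples per second, as stated for ATIS0
BYTES_PER_SAMPLE = 2     # 16-bit quantization


def bytes_per_second(channels: int = 1) -> int:
    """Raw sample-data rate in bytes per second."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels


def duration_seconds(num_data_bytes: int, channels: int = 1) -> float:
    """Duration in seconds of a raw sample payload of the given size."""
    return num_data_bytes / bytes_per_second(channels)


print(bytes_per_second())         # 32000
print(duration_seconds(160_000))  # 5.0
```

Under these assumptions, a 5-second utterance from one microphone occupies about 160,000 bytes of sample data.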
The first disc (ATIS0 Pilot) contains spontaneous utterances elicited
in a "Wizard-of-Oz" simulation, along with the relational database
containing the travel information (excluding connecting flights).
Thirty-six speakers produced a total of 912 utterances.
The second disc (ATIS0 Read) contains "read" versions of the
spontaneous utterances for 20 of the 36 speakers above, for a total of
478 productions. This is supplemented by a set of 40 "adaptation"
sentences read by each of the 20 speakers.
The third through the sixth discs (ATIS0 SD-Read) contain "read"
speech in the ATIS domain for ten of the speakers on the first disc.
They read a total of 3,171 utterances, or approximately 317 utterances
per speaker. This data was collected for the purpose of training
speaker-dependent speech recognition systems for the ATIS0 domain.
Two of these four discs contain the close-talking (Sennheiser)
microphone data and the other two contain corresponding data for the
desk-top (Crown PCC-160) microphone. Thus there are 6,342 waveform
files on the four discs.
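The SD-Read counts above follow from simple arithmetic, assuming (as the description implies) one waveform file per utterance per microphone. A small sketch with illustrative variable names:

```python
# Illustrative tally of the SD-Read discs; assumes one file per utterance per microphone.
SD_READ_UTTERANCES = 3_171
SD_READ_SPEAKERS = 10
MICROPHONES = 2  # Sennheiser HMD414 close-talking + Crown PCC-160 desk-top

per_speaker = SD_READ_UTTERANCES / SD_READ_SPEAKERS
waveform_files = SD_READ_UTTERANCES * MICROPHONES

print(f"{per_speaker:.1f} utterances per speaker")  # 317.1 utterances per speaker
print(f"{waveform_files:,} waveform files")          # 6,342 waveform files
```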
Update
This publication has been condensed from four CD-ROMs to a single DVD-ROM. The contents of each CD reside in separate directories organized identically to the original discs.
Content Copyright
Portions © 1993 Trustees of the University of Pennsylvania