


ATIS0 SD Read

Item Name: ATIS0 SD Read
Authors: Charles T. Hemphill, John J. Godfrey, George R. Doddington, John Garofolo, Jonathan Fiscus, Nancy Dahlgren, William Fisher, Brett Tjaden, and David Pallett
LDC Catalog No.: LDC93S4B-3
ISBN: 1-58563-004-7
Data Type: speech
Sample Rate: 16000 Hz
Sampling Format: 1-channel PCM
Data Source(s): microphone speech
Project(s): ATIS
Application(s): speech recognition, spoken dialogue systems
Language(s): English
Language ID(s): eng
Distribution: 1 DVD
Member Fee: $0 for 1993 members
Non-member Fee: US $500.00
Reduced-License Fee: US $500.00
Extra-Copy Fee: US $500.00
Non-member License: yes
Readme File: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Charles T. Hemphill, et al. 1993. ATIS0 SD Read. Linguistic Data Consortium, Philadelphia.

LDC93S4A - Complete ATIS0 corpus
LDC93S4B - ATIS0 Pilot
LDC93S4B-2 - ATIS0 Read
LDC93S4B-3 - ATIS0 SD-Read

The ATIS0 corpus comprises six CD-ROMs: one containing spontaneous speech from 36 speakers; one containing read versions of that data from 20 of those speakers, along with some adaptation material; and four containing extensive speaker-dependent read material in the ATIS domain, recorded by ten of the same speakers.

All ATIS speech data was recorded at a 16 kHz sample rate with 16-bit quantization, using two different microphones: a close-talking model (Sennheiser HMD414) and a desktop model (Crown PCC-160).

The first disc (ATIS0 Pilot) contains spontaneous utterances elicited in a "Wizard-of-Oz" simulation, along with the relational database containing the travel information (excluding connecting flights). Thirty-six speakers produced a total of 912 utterances.

The second disc (ATIS0 Read) contains "read" versions of the spontaneous utterances from 20 of the 36 speakers above, for a total of 478 utterances. This is supplemented by a set of 40 "adaptation" sentences read by each of the 20 speakers.

The third through sixth discs (ATIS0 SD-Read) contain "read" speech in the ATIS domain from ten of the speakers on the first disc. Together they read 3,171 utterances, or approximately 317 utterances per speaker. This data was collected to train speaker-dependent speech recognition systems for the ATIS domain. Two of these four discs contain the close-talking (Sennheiser) microphone data, and the other two contain the corresponding data from the desktop (Crown PCC-160) microphone. Because each utterance is stored once per microphone, the four discs hold 6,342 waveform files (2 × 3,171).
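The figures above follow from simple arithmetic. The sketch below reproduces them, assuming one waveform file per utterance per microphone and single-channel 16-bit PCM at 16 kHz, as described above. The function names are hypothetical and chosen for illustration only; the byte rate covers raw sample data and ignores any file headers.

```python
# Illustrative arithmetic for the ATIS0 SD-Read figures (hypothetical helper names).
SAMPLE_RATE_HZ = 16_000   # 16 kHz sample rate
BYTES_PER_SAMPLE = 2      # 16-bit quantization
MICROPHONES = 2           # Sennheiser HMD414 (close-talking) and Crown PCC-160 (desktop)
SD_READ_UTTERANCES = 3_171
SD_READ_SPEAKERS = 10


def utterances_per_speaker(total: int, speakers: int) -> float:
    """Average number of utterances read by each speaker."""
    return total / speakers


def waveform_file_count(utterances: int, microphones: int) -> int:
    """Assumes each utterance is stored as one file per microphone."""
    return utterances * microphones


def pcm_bytes_per_second(rate_hz: int, bytes_per_sample: int, channels: int = 1) -> int:
    """Raw PCM data rate, excluding any file header overhead."""
    return rate_hz * bytes_per_sample * channels


if __name__ == "__main__":
    print(f"{utterances_per_speaker(SD_READ_UTTERANCES, SD_READ_SPEAKERS):.1f}")  # 317.1
    print(waveform_file_count(SD_READ_UTTERANCES, MICROPHONES))  # 6342
    print(pcm_bytes_per_second(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE))  # 32000
```

At 32,000 bytes of sample data per second per file, one minute of single-microphone audio occupies roughly 1.9 MB before headers.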

Update

This publication has been consolidated from four CD-ROMs onto a single DVD-ROM. The contents of each original CD reside in a separate directory, organized identically to the original release.

Content Copyright

Portions © 1993 Trustees of the University of Pennsylvania



Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.