ATIS2

Item Name: ATIS2
Authors: John Garofalo, Jon Fiscus, Kate Hunicke-Smith, Denise Danielson, Elizabeth Shriberg, Enrico Bocchieri, Bruce Buntschuh, Beverly Schwartz, Sandra Peters, Robert Ingria, Robert Weide, Yuzong Chang, Eric Thayer, Lynette Hirschman, Joe Polifroni, Bruce Lund, Goh Kawai, Tom Kuhn, Lew Norton, Deborah Dahl, Madeleine Bates, Michael Brown, Alexander Rudnicky, and David Pallett
LDC Catalog No.: LDC93S5
NIST Catalog No.: 12-1.1 through 12-4.1
ISBN: 1-58563-005-5
Data Type: speech
Sample Rate: 16000 Hz
Sampling Format: 1-channel pcm compressed
Data Source(s): microphone speech
Project(s): ATIS
Application(s): speech recognition, spoken dialogue systems
Language(s): English
Language ID(s): eng
Distribution: 1 DVD
Member fee: $0 for 1993 members
Non-member Fee: US $750.00
Reduced-License Fee: US $375.00
Extra-Copy Fee: US $200.00
Non-member License: yes
Readme File: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: John Garofalo, et al. 1993. ATIS2. Linguistic Data Consortium, Philadelphia.

The ATIS2 corpus, originally released on four CD-ROMs, contains approximately 15,000 utterances recorded from approximately 450 subjects at five sites: ATT, BBN, CMU, MIT's Laboratory for Computer Science and SRI. All utterances have been transcribed, and almost 10,000 of them have been annotated with categorizations and canonical reference answers. Unlike the ATIS0 corpus, much of the data in ATIS2 was collected using partially or fully automated data collection systems. The fully automated systems were, in fact, working ATIS prototypes.

For ATIS2, the ten-city relational database of ATIS0 was revised to accommodate connecting flights and fares, and some table headings were renamed.

In addition to training data, the corpus includes the February and November '92 ATIS Benchmark Tests. Each contains approximately 1,000 utterances from the pool of data collected by the five sites.

Update

This publication has been condensed from four CD-ROMs to a single DVD-ROM.

Content Copyright

Portions © 1993 Trustees of the University of Pennsylvania


Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.