ATIS3 Test Data

Item Name: ATIS3 Test Data
Authors: Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, Elizabeth Shriberg, John Garofolo, Jonathan Fiscus, Denise Danielson, Enrico Bocchieri, Bruce Buntschuh, Beverly Schwartz, Sandra Peters, Robert Ingria, Robert Weide, Yuzong Chang, Eric Thayer, Lynette Hirschman, Joe Polifroni, Bruce Lund, Goh Kawai, Tom Kuhn, and Lew Norton
LDC Catalog No.: LDC95S26
NIST Catalog No.: 17-4.2 through 17-5.1
ISBN: 1-58563-043-8
Data Type: speech
Sample Rate: 16000 Hz
Sampling Format: 1-channel pcm compressed
Data Source(s): microphone speech
Project(s): ATIS
Application(s): speech recognition, spoken dialogue systems
Language(s): English
Language ID(s): ENG
Distribution: 1 DVD
Member fee: $0 for 1995 members
Non-member Fee: US$1500.00
Reduced-License Fee: US$750.00
Extra-Copy Fee: US$200.00
Non-member License: yes
Readme File: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Deborah A. Dahl, et al. 1995. ATIS3 Test Data. Linguistic Data Consortium, Philadelphia.

This set of discs contains a corpus of speech and natural language data collected under the auspices of the Advanced Research Projects Agency Spoken Language Systems (ARPA-SLS) technology development program. The corpus, which contains data in the Air Travel Information Services (ATIS) domain, was designed by the ARPA-SLS Multi-site ATIS Data COllection Working (MADCOW) group and was collected at five sites across the U.S.:

  • BBN Systems & Technologies, Cambridge, MA
  • Carnegie Mellon University, Pittsburgh, PA
  • MIT Laboratory for Computer Science, Cambridge, MA
  • National Institute of Standards and Technology, Gaithersburg, MD
  • SRI International, Menlo Park, CA
The corpus on this set of discs is part of the third phase of ATIS data collection (ATIS3) and comprises the development test material (NIST Speech Disc 17-4.2) and the evaluation test material (NIST Speech Disc 17-5.1) used in the December 1994 ARPA SLS Benchmark Tests. As in the previous ATIS corpora, the speech in this corpus was elicited by presenting subjects with various hypothetical travel planning scenarios to solve. The resulting spontaneous spoken queries were recorded as the subjects interacted with partially or completely automated ATIS systems to solve the scenarios. Note that the ATIS3 training data is available on NIST Speech Discs 17-1.1 through 17-3.1.

The recorded speech has been transcribed and annotated with categorizations and canonical reference answers. All of the utterances on these discs have been recorded using a close-talking, noise-canceling head-mounted Sennheiser microphone. For some subjects, secondary (noisier) microphone data was recorded simultaneously as well.

These discs also contain the ATIS3 46-city/52-airport relational database, a revised Principles of Interpretation, test implementation and scoring instructions, and other general documentation.

The ATIS3 corpus has been verified, collated, documented and produced on CD-ROM by the National Institute of Standards and Technology (NIST) in cooperation with MADCOW and distributed by the Linguistic Data Consortium (LDC).


Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.