LDC94S14A - Complete ATC0 corpus
LDC94S14B - ATC0 Logan International
LDC94S14C - ATC0 Washington National
LDC94S14D - ATC0 Dallas Fort Worth
The Air Traffic Control Corpus (ATC0) is an eight-disc set of recorded
speech intended to support research and development in robust speech
recognition for domains similar to air traffic control (multiple
speakers, noisy channels, relatively small vocabulary, constrained
language, etc.). The audio data on these discs consists of voice
communication traffic between various controllers and pilots.
The audio files are 8 kHz, 16-bit linearly sampled data. Each file
represents continuous monitoring, without squelch or silence
elimination, of a single FAA frequency for one to two hours. Additional
files give the amplitude of the received AM carrier signal at 10 msec
intervals.
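The sampling parameters above fix the arithmetic for sizing files and for lining up audio samples with the carrier-amplitude measurements. The sketch below is illustrative only: it assumes a single (mono) channel and assumes exactly one amplitude value per 10 msec frame, counted from the start of the recording. Neither assumption is stated in this description, and the on-disc layout of the amplitude files is not modeled here.

```python
SAMPLE_RATE_HZ = 8000    # 8 kHz sampling, per the corpus description
BYTES_PER_SAMPLE = 2     # 16-bit linear samples
CARRIER_FRAME_MS = 10    # carrier amplitude reported every 10 msec
CHANNELS = 1             # assumption: one channel per recording


def samples_per_carrier_frame() -> int:
    """Number of audio samples covered by one 10 msec amplitude frame."""
    return SAMPLE_RATE_HZ * CARRIER_FRAME_MS // 1000


def audio_bytes(duration_s: float) -> int:
    """Size in bytes of raw sample data (header excluded) for a duration."""
    n_samples = int(round(duration_s * SAMPLE_RATE_HZ))
    return n_samples * BYTES_PER_SAMPLE * CHANNELS


def carrier_frame_for_sample(sample_index: int) -> int:
    """Index of the amplitude frame containing a given audio sample,
    assuming frames start at sample 0 (hypothetical alignment)."""
    return sample_index // samples_per_carrier_frame()
```

Under these assumptions, one 10 msec frame spans 80 samples, and one hour of audio comes to about 57.6 MB of raw sample data before any header or compression.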
Full transcripts, including the start and end times of each
transmission, are provided for each audio file. Each flight is
identified by its flight number.
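Because each transcript records the start and end time of every transmission, a single transmission can be cut out of a one-to-two-hour recording by converting those times to sample offsets. This minimal sketch assumes the times are in seconds from the start of the audio file and that the samples have already been decoded into a Python sequence. The helper names `time_to_sample` and `transmission_slice` are hypothetical and are not part of the corpus distribution.

```python
from typing import Sequence


def time_to_sample(t_s: float, rate_hz: int = 8000) -> int:
    """Convert a time in seconds to the nearest sample index."""
    return int(round(t_s * rate_hz))


def transmission_slice(samples: Sequence[int], start_s: float, end_s: float,
                       rate_hz: int = 8000) -> Sequence[int]:
    """Return the samples between a transmission's start and end times,
    clamped to the bounds of the recording."""
    if end_s < start_s:
        raise ValueError("end time precedes start time")
    lo = max(0, time_to_sample(start_s, rate_hz))
    hi = min(len(samples), time_to_sample(end_s, rate_hz))
    return samples[lo:hi]
```

A transmission logged from 1.0 s to 2.5 s, for example, maps to samples 8000 through 19999, which is 12,000 samples at 8 kHz.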
ATC0 consists of three subcorpora, one for each airport at which the
transmissions were collected -- Dallas Fort Worth (DFW), Logan
International (BOS) and Washington National (DCA). The complete set
contains approximately 70 hours of controller and pilot transmissions
collected via antennas and radio receivers which were located in the
vicinity of the respective airports.
Detailed information on the collection process and the equipment used
can be found in the file "atc.doc" in the "doc" directory on each disc.
The ATC0 Corpus was collected by Texas Instruments under contract to
DARPA. It was produced on CD-ROM by the National Institute of
Standards and Technology for distribution by the Linguistic Data
Consortium.
Relative to the CD-ROMs produced by NIST in 1994, the SPHERE files have
been renamed to use the .sph extension instead of the .wav extension.
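Despite the original .wav extension, these are NIST SPHERE files. A SPHERE file begins with a plain-ASCII header: the magic line "NIST_1A", a line giving the header size in bytes (commonly 1024), then "name -type value" lines (-i for integer, -r for real, -sN for an N-character string), ending at "end_head". The sketch below reads that header. It does not decode the sample data, and the specific fields present in the ATC0 headers are not given in this description.

```python
def parse_sphere_header(data: bytes):
    """Parse the ASCII header of a NIST SPHERE file.

    Returns (fields, header_size), where header_size is the byte offset
    at which the sample data begins.
    """
    if not data.startswith(b"NIST_1A"):
        raise ValueError("not a NIST SPHERE file (missing NIST_1A magic)")
    # The second line holds the total header size in bytes.
    header_size = int(data.split(b"\n", 2)[1].decode("ascii").strip())
    header_text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    # Skip the magic and size lines, then stop at the end_head marker.
    for line in header_text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        key, field_type, value = parts
        if field_type == "-i":
            fields[key] = int(value)
        elif field_type == "-r":
            fields[key] = float(value)
        else:  # "-sN": string value
            fields[key] = value
    return fields, header_size
```

Typical SPHERE fields such as sample_rate, sample_n_bytes, and sample_coding indicate how the bytes after header_size should be interpreted. If sample_coding reports a compressed encoding, the samples need a SPHERE-aware decoder before they can be read as 16-bit linear data.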
Portions © 1994 Trustees of the University of Pennsylvania