Introduction
This file contains documentation for Voice of America (VOA) Czech Broadcast News Audio, Linguistic Data Consortium (LDC) catalog number LDC2000S89, ISBN 1-58563-179-5. The documentation for the companion corpus, Voice of America (VOA) Czech Broadcast News Transcripts (LDC2000T53, ISBN 1-58563-180-9), is included below for reference.
Data
Between February 9 and May 28, 1999, LDC collected approximately 30 hours of Czech broadcast audio from the Voice of America news service. The 62 data files in this corpus contain the audio of these daily 30-minute news programs.
Because of technical limitations in the LDC hardware used to receive the VOA broadcasts via satellite downlink, a number of files contain brief interruptions in the audio signal. These interruptions typically produced regions of complete silence lasting less than two seconds, scattered sparsely throughout the affected file. The transcripts include additional markup that isolates the regions where these interruptions occurred.
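The interruptions described above could also be located directly in the audio as a cross-check against the transcript markup. The following is a minimal sketch, assuming that "complete silence" means runs of samples whose value is exactly zero (an assumption; this documentation does not say how the silence is encoded) and that the samples have already been decoded to integers. The function `find_zero_runs` and its parameters are hypothetical. The two-second ceiling comes from the description above; the 0.1-second floor is an arbitrary choice so that isolated zero samples at ordinary zero crossings are not reported.

```python
from itertools import groupby


def find_zero_runs(samples, sample_rate=16000, min_seconds=0.1, max_seconds=2.0):
    """Return (start_sec, end_sec) pairs for runs of exactly-zero samples.

    Hypothetical helper: assumes signal dropouts appear as digital zeros.
    Only runs whose length lies within [min_seconds, max_seconds] are kept.
    """
    min_len = round(min_seconds * sample_rate)
    max_len = round(max_seconds * sample_rate)
    runs = []
    pos = 0  # index of the first sample in the current group
    # groupby splits the signal into consecutive zero / non-zero stretches.
    for is_zero, group in groupby(samples, key=lambda s: s == 0):
        length = sum(1 for _ in group)
        if is_zero and min_len <= length <= max_len:
            runs.append((pos / sample_rate, (pos + length) / sample_rate))
        pos += length
    return runs
```

Zero runs longer than `max_seconds` are skipped on the assumption that they are something other than the short dropouts described here (for example, deliberate pauses or a dead feed); adjust the bounds if that assumption does not hold for a given file.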
The 62 audio files in this corpus are single-channel, 16 kHz, 16-bit linear PCM files in NIST SPHERE format.
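The stated format can be checked by reading the SPHERE header directly. The sketch below assumes the common NIST SPHERE layout: a `NIST_1A` magic line, a line giving the header size in bytes, field lines of the form `name -type value` (for example `sample_rate -i 16000`), and a closing `end_head` line. The field names used (`sample_coding`, `sample_n_bytes`, `sample_byte_format`) follow common SPHERE usage, and the function names are hypothetical. Some SPHERE files are shorten-compressed; this sketch rejects those rather than decoding them, so they would need to be decompressed first (for example with LDC's `sph2pipe` tool).

```python
import array
import sys


def parse_sphere_header(data: bytes) -> tuple[dict, int]:
    """Parse a NIST SPHERE header; return (fields, header_size_in_bytes)."""
    lines = data.split(b"\n", 2)
    if lines[0].strip() != b"NIST_1A":
        raise ValueError("missing NIST_1A magic line: not a SPHERE file")
    # The second line gives the total header length in bytes (commonly 1024).
    header_size = int(lines[1].decode("ascii").strip())
    fields = {}
    for raw in data[:header_size].split(b"\n")[2:]:
        line = raw.strip()
        if line == b"end_head":
            break
        if not line:
            continue
        # Each field line is "name -type value", e.g. "sample_rate -i 16000".
        name, ftype, value = line.decode("ascii").split(None, 2)
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields, header_size


def read_pcm16(data: bytes) -> tuple[dict, array.array]:
    """Return (header fields, 16-bit samples) for an uncompressed PCM SPHERE file."""
    fields, header_size = parse_sphere_header(data)
    if fields.get("sample_coding", "pcm") != "pcm" or fields.get("sample_n_bytes") != 2:
        raise ValueError("this sketch only handles uncompressed 16-bit PCM")
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(data[header_size:])
    # "01" means little-endian, "10" big-endian; swap if it differs from this machine.
    file_is_little = fields.get("sample_byte_format", "01") == "01"
    if file_is_little != (sys.byteorder == "little"):
        samples.byteswap()
    return fields, samples
```

For a single-channel 16 kHz, 16-bit file, `sample_count / sample_rate` gives the duration in seconds. A 30-minute broadcast works out to 30 × 60 × 16,000 = 28,800,000 samples, or 57,600,000 bytes of sample data.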
Samples
For an example of the data in this corpus, please review this audio sample.
Updates
There are no updates at this time.
Copyright
Portions © 2000 Trustees of the University of Pennsylvania