Introduction
Czech Broadcast Conversation Speech was prepared by researchers at the
University of West Bohemia, Pilsen, Czech Republic, and consists of 40 hours
of speech recorded from Czech Radio 1 in 2003. Transcripts corresponding to
the audio files in this corpus are provided in Czech
Broadcast Conversation MDE Transcripts (LDC2009T20). These corpora join
LDC's other Czech broadcast data sets: Czech
Broadcast News Speech (LDC2004S01), Czech
Broadcast News Transcripts (LDC2004T01), Voice
of America (VOA) Czech Broadcast News Audio (LDC2000S89), and Voice
of America (VOA) Czech Broadcast News Transcripts (LDC2000T53).
Czech Broadcast Conversation Speech consists of 72 single channel recordings
of Radioforum, a live talk program broadcast by Czech
Radio 1 (CRo1) every weekday evening. Its format consists of invited guests
(most often politicians) spontaneously answering topical questions posed by
one or two interviewers. The number of interviewees in a single program varies
from one to three, but typically, one interviewer and two interviewees appear
in the program. The material includes passages of interactive dialogue, but
longer stretches of monologue-like speech comprise the majority of the collected
data. Radioforum also has an interactive segment where listeners call the studio
and ask their own questions. That telephony speech was not transcribed in the
current release.
Data
Individual recordings range from 27 minutes to 36 minutes each. The recordings
were collected during the period from February 12, 2003 through June 26, 2003.
The signal is mono, sampled at 22.05 kHz with 16-bit resolution, stored in Windows
PCM waveform format. The names of the audio files refer to the broadcast date
(rfYYMMDD.wav).
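As an illustration of the naming scheme and signal format described above, the following minimal Python sketch parses a broadcast date out of a file name and checks a WAV header against the documented parameters. The function names `broadcast_date` and `matches_documented_format` are hypothetical helpers, not part of the corpus distribution, and the mapping of the two-digit year to 20YY is an assumption based on the 2003 collection period.

```python
import re
import wave
from datetime import date

# Pattern for the documented naming scheme rfYYMMDD.wav.
_NAME_RE = re.compile(r"rf(\d{2})(\d{2})(\d{2})\.wav")


def broadcast_date(filename: str) -> date:
    """Return the broadcast date encoded in a name such as rfYYMMDD.wav."""
    m = _NAME_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"not an rfYYMMDD.wav name: {filename!r}")
    yy, mm, dd = (int(g) for g in m.groups())
    # Assumption: the recordings date from 2003, so YY is read as 20YY.
    return date(2000 + yy, mm, dd)


def matches_documented_format(path: str) -> bool:
    """Check that a WAV file is mono, 22.05 kHz, 16-bit PCM."""
    with wave.open(path, "rb") as w:
        return (
            w.getnchannels() == 1        # mono
            and w.getframerate() == 22050  # 22.05 kHz
            and w.getsampwidth() == 2      # 16 bits = 2 bytes per sample
        )
```

For example, `broadcast_date("rf030212.wav")` would yield February 12, 2003, the first day of the collection period; the file name here is only illustrative.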
The table below contains details about the audio files and the transcripts:
| Number of shows | 72 |
| Number of word tokens | 292.6k |
| Number of unique words | 30.5k |
| Duration of transcribed speech | 33.0h |
| Total number of speakers | 128 |
| Male speakers | 108 |
| Female speakers | 20 |
Sponsorship
The completion of this corpus was facilitated by funding provided by the Ministry of Education of the Czech Republic under projects No. ME909 and 2C06020.
Content Copyright
Portions © 2003 Cesky rozhlas 1 Radiozurnal, © 2009 Trustees of the
University of Pennsylvania