|

|
|
Road Rally
| |
| Item Name: | Road Rally |
| Authors: | . |
| LDC Catalog No.: | LDC93S11 |
| NIST Catalog No.: | NIST Speech Disc 6-1.1 |
| ISBN: | 1-58563-014-4 |
| Data Type: | speech |
| Sample Rate: | 10000 Hz |
| Sampling Format: | 1-channel pcm |
| Data Source(s): | microphone speech |
| Application(s): | speaker identification |
| Language(s): | English |
| Language ID(s): | eng |
| Distribution: | 1 CD |
| Member fee: | $0 for 1993 members |
| Non-member Fee: | US $750.00 |
| Reduced-License Fee: | US $375.00 |
| Extra-Copy Fee: | US $150.00 |
| Non-member License: | yes |
| Online documentation: | yes |
| Licensing Instructions: | Subscription Members, Standard Members, Non-Members |
| Citation: | Road Rally. 1993. Linguistic Data Consortium, Philadelphia. |
|
| The Road Rally corpus was designed for the development and testing
of word-spotting systems and was collected in a conversational domain
using a road rally planning task as the topic. The corpus consists of
two sub-corpora, "Stonehenge" and "Waterloo." The Stonehenge corpus
contains road rally planning conversations as well as some read
speech, collected using high-quality microphones and a
telephone-simulating filter. The Waterloo corpus contains read speech
from the road rally planning domain, collected over actual telephone
lines.
Stonehenge
The Stonehenge corpus was collected from subjects using telephone
handsets that had been modified to contain a high-quality microphone. To
gather conversational data, two talkers were located in separate rooms,
given a road map and asked to participate in a road rally planning
task. Their objective was to form a path between two locations on the
map which would maximize their road rally point score. They were also
given a time limit in which to complete the task to increase their
responsiveness. Their speech was recorded on a stereo tape recorder
with each subject's speech on a separate track. The tracks were
digitized and the speech was edited to remove silences longer than
about one second. This resulted in approximately three minutes of
continuous speech per subject. The speech was filtered using a
300 Hz to 3300 Hz PCM FIR bandpass filter to simulate telephone
bandwidth quality. The Stonehenge corpus consists of 80 speakers: 28
females and 52 males.
Waterloo
The Waterloo corpus was collected as an extension to Stonehenge to
provide similar domain speech under different conditions. The corpus
was collected from subjects using conventional telephones and
dial-up telephone lines in the Massachusetts area. Unlike the
Stonehenge speech, the Waterloo speech is naturally band-limited by
the telephones and lines, but for consistency it was also filtered
using the Stonehenge 300 Hz to 3300 Hz PCM FIR bandpass filter. The
corpus consists of 56 speakers (28 males and 28 females), each reading
aloud a paragraph of road rally domain speech. |
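Both sub-corpora were passed through a 300 Hz to 3300 Hz FIR bandpass
filter at the corpus sample rate of 10000 Hz. The sketch below shows one
way such a filter could be built. It is not the filter used to prepare
the corpus: the documentation does not give the filter length or design
method, so the 101-tap length, the Hamming-windowed sinc design, and all
function names here are assumptions made for illustration.

```python
import math

SAMPLE_RATE_HZ = 10_000   # from the catalog entry
LOW_CUT_HZ = 300.0        # from the corpus description
HIGH_CUT_HZ = 3300.0      # from the corpus description
NUM_TAPS = 101            # assumption: the original filter length is not documented


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps):
    """Hamming-windowed sinc low-pass FIR (the window choice is an assumption)."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def bandpass_taps(low_hz, high_hz, sample_rate_hz, num_taps=NUM_TAPS):
    """Band-pass built as the difference of two low-pass filters."""
    high = lowpass_taps(high_hz, sample_rate_hz, num_taps)
    low = lowpass_taps(low_hz, sample_rate_hz, num_taps)
    return [h - l for h, l in zip(high, low)]


def fir_filter(samples, taps):
    """Direct-form FIR convolution; the output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if i - k >= 0:
                acc += tap * samples[i - k]
        out.append(acc)
    return out


def gain_at(freq_hz, taps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)


taps = bandpass_taps(LOW_CUT_HZ, HIGH_CUT_HZ, SAMPLE_RATE_HZ)
```

Calling `gain_at` at a few frequencies (for example 0 Hz, 1500 Hz, and
4500 Hz) is a quick way to check that the passband sits inside the
telephone band, and `fir_filter(samples, taps)` applies the filter to a
list of PCM sample values.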
|
|