


Road Rally

Item Name: Road Rally
Authors: .
LDC Catalog No.: LDC93S11
NIST Catalog No.: NIST Speech Disc 6-1.1
ISBN: 1-58563-014-4
Data Type: speech
Sample Rate: 10000 Hz
Sampling Format: 1-channel pcm
Data Source(s): microphone speech
Application(s): speaker identification
Language(s): English
Language ID(s): eng
Distribution: 1 CD
Member fee: $0 for 1993 members
Non-member Fee: US $750.00
Reduced-License Fee: US $375.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: 1993. Road Rally. Linguistic Data Consortium, Philadelphia.

The Road Rally corpus was designed for the development and testing of word-spotting systems. It was collected in a conversational domain, with a road rally planning task as the topic. The corpus consists of two sub-corpora: "Stonehenge" and "Waterloo." The Stonehenge corpus contains road rally planning conversations as well as some read speech, collected using high-quality microphones and a telephone-simulating filter. The Waterloo corpus contains read speech from the road rally planning domain, collected over actual telephone lines.

  • Stonehenge

  • The Stonehenge corpus was collected from subjects using telephone handsets that had been modified to contain a high-quality microphone. To gather conversational data, two talkers were placed in separate rooms, each given a road map, and asked to take part in a road rally planning task. Their objective was to form a path between two locations on the map that would maximize their road rally point score. They were also given a time limit for completing the task, to increase their responsiveness. Their speech was recorded on a stereo tape recorder, with each subject's speech on a separate track. The tracks were digitized, and the speech was edited to remove silences longer than about one second. This resulted in approximately three minutes of continuous speech per subject. The speech was filtered using a 300 Hz to 3300 Hz PCM FIR bandpass filter to simulate telephone bandwidth quality. The Stonehenge corpus consists of 80 speakers: 28 females and 52 males.

  • Waterloo

  • The Waterloo corpus was collected as an extension to Stonehenge to provide speech from a similar domain under different conditions. It was collected from subjects using conventional telephones and dialed-up telephone lines in the Massachusetts area. Unlike the Stonehenge speech, the Waterloo speech is naturally band-limited by the telephones and lines, but for consistency the speech was also filtered using the Stonehenge 300 Hz to 3300 Hz PCM FIR bandpass filter. The corpus consists of 56 speakers (28 males and 28 females), each reading aloud a paragraph of road rally domain speech.
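Both sub-corpora were passed through a 300 Hz to 3300 Hz FIR bandpass filter, and the catalog entry lists a 10000 Hz sample rate. The sketch below shows one way such a filter could be designed and applied. It is an illustration only: the catalog does not document the actual filter, so the windowed-sinc method, the Hamming window, the tap count (NUM_TAPS) and the function names design_bandpass, gain_at and apply_fir are all assumptions made for this example.

```python
import math

SAMPLE_RATE_HZ = 10000   # from the catalog entry
LOW_CUT_HZ = 300.0       # from the corpus description
HIGH_CUT_HZ = 3300.0     # from the corpus description
NUM_TAPS = 101           # assumption: the real filter length is not documented


def design_bandpass(num_taps=NUM_TAPS, low_hz=LOW_CUT_HZ,
                    high_hz=HIGH_CUT_HZ, fs=SAMPLE_RATE_HZ):
    """Return windowed-sinc bandpass taps (assumed design, Hamming window)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric filter")
    mid = (num_taps - 1) // 2
    f_lo = low_hz / fs    # cutoffs as a fraction of the sample rate
    f_hi = high_hz / fs
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (f_hi - f_lo)
        else:
            # Ideal bandpass = low-pass at f_hi minus low-pass at f_lo.
            ideal = (math.sin(2 * math.pi * f_hi * k)
                     - math.sin(2 * math.pi * f_lo * k)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain_at(taps, freq_hz, fs=SAMPLE_RATE_HZ):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


def apply_fir(taps, samples):
    """Filter samples by direct convolution; output has the input's length."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j < 0:
                break
            acc += h * samples[i - j]
        out.append(acc)
    return out


if __name__ == "__main__":
    taps = design_bandpass()
    for f in (50.0, 1000.0, 4500.0):
        print(f"{f:7.1f} Hz  gain {gain_at(taps, f):.4f}")
```

Under these assumptions, a tone inside the band (for example 1000 Hz) passes nearly unchanged, while content well below 300 Hz or well above 3300 Hz is strongly attenuated, which is the telephone-bandwidth effect the description refers to.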



    Contact: ldc@ldc.upenn.edu

    (c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.