HCRC Map Task Corpus

Item Name: HCRC Map Task Corpus
Authors: .
LDC Catalog No.: LDC93S12
ISBN: 1-58563-009-8
Data Type: speech
Sample Rate: 20000 Hz
Sampling Format: 2-channel pcm
Data Source(s): microphone conversation
Application(s): discourse analysis
Language(s): English
Language ID(s): eng
Distribution: 1 CD, 1 DVD
Member fee: $0 for 1993, 1996 members
Non-member Fee: US$500.00
Reduced-License Fee: US$350.00
Extra-Copy Fee: US$350.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: .
1993
HCRC Map Task Corpus
Linguistic Data Consortium, Philadelphia

Originally published as a set of eight CD-ROMs, the Map Task Corpus is now delivered as a two-disc set (one DVD-ROM and one CD-ROM). The contents of each disc reside in separate directories with the same structure as the original set. The Map Task Corpus contains a total of about 18 hours of spontaneous speech recorded from 128 two-person conversations involving 64 different speakers (32 female, 32 male, all adults, each taking part in four conversations). The 64 speakers were all students at the University of Glasgow, 61 of them being native Scots.

The conversations were carried out in an experimental setting in which each participant had a schematic map in front of them that was not visible to the other. Each map consists of an outline and roughly a dozen labelled features (e.g. "a white cottage," "an oak forest," "Green Bay"). Most features are common to the two maps, but not all. One map has a route drawn in; the other does not. The task is for the participant without the route to draw one on the basis of discussion with the participant who has the route. In addition to the conversations, each speaker provides a wordlist reading consisting of the major vocabulary items contained in the conversations.

The experimental design allows a number of different phonemic, syntactico-semantic and pragmatic contrasts to be explored in a controlled way. In particular, maps and feature names were designed to allow for controlled exploration of phonological reductions of various kinds in a number of different referential contexts and to provide, via varying patterns of matches and mismatches between the two maps, a range of different stimuli for referent negotiation. The conditions of the conversations were also carefully balanced: in half of them the talkers were strangers and in half they were friends; in half of them the talkers could see each other's faces and in half they could not.

The waveform data are provided in "raw" (headerless) files (16-bit samples, 20 kHz sample rate, two channels per conversation), and alternative header files are provided for use with software based on either the NIST "SPHERE" header structure or the European "SAM" header structure. Text transcriptions are provided for each conversation, along with PostScript files of the map images used in the experiments. Additional materials include full documentation of the experimental design and data collection protocol; resources for using SGML tools on the transcriptions and other text materials; and an extensive set of source code for performing basic signal processing functions on the waveform data, such as down-sampling, de-multiplexing, channel summation and D/A conversion for Sun workstations (including playback of segments selected via inspection of transcripts in Emacs).
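
As an illustration of the raw format described above, the following Python sketch de-multiplexes a headerless two-channel file and mixes it down to mono. It is not the source code distributed with the corpus. The sample width, sample rate and channel count come from this entry; everything else is an assumption: that the samples are signed, that the two channels are interleaved sample by sample, and the byte order, which this entry does not state and is therefore left as a parameter. The function names are hypothetical, and the mono mix averages the channels rather than summing them, to avoid 16-bit overflow.

```python
import array
import sys

SAMPLE_RATE_HZ = 20_000   # stated in the catalog entry
NUM_CHANNELS = 2          # stated in the catalog entry
BYTES_PER_SAMPLE = 2      # 16-bit samples, stated in the catalog entry


def demultiplex(raw: bytes, big_endian: bool = False):
    """Split raw two-channel PCM into two per-channel sample arrays.

    ASSUMPTIONS (not confirmed by the catalog entry): samples are signed,
    channels are interleaved sample by sample, and the byte order is
    whatever the caller passes in via `big_endian`.
    """
    frame_size = NUM_CHANNELS * BYTES_PER_SAMPLE
    usable = len(raw) - (len(raw) % frame_size)  # drop any partial frame
    samples = array.array("h")                   # signed 16-bit integers
    samples.frombytes(raw[:usable])
    # Swap bytes only when the file's byte order differs from the host's.
    if big_endian != (sys.byteorder == "big"):
        samples.byteswap()
    return samples[0::2], samples[1::2]


def mix_to_mono(first, second):
    """Average the two channels so the result stays within 16-bit range."""
    return array.array("h", ((a + b) // 2 for a, b in zip(first, second)))


def duration_seconds(num_bytes: int) -> float:
    """Duration implied by a raw file's size at 20 kHz, 2 channels, 16 bits."""
    return num_bytes / (SAMPLE_RATE_HZ * NUM_CHANNELS * BYTES_PER_SAMPLE)
```

With these figures one second of conversation occupies 80,000 bytes, so a file's duration can be estimated from its size alone. If the de-multiplexed output sounds like noise, the assumed byte order is the first thing to flip.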



Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.