Introduction
ISL Meeting Transcripts Part 1 is Linguistic Data Consortium (LDC)
catalog number LDC2004T10, ISBN 1-58563-295-3.
The ISL Meeting Corpus Part 1 is the first subset of the ISL Meeting Corpus, which comprises 112 meetings.
It contains 18 meetings recorded at the Interactive Systems Laboratories at Carnegie Mellon University
in Pittsburgh, PA, during 2000-2001. The recorded meetings were either natural meetings, in which
participants needed to meet anyway, or artificial meetings, which were designed explicitly
for data collection but still had real topics and tasks.
The meetings in this corpus range from 8 to 64 minutes in length, with an average of 34 minutes.
The corresponding audio files are available as ISL Meeting Speech Part 1.
Data
This corpus consists of 19 word-level transcripts of 18 meetings, time
synchronized to digitized audio recordings. There is one transcription
file per meeting, except for meeting m039, which is split into two parts,
m039a and m039b. The transcripts contain approximately 116,200 word
tokens and 5,850 unique word types.
The meetings were recorded with lapel microphones, and the transcriptions are based on the lapel microphone
recordings. The transcriptions focus on capturing the flow of audible events, especially the words
spoken and who spoke them. They also contain annotations for spontaneous speech
events and disfluencies.
Transcriptions were prepared with the TransEdit transcription application. This application was developed
for transcribing multi-channel recordings. It displays a
synchronized multi-track view of all channels of a meeting and provides listening
and segmentation functions for each channel individually.
There are 31 unique speakers in the corpus. Meetings involved three to nine
participants, with an average of five. A significant proportion of the speakers are non-native English
speakers, whose fluency varies.
Sponsorship
The collection and preparation of this corpus was made possible in large part through
funding from DARPA, both through the GENOA project and through ROAR.
Updates
Additional information, updates, and bug fixes may be available on the ISL Meeting Room project page.
Content Copyright
Portions © 2000-2003 Interactive Systems Laboratories, Carnegie Mellon University, Pittsburgh
© 2004 Trustees of the University of Pennsylvania