Introduction
NIST Meeting Pilot Corpus Speech was produced by the Linguistic Data
Consortium (LDC), with catalog number LDC2004S09 and ISBN 1-58563-302-x.
The audio data included in this corpus was collected in the NIST Meeting Data Collection
Laboratory for the NIST Automatic Meeting Recognition Project. The corresponding transcripts are available as the NIST Meeting Pilot Corpus Transcripts and Metadata, while the video files will be published later as NIST Meeting Pilot Corpus Video.
For more information on the data collection
conditions, meeting scenarios, transcripts, speaker information,
recording logs, errata, and other ancillary data, please consult
the NIST project website for this corpus.
Data
The data in this corpus consists of 369 SPHERE audio
files generated from 19 meetings (comprising about 15 hours of meeting room data and amounting to about 32 GB),
recorded between November 2001 and December 2003.
Each meeting was recorded using two wireless "personal" mics attached
to each meeting participant (a close-talking noise-cancelling boom mic
and an omni-directional lapel mic). Each meeting was also recorded
using three omni-directional table mics and a four-channel directional table
mic covering 360 degrees (each channel is recorded in a separate
file). Each individual channel was converted from its 48 kHz, 24-bit linear
PCM source format to a 16 kHz, 16-bit linear PCM
SPHERE-formatted file.
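As a rough illustration of the output format described above, the Python sketch below reads the plain-text header of a SPHERE file and checks it against the stated 16 kHz, 16-bit linear PCM format. It assumes the usual NIST_1A header layout (a magic line, a header-size line, then name/type/value fields ending in end_head) and the common field names sample_rate, sample_n_bytes, channel_count and sample_coding. The function names are hypothetical, the header in the demonstration is synthetic rather than taken from the corpus, and real files may carry additional fields or compressed sample data.

```python
import io


def read_sphere_header(stream):
    """Parse a SPHERE header from a binary stream; return (header_size, fields)."""
    # The first 16 bytes hold the magic string and the total header size.
    preamble = stream.read(16)
    if len(preamble) < 16 or not preamble.startswith(b"NIST_1A\n"):
        raise ValueError("not a NIST_1A SPHERE header")
    header_size = int(preamble[8:].decode("ascii").strip())

    body = stream.read(header_size - 16).decode("ascii", errors="replace")
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN" marks a string value of N bytes
            fields[name] = value
    return header_size, fields


def matches_described_format(fields):
    """Check for 16 kHz, 2-byte PCM samples (assumed field names)."""
    # Assumption: a missing sample_coding field is treated as plain PCM.
    coding = fields.get("sample_coding", "pcm")
    return (
        fields.get("sample_rate") == 16000
        and fields.get("sample_n_bytes") == 2
        and coding.startswith("pcm")
    )


def uncompressed_bytes_per_hour(fields):
    """Size of one hour of raw sample data, ignoring the header."""
    channels = fields.get("channel_count", 1)
    return fields["sample_rate"] * fields["sample_n_bytes"] * channels * 3600


def make_demo_header():
    """Build a synthetic 1024-byte header for demonstration only."""
    text = (
        "NIST_1A\n   1024\n"
        "channel_count -i 1\n"
        "sample_rate -i 16000\n"
        "sample_n_bytes -i 2\n"
        "sample_coding -s3 pcm\n"
        "end_head\n"
    )
    return text.encode("ascii").ljust(1024, b" ")


size, fields = read_sphere_header(io.BytesIO(make_demo_header()))
print(size, fields, matches_described_format(fields))
```

To inspect a real file, the same function could be called on a file opened in binary mode, for example open(path, "rb"), where path is whichever SPHERE file is of interest. At this sample rate and width, one hour of a single uncompressed channel comes to 16000 x 2 x 3600 = 115,200,000 bytes of sample data.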
Updates
There are no updates available at this time.
Content Copyright
Portions © 2004 Trustees of the University of Pennsylvania