The USC Marketplace Broadcast News Corpus contains approximately 40
hours of audio data, which was recorded daily between May 1, 1996 and
September 18, 1996. Corresponding transcript files were created by
Federal Document Clearing House and enhanced by LDC to include
story boundaries, disfluency markers, and speaker and gender
identification. In keeping with HUB4-style transcription
conventions, LDC spelled out all digit strings in standard orthography.
Commercial and music segments, although part of the audio publication,
were excluded from the transcripts. The timestamps mark the beginning
of each speaker turn relative to the beginning of the recording and
are precise to one hundredth of a second. Although the transcripts were
created using HUB4 conventions, the second- and third-pass quality
checks, typically required by government-sponsored evaluation
projects, were skipped.

The USC Marketplace recordings from the summer of 1996 were received
on digital audio tapes (DATs) from the University of Southern
California. LDC excluded from this set the roughly seven hours of
broadcast that are currently included in the 1996 English Broadcast

Marketplace is produced by USC Radio in Los Angeles, a division of the
University of Southern California.

There are no updates at this time.