This publication contains the evaluation test material used in the 1998
DARPA/NIST Continuous Speech Recognition Broadcast News HUB4 English
Benchmark Test, administered by the NIST
Spoken Natural Language Processing Group and produced by the Linguistic
Data Consortium (LDC). Its catalog number is LDC2000S86 and its ISBN is
1-58563-172-8.
The test material is contained in two SPHERE-formatted waveform files. The
file h4e_98_1.sph (set1) contains 1.5 hours of Broadcast News excerpts from
1996. The file h4e_98_2.sph (set2) contains 1.5 hours of Broadcast News
excerpts from 1998. Each file should be separately recognized per the
English Evaluation Specification.
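As a quick sanity check before recognition, the header of each SPHERE file can be inspected. The following is a minimal sketch in Python, assuming the files follow the usual NIST SPHERE layout: an ASCII header that begins with the line NIST_1A, then a line giving the header size in bytes, then "name type value" lines, ending with end_head. The function names parse_sphere_header and read_sphere_header are hypothetical helpers, not part of any distributed tool, and the sketch does not decode the audio samples.

```python
def parse_sphere_header(data: bytes):
    """Parse a NIST SPHERE header from the first bytes of a file.

    Returns (header_size, fields). Assumes the conventional layout:
    'NIST_1A', a header-size line, typed fields, then 'end_head'.
    """
    first, second, _ = data.split(b"\n", 2)
    if first.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(second.strip())

    fields = {}
    for raw in data[:header_size].split(b"\n")[2:]:
        line = raw.decode("ascii", errors="replace").strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip anything that is not "name type value"
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)      # integer field
        elif type_code == "-r":
            fields[name] = float(value)    # real-valued field
        else:
            fields[name] = value           # string field, e.g. "-s3"
    return header_size, fields


def read_sphere_header(path):
    """Read and parse only the header of a SPHERE file on disk."""
    with open(path, "rb") as f:
        preamble = f.read(16)  # magic line plus header-size line
        header_size = int(preamble.split(b"\n")[1])
        f.seek(0)
        return parse_sphere_header(f.read(header_size))
```

Calling read_sphere_header("h4e_98_1.sph") would then return the header size and whatever fields that file actually declares, which can be checked against the evaluation specification before decoding.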
In addition, the transcripts from the evaluation and
the utf.dtd used to validate the transcripts are now included. For more
complete information, see the 1998
Note: This publication does not contain the Human Reference and Baseline
Recognizer transcripts for the Information Extraction - Named Entity (IE-NE)
Spoke. This material was released separately prior to the start of the
Note: This publication does not contain the material for the HUB4
Non-English evaluation. It will be released separately.
There are no updates at this time.
Portions Copyright 1996 by PRI-Public Radio International
Portions Copyright 1996 by ABC News
Portions Copyright 1996 Cable News Network, Inc. All Rights Reserved.
Restricted Rights Legend: Information from the USC program 'Marketplace'
contained herein is the property of USC Radio and the University of Southern
California and is protected by copyright. Use, duplication or disclosure by you
is subject to the restrictions set forth in the user agreement and attached to
the computer readable media provided to you by the Linguistic Data Consortium
of the University of Pennsylvania. Copyright 1996 University of Southern
California. All Rights Reserved. Marketplace is produced by USC Radio at the
University of Southern California, and is distributed to public radio stations
nationwide by PRI-Public Radio International. Marketplace is made possible by
GE, the Corporation for Public Radio, and Public Radio Stations nationwide.
Note that the waveform and transcript data on this disc are licensed
through the Linguistic Data Consortium
(LDC) and are subject to usage restrictions. Contact the LDC for license agreement
The Reduced Licensing Fee for this corpus is US$150.