This set of CD-ROMs contains all of the speech data provided to sites
participating in the DARPA CSR November 1995 HUB3 Multi-Microphone
tests. The data consists of digitized waveforms collected with eight
different microphones simultaneously from 40 subjects reading
15-sentence articles drawn from various North American business news
publications. The data is partitioned into development-test and
evaluation-test sets. The test sets were collected with different
subjects, prompts and microphones. No training data was collected
for this corpus since a substantial amount of NAB acoustic training
data was already available. Index files have been included that
specify the exact subset of the evaluation test recordings which were
used in the November 1995 tests. The software NIST used to process
and score the output of the test systems is also included.
The data is organized as follows:
CD26-3 Development-Test Data - Location 1, Adaptation and NAB recordings,
       Subjects: 703-705, 707-70a, 70c, 70f, 70g
CD26-4 Development-Test Data - Location 2, NAB recordings,
       Subjects: 70k, 70m, 70o, 70q-70s, 70u-70w
CD26-5 Development-Test Data - Location 2, Adaptation recordings,
       Subjects: 70k, 70m-70o, 70q-70s, 70u-70w
CD26-3 Development-Test Data-NAB recordings,
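The subject lists above use a compact range notation (for example, 707-70a). The following is a minimal sketch of how such a list could be expanded into individual subject IDs. It assumes, without confirmation from the corpus documentation, that the final character of an ID counts through 0-9 and then a-z, and that every ID inside a range actually exists. The function name expand_subjects is hypothetical and is not part of the NIST software shipped with the corpus.

```python
import string

# ASSUMPTION: the last character of a subject ID is ordered 0-9, then a-z.
ALPHABET = string.digits + string.ascii_lowercase


def expand_subjects(spec):
    """Expand a list such as '703-705, 707-70a, 70c' into single IDs.

    Hypothetical helper for illustration only.
    """
    subjects = []
    # Treat commas and whitespace alike as separators.
    for item in spec.replace(",", " ").split():
        start, _, end = item.partition("-")
        end = end or start  # a single ID is a range of length one
        if start[:-1] != end[:-1]:
            raise ValueError(f"range endpoints differ in prefix: {item}")
        lo = ALPHABET.index(start[-1])
        hi = ALPHABET.index(end[-1])
        if lo > hi:
            raise ValueError(f"range runs backwards: {item}")
        # Vary only the final character across the range.
        subjects.extend(start[:-1] + ALPHABET[i] for i in range(lo, hi + 1))
    return subjects


if __name__ == "__main__":
    location_1 = expand_subjects("703-705, 707-70a, 70c, 70f, 70g")
    print(len(location_1), location_1)
```

Under these assumptions the Location 1 list expands to ten subject IDs. If the real corpus skips IDs inside a range, the expansion would overcount, so the result should be checked against the directories actually present on the disc.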
As of September 2007, this publication has been condensed to fit on a single DVD. The data from each original CD resides in its own directory, labeled with the NIST labels listed above.
The Reduced Licensing Fee for this corpus is US$200.