1996 English Broadcast News Dev and Eval (HUB4)
| Item Name: | 1996 English Broadcast News Dev and Eval (HUB4) |
| Authors: | David Graff, Jennifer Alabiso, Jon Fiscus, John Garofolo, William Fisher, and David Pallett |
| LDC Catalog No.: | LDC97S66 |
| ISBN: | 1-58563-108-6 |
| Data Type: | speech |
| Sample Rate: | 16000 Hz |
| Sampling Format: | 1-channel pcm |
| Data Source(s): | broadcast news |
| Project(s): | EARS, GALE, Hub4 |
| Application(s): | speech recognition |
| Language(s): | English |
| Language ID(s): | eng |
| Distribution: | 1 DVD |
| Member Fee: | $0 for 1997, 1998 members |
| Non-member Fee: | N/A (Members Only) |
| Reduced-License Fee: | N/A |
| Extra-Copy Fee: | US $200.00 |
| Member License: | yes |
| Online documentation: | yes |
| Licensing Instructions: | Subscription Members, Standard Members, Non-Members |
| Citation: | David Graff, et al. 1997. 1996 English Broadcast News Dev and Eval (HUB4). Linguistic Data Consortium, Philadelphia. |
LDC97S44 - Speech data
LDC97S66 - Dev and eval
LDC97T22 - Transcripts
Introduction
The 1996 Broadcast News Speech Corpus contains a total of
104 hours of broadcasts from ABC, CNN and CSPAN television
networks and NPR and PRI radio networks with corresponding
transcripts. The primary motivation for this collection is
to provide training data for the DARPA "HUB4" Project on
continuous speech recognition in the broadcast domain.
Data
The speech files are available as a 19-disc training data set,
with one additional disc of development data and one
additional disc of evaluation data. The following programs
are represented in this corpus:
ABC Nightline
ABC World Nightly News
ABC World News Tonight
CNN Early Edition
CNN Early Prime News
CNN Headline News
CNN Prime Time News
CNN The World Today
CSPAN Washington Journal
NPR All Things Considered
NPR Marketplace
All recordings in this publication have been transcribed. The
transcripts are manually time-aligned to the phrasal level
and annotated to identify boundaries between news stories,
speaker turn boundaries, and the gender of the speakers. The
released transcripts are in SGML format and are accompanied
by documentation and an SGML DTD file, both included with
the transcription release. The transcripts are available
via FTP.
Updates
There are no updates at this time.
Copyright
Pricing
The Reduced Licensing Fee for this corpus is US$200.