1996 English Broadcast News Transcripts (HUB4)
| Item Name: | 1996 English Broadcast News Transcripts (HUB4) |
| Authors: | David Graff and Jennifer Alabiso |
| LDC Catalog No.: | LDC97T22 |
| ISBN: | 1-58563-149-3 |
| Data Type: | text |
| Data Source(s): | broadcast news |
| Project(s): | EARS, GALE, Hub4 |
| Application(s): | speech recognition |
| Language(s): | English |
| Language ID(s): | ENG |
| Distribution: | Web Download |
| Member fee: | $0 for 1997, 1998 members |
| Non-member Fee: | N/A (Members Only) |
| Reduced-License Fee: | N/A |
| Extra-Copy Fee: | N/A |
| Member License: | yes |
| Online documentation: | yes |
| Licensing Instructions: | Subscription Members, Standard Members, Non-Members |
| Citation: | David Graff and Jennifer Alabiso. 1997. 1996 English Broadcast News Transcripts (HUB4). Linguistic Data Consortium, Philadelphia. |
LDC97S44 - Speech data
LDC97S66 - Dev and eval
LDC97T22 - Transcripts
Introduction
The 1996 Broadcast News Speech Corpus contains a total of
104 hours of broadcasts from ABC, CNN, and CSPAN television
networks and NPR and PRI radio networks with corresponding
transcripts. The primary motivation for this collection is
to provide training data for the DARPA "HUB4" Project on
continuous speech recognition in the broadcast domain. The
speech files are available as a 19-disc training set, with
one additional disc of development data and one additional
disc of evaluation data. The following programs are
represented in this corpus:
ABC Nightline
ABC World Nightly News
ABC World News Tonight
CNN Early Edition
CNN Early Prime News
CNN Headline News
CNN Prime Time News
CNN The World Today
CSPAN Washington Journal
NPR All Things Considered
NPR Marketplace
Data
All recordings in this publication have been transcribed.
The transcripts are manually time-aligned at the phrasal
level and annotated with news story boundaries, speaker turn
boundaries, and the gender of each speaker. The released
transcripts are in SGML format; documentation and an SGML
DTD file are included with the transcription release. The
transcripts are available via FTP.
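As a rough illustration of how time-aligned, turn-annotated SGML of this kind might be read, the following Python sketch pulls speaker turns out of a small inline sample. The tag and attribute names (section, turn, time, speaker, gender, startTime, endTime, sec) and the sample text are assumptions made up for the example; the actual markup is defined by the DTD and documentation shipped with the release and may differ.

```python
import re

# Hypothetical sample: tag names, attribute names, and text are invented
# for illustration and are NOT taken from the corpus DTD.
SAMPLE = """\
<section type="report" startTime="0.00" endTime="12.50">
<turn speaker="Anchor_1" gender="female" startTime="0.00" endTime="7.25">
<time sec="0.00">
good evening and welcome
<time sec="3.10">
here are tonight's top stories
</turn>
<turn speaker="Reporter_1" gender="male" startTime="7.25" endTime="12.50">
<time sec="7.25">
thank you
</turn>
</section>
"""

TAG = re.compile(r"<(/?)(\w+)([^>]*)>")      # any opening or closing tag
ATTR = re.compile(r'(\w+)="?([^"\s>]*)"?')   # name=value, quotes optional


def parse_turns(text):
    """Return one dict per speaker turn, tagged with its story index."""
    turns = []
    current = None   # the turn being filled, if any
    story = -1       # index of the enclosing section (news story)
    pos = 0          # end offset of the previous tag
    for m in TAG.finditer(text):
        # Text between two tags belongs to the open turn, if there is one.
        if current is not None:
            current["words"].extend(text[pos:m.start()].split())
        pos = m.end()
        closing, name, raw_attrs = m.groups()
        attrs = dict(ATTR.findall(raw_attrs))
        if name == "section" and not closing:
            story += 1
        elif name == "turn" and not closing:
            current = {
                "story": story,
                "speaker": attrs.get("speaker"),
                "gender": attrs.get("gender"),
                "start": float(attrs["startTime"]),
                "end": float(attrs["endTime"]),
                "words": [],
            }
        elif name == "turn" and closing:
            turns.append(current)
            current = None
    return turns


if __name__ == "__main__":
    for t in parse_turns(SAMPLE):
        duration = t["end"] - t["start"]
        print(t["story"], t["speaker"], t["gender"], duration, len(t["words"]))
```

A regular-expression scan is used here because SGML permits constructs such as unquoted attribute values and omitted end tags that a strict XML parser would reject; a validating SGML parser driven by the supplied DTD would be the more robust choice for the real files.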
Updates
There are no updates at this time.
Copyright
Pricing
The Reduced Licensing Fee for this corpus is US$100.