



1997 Mandarin Broadcast News Transcripts (HUB4-NE)

Item Name: 1997 Mandarin Broadcast News Transcripts (HUB4-NE)
Authors: Shudong Huang, Jing Liu, Xuling Wu, Lei Wu, Yongmin Yan, and Zhoakai Qin
LDC Catalog No.: LDC98T24
ISBN: 1-58563-126-4
Data Type: text
Data Source(s): broadcast news
Project(s): EARS, GALE, Hub4
Application(s): speech recognition
Language(s): Mandarin Chinese
Distribution: Web Download
Member fee: $0 for 1998 members
Non-member Fee: N/A (Members Only)
Reduced-License Fee: N/A
Extra-Copy Fee: N/A
Member License: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Shudong Huang, et al. 1998. 1997 Mandarin Broadcast News Transcripts (HUB4-NE). Linguistic Data Consortium, Philadelphia.

LDC98S73 - Speech data
LDC98T24 - Transcripts

Introduction

This collection consists of 30 hours of recorded broadcasts and their transcripts, drawn from the following sources:

Voice of America (VOA): United States Information Agency radio
China Central Television (CCTV), People's Republic of China
KAZN-AM: commercial radio based in Los Angeles, CA

VOA and CCTV make up the bulk of the collection, in roughly equal amounts. Only a small sample of KAZN-AM recordings is included, because much of its broadcast material was unusable (commercials, local traffic reports full of California place names, etc.).

Data

The transcripts were created by native speakers of Mandarin working at the LDC. They are GB-encoded, with SGML tags marking story boundaries, speaker-turn boundaries and phrasal pauses; these tags include time stamps that align the text with the speech data. Word segmentation (white space between words) is included. A working DTD is provided, and the markup is consistent with that of the 1997 English and Spanish HUB4 collections.
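As a rough illustration of how GB-encoded, time-stamped, white-space-segmented transcripts of this kind might be read, the sketch below decodes the raw bytes and collects each speaker turn with its start/end times and word list. The tag and attribute names (`turn`, `speaker`, `startTime`, `endTime`, `time`) are hypothetical, modeled loosely on HUB4-style markup; check them against the DTD shipped with the corpus. Decoding uses Python's `gb2312` codec, which is also an assumption (`gb18030` is a superset if some bytes fail to decode). Because the SGML may leave attribute values unquoted, the sketch uses regular expressions rather than an XML parser.

```python
import re
from dataclasses import dataclass

# Hypothetical tag/attribute names; verify against the corpus DTD.
TURN_RE = re.compile(r"<turn\b([^>]*)>(.*?)</turn>", re.IGNORECASE | re.DOTALL)
ATTR_RE = re.compile(r"(\w+)=\"?([^\s\">]+)\"?")  # handles quoted or unquoted values
TAG_RE = re.compile(r"<[^>]+>")                   # any remaining inline tag


@dataclass
class Turn:
    speaker: str
    start: float  # seconds, from the (assumed) startTime attribute
    end: float    # seconds, from the (assumed) endTime attribute
    words: list[str]


def parse_turns(raw: bytes) -> list[Turn]:
    """Decode a GB-encoded transcript and return its speaker turns."""
    text = raw.decode("gb2312")
    turns = []
    for attr_src, body in TURN_RE.findall(text):
        attrs = dict(ATTR_RE.findall(attr_src))
        # Replace inline tags (e.g. pause time stamps) with spaces so they
        # never glue neighboring words together.
        plain = TAG_RE.sub(" ", body)
        # Words are already segmented by white space, so split() suffices.
        turns.append(
            Turn(
                speaker=attrs.get("speaker", "?"),
                start=float(attrs["startTime"]),
                end=float(attrs["endTime"]),
                words=plain.split(),
            )
        )
    return turns


if __name__ == "__main__":
    sample = (
        "<turn speaker=spk1 startTime=0.00 endTime=4.20>\n"
        "<time sec=0.00>\n"
        "美国 总统 今天 发表 讲话\n"
        "<time sec=2.10>\n"
        "他 说\n"
        "</turn>\n"
    ).encode("gb2312")
    for t in parse_turns(sample):
        print(t.speaker, t.start, t.end, len(t.words))
```

Since segmentation is already present in the transcripts, no Chinese word segmenter is needed: a plain white-space split yields the word tokens, and the turn times can then be used to cut the matching audio from the speech corpus.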

Updates

There are no updates at this time.

Copyright

Portions © 1997 China Central TV, © 1997 MultiCultural Broadcasting Corporation, © 1997, 1998 Trustees of the University of Pennsylvania

Pricing

The Reduced Licensing Fee for this corpus is US$100.



Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.