


Mandarin Chinese News Text

Item Name: Mandarin Chinese News Text
Authors: Zhibiao Wu
LDC Catalog No.: LDC95T13
ISBN: 1-58563-052-7
Data Type: text
Data Source(s): newswire
Project(s): EARS, GALE, TIDES, Tipster, TREC
Application(s): information retrieval, language modeling
Language(s): Mandarin Chinese
Distribution: 1 CD
Member fee: $0 for 1995, 1996, 1997 members
Non-member Fee: US $500.00
Reduced-License Fee: US $250.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Zhibiao Wu. 1995. Mandarin Chinese News Text. Linguistic Data Consortium, Philadelphia.

The Linguistic Data Consortium (LDC) announces the availability of a Mandarin Chinese text corpus. This corpus includes about 250 million GB-encoded text characters.

The Mandarin News Corpus includes text from various journalistic sources:

  • newspaper text from Renmin Ribao (People's Daily)
  • radio scripts from China Radio International
  • newswire text from the Xinhua newswire service

The corpus is formatted with labeled bracketing in the style of SGML (Standard Generalized Markup Language). The header fields provided by the sources, which give information such as topic, date and article ID, have been retained. The articles cover a variety of topics, including international and domestic news, sports and culture.
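As a rough illustration of how GB-encoded, SGML-style text like this might be read, here is a minimal Python sketch. The tag names (DOC, DOCID, DATE, TEXT), the sample article, and the helper parse_documents are all hypothetical; this page does not list the actual tags, so check the corpus documentation before relying on any of them. The sketch also assumes the files decode with Python's gb2312 codec.

```python
import re

# Hypothetical sample record. The tag names are assumptions for illustration,
# not the corpus's documented markup.
SAMPLE = (
    "<DOC>\n"
    "<DOCID> sample-0001 </DOCID>\n"
    "<DATE> 1995-01-01 </DATE>\n"
    "<TEXT>\n"
    "新华社北京电\n"
    "</TEXT>\n"
    "</DOC>\n"
).encode("gb2312")

# One match per bracketed document, then one match per labeled field inside it.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_documents(raw: bytes, encoding: str = "gb2312") -> list[dict[str, str]]:
    """Decode GB-encoded bytes and return one dict of fields per document."""
    # Undecodable bytes are replaced rather than raising, so one bad byte
    # does not abort a whole file.
    text = raw.decode(encoding, errors="replace")
    docs = []
    for match in DOC_RE.finditer(text):
        fields = {
            name.lower(): value.strip()
            for name, value in FIELD_RE.findall(match.group(1))
        }
        docs.append(fields)
    return docs


docs = parse_documents(SAMPLE)
for doc in docs:
    print(doc["docid"], doc["date"], doc["text"])
```

Regular expressions are used here instead of an XML parser because SGML-style markup is often not well-formed XML (for example, unescaped ampersands), so a tolerant line-oriented approach tends to be safer for a first pass.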




Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.