


North American News Text Corpus

Item Name: North American News Text Corpus
Authors: David Graff
LDC Catalog No.: LDC95T21
ISBN: 1-58563-053-5
Data Type: text
Data Source(s): newswire
Project(s): EARS, GALE, Hub4, MUC, TIDES
Application(s): information retrieval, language modeling
Language(s): English
Language ID(s): eng
Distribution: 1 DVD
Member fee: $0 for 1995, 1996, 1997 members
Non-member Fee: N/A (Members Only)
Reduced-License Fee: N/A
Extra-Copy Fee: US$300.00
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: David Graff. 1995. North American News Text Corpus. Linguistic Data Consortium, Philadelphia.

The North American News Text corpus is composed of news text that has been formatted using TIPSTER-style SGML markup. The text is taken from the following sources:

Source                          Dates Covered   Approx. # Words (Millions)
--------------------------------------------------------------------------
Los Angeles Times &             05/94-08/97      52
  Washington Post
New York Times News Syndicate   07/94-12/96     173
Reuters News Service            04/94-12/96      85
  (General & Financial)
Wall Street Journal             07/94-12/96      40
--------------------------------------------------------------------------

Both the New York Times and the L. A. Times/Washington Post services include a range of other newspaper sources in their syndicated newswires. In addition to its two predominant sources, the L. A. Times/Washington Post material includes the following sources in lesser amounts:

  • Newsday
  • The Baltimore Sun
  • The Hartford Courant

The New York Times material contains the following sources in lesser amounts, though N.Y. Times articles predominate:

  • Bloomberg Business News
  • The Boston Globe
  • Los Angeles Daily News
  • Fort Worth Star-Telegram
  • Newsweek
  • Cox News Service
  • The Arizona Republic
  • Seattle Post-Intelligencer
  • San Francisco Examiner
  • Houston Chronicle
  • San Francisco Chronicle
  • Economist Newspaper Ltd.
  • Hearst Newspapers

Both newswire services also include small numbers of articles from a larger set of miscellaneous sources. The sources listed above are those that appear with some frequency on a daily basis.
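
As a rough illustration of working with TIPSTER-style SGML text such as this corpus uses, the following Python sketch pulls document identifiers and plain text out of a string. The element names (`DOC`, `DOCNO`, `TEXT`, `p`), the `iter_docs` helper, and the sample document are assumptions made for illustration and are not taken from the corpus documentation, which should be consulted for the actual tag set.

```python
import re

# Assumed TIPSTER-style element names; the real corpus may use others.
DOC_RE = re.compile(r"<DOC\b[^>]*>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>(.*?)</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def iter_docs(sgml):
    """Yield (docno, plain_text) pairs for each <DOC> element in the string."""
    for match in DOC_RE.finditer(sgml):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        text = TEXT_RE.search(body)
        raw = text.group(1) if text else ""
        # Drop any remaining tags (e.g. paragraph markers) and collapse whitespace.
        plain = " ".join(TAG_RE.sub(" ", raw).split())
        yield (docno.group(1).strip() if docno else None, plain)


# Hypothetical sample document, invented for this example.
SAMPLE = """<DOC>
<DOCNO> EXAMPLE-0001 </DOCNO>
<TEXT>
<p>
First paragraph of a made-up story.
<p>
Second paragraph.
</TEXT>
</DOC>
"""

docs = list(iter_docs(SAMPLE))
for docno, text in docs:
    print(docno, "|", text)
```

Regular expressions are used here because SGML of this kind often leaves tags such as paragraph markers unclosed, which a strict XML parser would reject. For large files, reading and processing one document at a time would be preferable to loading the whole file into memory.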

Copyright

Portions © 1994-1996 Dow Jones & Company, Inc., © 1994-1997 Los Angeles Times-Washington Post News Service, Inc., © 1994-1996 New York Times, © 1994-1996 Reuters America, Inc., © 1995-1997 Trustees of the University of Pennsylvania.

Pricing

The Reduced Licensing Fee for this corpus is US$300.


Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.