Introduction
This release of North American News Text provides a supplement to the
LDC's earlier publication of similar materials (LDC95T21: North
American News Text Corpus). The same TIPSTER-style SGML markup is used
in formatting the data. The data sources are as follows:
Source                                  Dates Covered   Approx. # Words (Millions)
----------------------------------------------------------------------------------
Los Angeles Times & Washington Post     09/97-04/98      11
New York Times News Syndicate           01/97-04/98     116
Associated Press World Stream English   11/94-04/98     143
----------------------------------------------------------------------------------
The previous North American News release included prior materials from
both the LA Times/Washington Post and the New York Times; this
supplement provides the continuation of those sources.
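Because the data uses TIPSTER-style SGML markup, a short script can pull
story text out of a file and produce rough word counts like those in the
table above. The following is only a minimal sketch: the tag names
(DOC, TEXT) and the helper names iter_story_texts and count_words are
assumptions made for illustration, not a description of the exact tag
set in this release, so check them against the actual files before use.

```python
import re

# Assumed TIPSTER-style layout: each story is wrapped in <DOC>...</DOC>
# and its body in <TEXT>...</TEXT>. These tag names are assumptions.
DOC_RE = re.compile(r"<DOC\b[^>]*>(.*?)</DOC>", re.DOTALL | re.IGNORECASE)
TEXT_RE = re.compile(r"<TEXT\b[^>]*>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def iter_story_texts(sgml):
    """Yield the whitespace-normalized body text of each story."""
    for doc in DOC_RE.finditer(sgml):
        body = TEXT_RE.search(doc.group(1))
        if body is None:
            continue  # story without a body section; skip it
        # Drop inline tags such as paragraph markers, then collapse whitespace.
        text = TAG_RE.sub(" ", body.group(1))
        yield " ".join(text.split())


def count_words(sgml):
    """Count whitespace-delimited tokens across all story bodies."""
    return sum(len(text.split()) for text in iter_story_texts(sgml))


# Hypothetical two-story sample, not taken from the corpus.
sample = """<DOC>
<TEXT>
<p>
First story text here.
</p>
</TEXT>
</DOC>
<DOC>
<TEXT>
<p>
Second one.
</p>
</TEXT>
</DOC>"""

stories = list(iter_story_texts(sample))
total_words = count_words(sample)
```

Counting whitespace-delimited tokens is a crude approximation; the word
counts reported for this release may have been computed differently.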
Data
The LDC has been collecting the Associated Press Worldstream newswire
service in six languages since 1994. This is the first release of the
English language portion of this service. The material in this set is
typically NOT North American in origin -- the reporters who provide
the stories may or may not be American born, but the locations and
topics covered are much more heavily international in comparison to
the North American wire services. Reports from Asia, Africa and
Europe are found here that show up only rarely or not at all in North
American newspapers, including political, financial and sports stories
that are presumably geared to English-speaking readers in those parts
of the world.
This release, when combined with the LDC's earlier NA News Text
Corpus, constitutes all the English-language newswire text collected
by the LDC between January 1994 and April 1998, inclusive.
Updates
There are no updates at this time.
Copyright
Portions © 1994-1998 The Associated Press, © 1997-1998 Los Angeles Times - Washington Post News Service, Inc., © 1997-1998 New York Times, © 1998 Trustees of the University of Pennsylvania
Pricing
The Reduced Licensing Fee for this corpus is US$200.