| This corpus consists of newswire text
from Nihon Keizai Shimbun, Inc. (NIKKEI), the largest
Japanese daily financial newspaper, and Telerate, Inc.
(formerly known as Dow Jones/Kyodo News Service),
published primarily for managers of Japanese-owned
corporations or Japanese employees working in North
American financial institutions.
The Telerate portion comprises all Telerate newswire text
collected by the LDC between December 1994 and September
1998. The Telerate data collected from June 1995 to
September 1998 serves as a supplement to the original
publication.
All NIKKEI data was collected from December 1993 to
November 1994 and is also available on the 1995 release
of the Japanese Business News Text.
The data, including SGML tags, breaks down as follows:

            # of Files   Daily Average Size   Total Size
----------------------------------------------------------
NIKKEI           364          514K               188MB
Telerate        1060          336K               357MB
The NIKKEI text was received on nine-track magnetic tape.
The original character encoding was EBCDIC; the text was
converted to EUC encoding, which the LDC uses for its
Japanese publications.
The Telerate text was received via a digital
transmission service installed at the LDC by Telerate.
Custom software was written by the LDC to poll a central
database and download articles individually. The
character encoding is EUC.
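
Since both portions are distributed in EUC, a reader will usually
need to decode the bytes before processing the text. The following
is a minimal sketch, assuming that "EUC" here means EUC-JP (Python
codec name "euc_jp"). The function name decode_euc and the sample
string are illustrative only and are not part of the corpus.

```python
def decode_euc(raw: bytes) -> str:
    """Decode EUC-encoded bytes into a Python string.

    Assumption: the corpus uses EUC-JP. Bytes that cannot be decoded
    are replaced with U+FFFD rather than raising an error.
    """
    return raw.decode("euc_jp", errors="replace")


# Illustrative sample, not taken from the corpus.
sample_bytes = "日本経済新聞".encode("euc_jp")
sample_text = decode_euc(sample_bytes)
print(sample_text)
```

For a real file, the same function can be applied to the result of
reading the file in binary mode.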
LDC added SGML tags automatically in order to identify
individual stories within the daily collections.
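
The story-level tagging can be used to split a daily file into
individual stories. The sketch below is hedged: this description does
not name the tags, so the element name DOC is a hypothetical
placeholder, and split_stories is an illustrative helper rather than
an LDC tool. Substitute the element name actually found in the data.

```python
import re

# Assumption: each story is wrapped in <DOC> ... </DOC>. The real tag
# name is not given in this description and may differ.
STORY_RE = re.compile(r"<DOC\b[^>]*>(.*?)</DOC>", re.DOTALL | re.IGNORECASE)


def split_stories(daily_text):
    """Return the body of every tagged story in one daily collection."""
    return [body.strip() for body in STORY_RE.findall(daily_text)]


# Illustrative input, not taken from the corpus.
sample = "<DOC>\nstory one\n</DOC>\n<DOC id=2>\nstory two\n</DOC>\n"
stories = split_stories(sample)
print(len(stories))
```

A regular expression is enough here because the stories are assumed
not to nest; a full SGML parser would be the safer choice if the
markup turns out to be more complex.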
Pricing
The Reduced Licensing Fee for this corpus is US$150. |