Prior to the creation of this publication, NIST distributed the text collection to TREC participants via ftp, but the size of the collection made ftp transfer cumbersome. The LDC has moved the TREC Mandarin text collection to CD-ROM to simplify its distribution.

This single CD publication contains an index.html file with links to the peoples-daily/ and xinhua/ directories containing newswire text and to a doc/ directory containing additional documents such as this file, file.tbl. The 36 files in the peoples-daily directory contain 139,801 documents; the 41 files in the xinhua directory contain 24,988 documents.

The files in the peoples-daily directory are named pdYYMM.sgml, where pd = People's Daily, YY = year, and MM = month. The files in the xinhua directory are named xYYMMddd, where x = Xinhua, YY = year, MM = month, and ddd = sequential order number.

The contents of the CD, including the peoples-daily and xinhua directories, are shown below.

/index.html

doc/
    file.tbl
    pd.dtd
    pd-char-err.log
    pd-char-err.summary
    pd-missed-boundary.summary
    xh-char-err.log
    xh-char-err.summary
    xinhua.dtd

peoples-daily/
    pd9101.sgml  pd9102.sgml  pd9103.sgml  pd9104.sgml  pd9105.sgml  pd9106.sgml
    pd9107.sgml  pd9108.sgml  pd9109.sgml  pd9110.sgml  pd9111.sgml  pd9112.sgml
    pd9201.sgml  pd9202.sgml  pd9203.sgml  pd9204.sgml  pd9205.sgml  pd9206.sgml
    pd9207.sgml  pd9208.sgml  pd9209.sgml  pd9210.sgml  pd9211.sgml  pd9212.sgml
    pd9301.sgml  pd9302.sgml  pd9303.sgml  pd9304.sgml  pd9305.sgml  pd9306.sgml
    pd9307.sgml  pd9308.sgml  pd9309.sgml  pd9310.sgml  pd9311.sgml  pd9312.sgml

xinhua/
    x9404001
    x9405001
    x9406001  x9406002  x9406003
    x9407001  x9407002  x9407003
    x9408001  x9408002  x9408003
    x9409001  x9409002  x9409003
    x9410001  x9410002  x9410003
    x9501001  x9501002  x9501003
    x9502001  x9502002  x9502003
    x9503001  x9503002  x9503003
    x9504001  x9504002  x9504003
    x9505001  x9505002  x9505003
    x9507001  x9507002  x9507003
    x9508001  x9508002  x9508003
    x9509001  x9509002  x9509003
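
The file naming conventions described above (pdYYMM.sgml and xYYMMddd) can be decoded mechanically. The following Python sketch is not part of the LDC distribution; the names CorpusFile and parse_corpus_filename are hypothetical, and mapping the two-digit YY to 19YY is an assumption based on the years that appear in the listing (91 through 95).

```python
import re
from typing import NamedTuple, Optional


class CorpusFile(NamedTuple):
    """Fields decoded from a file name (hypothetical helper type)."""
    source: str              # "peoples-daily" or "xinhua"
    year: int                # four-digit year, assuming YY means 19YY
    month: int               # 1-12
    sequence: Optional[int]  # ddd for Xinhua files, None for People's Daily


# pdYYMM.sgml: pd = People's Daily, YY = year, MM = month
PD_PATTERN = re.compile(r"pd(\d{2})(\d{2})\.sgml")
# xYYMMddd: x = Xinhua, YY = year, MM = month, ddd = sequential order number
XH_PATTERN = re.compile(r"x(\d{2})(\d{2})(\d{3})")


def parse_corpus_filename(name: str) -> CorpusFile:
    """Decode a People's Daily or Xinhua file name into its parts."""
    match = PD_PATTERN.fullmatch(name)
    if match:
        yy, mm = match.groups()
        return CorpusFile("peoples-daily", 1900 + int(yy), int(mm), None)
    match = XH_PATTERN.fullmatch(name)
    if match:
        yy, mm, ddd = match.groups()
        return CorpusFile("xinhua", 1900 + int(yy), int(mm), int(ddd))
    raise ValueError(f"unrecognized corpus file name: {name!r}")


if __name__ == "__main__":
    for example in ("pd9101.sgml", "x9406003"):
        print(example, parse_corpus_filename(example))
```

Using fullmatch rather than match rejects names with trailing characters, so files in doc/ such as pd.dtd or pd-char-err.log are not mistaken for text files.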