Introduction
This publication contains the TREC Spanish Corpus produced by the Linguistic
Data Consortium (LDC); catalog number LDC2000T51, ISBN 1-58563-177-9. This is
the set of documents used for the Spanish task in TRECs 3-5. It consists of
approximately 250 megabytes of the Mexican newspaper El Norte and 300 megabytes
of Agence France Presse 1994 newswire text formatted to include TREC document
IDs. The El Norte documents were used for TRECs 3-4 and the Agence France
Presse documents were used for TREC 5. The topics (questions) and relevance
judgments (right answers) that complete the test collections can be downloaded
from the Data/Non-English section of the TREC web site.
Data
See file.tbl for the directory structure of this publication and a complete
list of files.
The files in the afp_text and infosel_data subdirectories are ASCII-encoded
SGML files that conform to the afp_trec.dtd and infosel.dtd DTDs found in the
doc subdirectory.
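As a rough illustration of how these files might be read, the sketch below pulls document IDs and raw bodies out of one SGML file. It assumes the conventional TREC element names <DOC> and <DOCNO>, which this README does not confirm; check afp_trec.dtd and infosel.dtd for the actual element names before relying on it. The function names iter_trec_docs and read_trec_file are hypothetical helpers, not part of the publication.

```python
import re

# Assumed tag names: <DOC> and <DOCNO> follow the usual TREC convention.
# Verify them against afp_trec.dtd and infosel.dtd in the doc subdirectory.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL | re.IGNORECASE)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL | re.IGNORECASE)


def iter_trec_docs(sgml_text):
    """Yield (docno, raw_body) pairs for each <DOC> element in the text."""
    for match in DOC_RE.finditer(sgml_text):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        # Documents without a DOCNO are skipped rather than guessed at.
        if docno is not None:
            yield docno.group(1), body.strip()


def read_trec_file(path):
    """Read one corpus file as ASCII (hypothetical helper; caller supplies path)."""
    # errors="replace" keeps going if a stray non-ASCII byte turns up.
    with open(path, encoding="ascii", errors="replace") as handle:
        return list(iter_trec_docs(handle.read()))
```

A regular expression is used here instead of an XML parser because SGML files of this kind are often not well-formed XML (unescaped ampersands, no single root element), so a strict parser may reject them.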
Updates
There are no updates at this time.
Copyright