Introduction
The HARD 2004 Topics and Annotations Corpus was produced by the
Linguistic Data Consortium (LDC) under catalog number LDC2005T29 and
ISBN 1-58563-373-9. This corpus contains topics and annotations
(clarification forms, responses, and relevance assessments) for the
2004 TREC HARD (High Accuracy Retrieval from Documents) Evaluation.
HARD 2004 was a track within the NIST
Text REtrieval Conference (TREC), with the objective of achieving high
accuracy retrieval from documents by leveraging additional information
about the searcher and/or the search context, through techniques like
passage retrieval and the use of targeted interaction with the
searcher.
The current corpus was previously distributed to HARD participants as
LDC2004E42 and LDC2005E17. The source data that corresponds to this
release is distributed as LDC2005T28, HARD 2004 Text. This corpus was
created with support from the DARPA TIDES Program and LDC.
Data
Three major annotation tasks are represented in this release: Topic
Creation, Clarification Form Responses, and Relevance
Assessment. Topics include a short title, a query plus context, and
a number of limiting parameters known as "metadata", which include the
targeted geographical region, the target data domain or genre, and the
level of searcher expertise. Clarification Forms are brief HTML
questionnaires that system developers submitted to LDC searchers to
gather additional information about information needs directly from
the topic creators. Relevance assessment consisted of adjudication of
pooled system responses, and included document-level judgments for all
topics, and passage-level relevance judgments for a subset of topics.
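
As a rough illustration of the topic description above, the following sketch models a topic as a small Python record. It is not the corpus file format: the class name, every field name, and the placeholder values are hypothetical and chosen only to mirror the elements listed above (title, query, context, and the three metadata parameters). The helper method assumes that "metadata" means exactly those three parameters.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class HardTopic:
    """Hypothetical in-memory view of one topic; names are illustrative only."""

    topic_id: str
    title: str
    query: str
    context: str
    # The three limiting parameters ("metadata") described in the text.
    geography: Optional[str] = None   # targeted geographical region
    genre: Optional[str] = None       # target data domain or genre
    expertise: Optional[str] = None   # level of searcher expertise

    def without_metadata(self) -> "HardTopic":
        """Return a copy with the three metadata parameters cleared."""
        return replace(self, geography=None, genre=None, expertise=None)


# Placeholder values only; these are not taken from the corpus.
topic = HardTopic(
    topic_id="T-001",
    title="placeholder title",
    query="placeholder query",
    context="placeholder context",
    geography="placeholder region",
    genre="placeholder genre",
    expertise="placeholder level",
)
stripped = topic.without_metadata()
```

Keeping the metadata fields optional lets the same record type represent a topic both with and without its limiting parameters.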
The release is divided into training and evaluation resources.
The training set comprises twenty-one topics and 100 document-level
relevance judgments per topic. The evaluation set contains fifty
topics, clarification forms and responses, document-level relevance
assessments for all topics, and passage-level judgments for half of the
topics. HARD participants received the reference data over the course
of the evaluation cycle in stages: (0) training topics, (1) evaluation
topic descriptions without metadata, (2) clarification form responses,
(3) topic descriptions with metadata, and (4) relevance assessments.
For more information, please consult the HARD Project website.
Samples
For an example of the data in this publication, please review the following samples:
Content Copyright
© 2005 Trustees of the University of Pennsylvania.