Introduction
NIST 2004 Open Machine Translation (OpenMT) Evaluation is a
package containing source data, reference translations, and scoring
software used in the NIST 2004 OpenMT evaluation. It is designed to
help evaluate the effectiveness of machine translation systems. The
package was compiled and scoring software was developed by
researchers at NIST, making use of newswire source data and
reference translations collected and developed by LDC.
The objective of the NIST OpenMT evaluation series is to support
research in, and help advance the state of the art of, machine
translation (MT) technologies -- technologies that translate text
between human languages. Input may include all forms of text. The
goal is for the output to be an adequate and fluent translation of
the original.
The MT evaluation series started in 2001 as part of the DARPA TIDES
(Translingual Information Detection, Extraction and Summarization)
program. Beginning with the 2006 evaluation, the evaluations have been
driven and coordinated by NIST as NIST OpenMT. These evaluations help
guide research efforts and calibrate the technical capabilities of MT
systems. The OpenMT evaluations
are intended to be of interest to all researchers working on the
general problem of automatic translation between human languages. To
this end, they are designed to be simple, to focus on core
technology issues, and to be fully supported. The 2004 task was to
evaluate translation from Chinese to English and from Arabic to
English.
Additional information about these evaluations may be found at the
NIST Open Machine Translation (OpenMT) Evaluation web site.
Scoring Tools
This evaluation kit includes a single Perl script (mteval-v11a.pl)
that may be used to produce a translation quality score for one (or
more) MT systems. The script works by comparing the system output
translation with a set of (expert) reference translations of the same
source text. Comparison is based on finding sequences of words in the
reference translations that match word sequences in the system output
translation. More information on the evaluation algorithm may be
obtained from the paper that describes it: BLEU: a Method for
Automatic Evaluation of Machine Translation (Papineni et al., 2002).
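To make the n-gram matching idea concrete, the following is a minimal corpus-level BLEU sketch in Python. It is not a port of mteval-v11a.pl: it assumes pre-tokenized, lowercased input, uniform weights over 1- to 4-grams, no smoothing, and the closest reference length for the brevity penalty. The NIST script's tokenization and length handling may differ, so its scores should not be expected to match exactly. The names `ngrams` and `corpus_bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(segments, max_n=4):
    """Simplified corpus-level BLEU (a sketch, not the NIST implementation).

    segments: list of (hypothesis_tokens, [reference_tokens, ...]) pairs.
    Assumptions: uniform weights, no smoothing, closest reference length.
    """
    clipped = [0] * max_n  # matched n-grams, clipped per segment
    totals = [0] * max_n   # all n-grams in the system output
    hyp_len = ref_len = 0
    for hyp, refs in segments:
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # An n-gram is credited at most as often as it occurs in any one reference.
            max_ref = Counter()
            for ref in refs:
                for gram, count in ngrams(ref, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(clipped) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the geometric mean
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Identical output and reference: every precision is 1 and there is no penalty.
ref = "the cat sat on the mat".split()
print(corpus_bleu([(ref, [ref])]))  # 1.0

# Clipping example from Papineni et al. (2002), unigrams only.
hyp = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(corpus_bleu([(hyp, refs)], max_n=1))  # 2/7, about 0.2857
```

The second call reproduces the paper's clipping example: seven copies of "the" earn a modified unigram precision of 2/7, because no single reference contains "the" more than twice. Counts are pooled over all segments before the precisions are combined, which is why the function takes a whole list of segments rather than scoring one sentence at a time.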
The included scoring script, released with the original evaluation and
intended for use with SGML-formatted data files, is provided to ensure
that user scoring results are compatible with results from the
original evaluation. An updated scoring software package
(mteval-v13a-20091001.tar.gz), with XML support, additional options,
bug fixes, documentation, and example translations, may be downloaded
from the NIST Multimodal Information Group Tools website.
Data
This corpus consists of 150 Arabic newswire documents, 150 Chinese
newswire documents, and 29 Chinese "prepared speech" documents,
together with a corresponding set of four independent human expert
reference translations. Because LDC lacks permission to publicly
distribute some of the source text used in the original evaluation,
all 50 Arabic "prepared speech" documents and 21 of the 50 Chinese
"prepared speech" documents (and their corresponding reference
translations) have been removed from the current release.
The reference translations included in this corpus have not
previously been publicly available. Some of the source text in this
corpus has been publicly released as part of other LDC publications,
including Arabic Gigaword Second Edition, LDC2006T02 (Agence
France-Presse (AFP) and Xinhua News Agency (Xinhua)); Chinese Gigaword
Second Edition, LDC2005T14 (Xinhua and Zaobao News Agency); Chinese
Gigaword Third Edition, LDC2007T38 (AFP); and Hong Kong Parallel Text,
LDC2004T08 (Hong Kong Special Administrative Region).
The source text included in this corpus was collected from the
following sources:
Arabic
| DocID prefix | Source | Date | Document count |
| AFA | Agence France-Presse | Jan. 2004 | 50 |
| ALH | Al Hayat | Jan.-Mar. 2004 | 25 |
| ANN | An Nahar | Feb.-Mar. 2004 | 25 |
| XIN | Xinhua News Agency | Jan. 2004 | 50 |
Chinese
| DocID prefix | Source | Date | Document count |
| AFC | Agence France-Presse | Jan. 2004 | 50 |
| HKN | Hong Kong Special Administrative Region | Jan.-Mar. 2003 | 16 |
| PD | People's Daily | Apr. 2003-Mar. 2004 | 34 |
| XIN | Xinhua News Agency | Oct. 2002-Jan. 2004 | 53 |
| ZBN | Zao Bao News Agency | Sept. 2003-Mar. 2004 | 26 |
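The tables can be cross-checked against the document counts stated earlier. The short Python sketch below simply transcribes the Document count column and sums it: the Arabic rows total 150, while the Chinese rows total 179, which is the 150 newswire documents plus the 29 prepared speech documents that remain after 21 of the original 50 were removed. The variable names are illustrative only, and the sketch makes no claim about which Chinese sources supplied the prepared speech documents.

```python
# Document counts transcribed from the two tables above (DocID prefix -> count).
arabic_counts = {"AFA": 50, "ALH": 25, "ANN": 25, "XIN": 50}
chinese_counts = {"AFC": 50, "HKN": 16, "PD": 34, "XIN": 53, "ZBN": 26}

# Chinese prepared speech: 50 in the original evaluation, 21 withheld from this release.
prepared_speech_kept = 50 - 21

arabic_total = sum(arabic_counts.values())
chinese_total = sum(chinese_counts.values())

print(arabic_total)                          # 150 (all newswire)
print(chinese_total)                         # 179
print(chinese_total - prepared_speech_kept)  # 150 Chinese newswire documents
```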
For each language, the test set consists of two files: a source and
a reference file. Each reference file contains four independent
translations of the data set. Each file name encodes the evaluation
year, the source language, the test set (by default, "evalset"), the
data version, and whether the file holds source or reference data
(reference files are marked with "-ref").
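Because a file's role is encoded in its name, files can be sorted programmatically. The Python helper below is hypothetical: the "-ref" marker and the default "evalset" test set name come from the description above, but the exact naming pattern (field order, separators, extensions) is not documented here, so the helper only checks for those two substrings, and the example file names are made up for illustration.

```python
from pathlib import PurePath


def describe_test_file(filename):
    """Hypothetical helper: report whether a file holds source or reference data.

    Grounded in the corpus description: reference file names contain "-ref",
    and the default test set is "evalset". Everything else about the naming
    pattern is an assumption, so only these substrings of the stem are checked.
    """
    stem = PurePath(filename).stem  # file name without its extension
    return {
        "role": "reference" if "-ref" in stem else "source",
        "default_test_set": "evalset" in stem,
    }


# Example names are invented; check them against the actual package contents.
print(describe_test_file("mt04_arabic_evalset_v0.xml"))
print(describe_test_file("mt04_arabic_evalset_v0-ref.xml"))
```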
DARPA TIDES MT and NIST OpenMT evaluations used SGML-formatted test
data until 2008 and XML-formatted test data thereafter. The files in
this package are provided in both formats.
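For the XML versions of the files, Python's standard `xml.etree.ElementTree` module is enough to extract the segments. This sketch assumes a simplified layout that is not taken from this package's documentation: each reference translation is a `<refset>` element containing `<doc docid="...">` elements, which in turn contain `<seg id="...">` elements, and a source file has no `<refset>`. The function name `load_segments` and the sample document are hypothetical, so the layout should be checked against the actual files (or the mteval-v13a documentation) before relying on it. The SGML versions may not be well-formed XML, so this approach is intended only for the XML files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def load_segments(xml_text):
    """Map (docid, seg id) -> list of segment texts, one per translation set.

    Assumed (hypothetical) layout: <refset> > <doc docid=...> > <seg id=...>.
    Element.iter() also matches the root itself and searches all depths,
    so intermediate wrapper elements are tolerated.
    """
    root = ET.fromstring(xml_text)
    segments = defaultdict(list)
    # Reference files: one <refset> per translator; source files: treat root as one set.
    translation_sets = list(root.iter("refset")) or [root]
    for translation_set in translation_sets:
        for doc in translation_set.iter("doc"):
            for seg in doc.iter("seg"):
                key = (doc.get("docid"), seg.get("id"))
                segments[key].append((seg.text or "").strip())
    return dict(segments)


# Made-up two-reference example in the assumed layout.
SAMPLE = """<mteval>
  <refset setid="example" srclang="ara" trglang="eng" refid="ref1">
    <doc docid="DOC1"><seg id="1">The minister arrived on Monday.</seg></doc>
  </refset>
  <refset setid="example" srclang="ara" trglang="eng" refid="ref2">
    <doc docid="DOC1"><seg id="1">The minister came on Monday.</seg></doc>
  </refset>
</mteval>"""

print(load_segments(SAMPLE))
```

Keying on (document ID, segment ID) lines the four reference translations of each segment up in a single list, which is the shape a BLEU-style scorer needs when it compares one system output against several references.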
Sample
A sample text file, encoded in UTF-8, contains excerpts from several of the XML files in this corpus, including the reference translations and source text for a single newswire document.
Updates
There are no updates available at this time.
Content Copyright
Portions © 2004 Agence France-Presse, © 2004 Al Hayat, ©
2004 An Nahar, © 2003-2004 People's Daily, © 2003-2004 SPH
AsiaOne, Ltd., © 2003 The Government of the Hong Kong Special
Administrative Region, © 2002-2004 Xinhua News Agency, © 2004,
2005, 2006, 2007, 2010 Trustees of the University of Pennsylvania.