Introduction
ACE 2005 Mandarin SpatialML Annotations was developed
by researchers at The MITRE
Corporation (MITRE). ACE 2005 Mandarin SpatialML Annotations
applies SpatialML tags to a subset of the source Mandarin training
data in ACE 2005 Multilingual Training Corpus
(LDC2006T06). Annotations for entities, relations, and events, which
were included in ACE 2005 Multilingual Training Corpus, are not
included in the current SpatialML release. For SpatialML markup of
ACE 2005 English data, see ACE 2005 English SpatialML Annotations
(LDC2008T03).
SpatialML is a mark-up language for representing spatial
expressions in natural language documents. SpatialML's focus is on
geography and culturally-relevant landmarks, rather than biology,
cosmology, geology, or other regions of the spatial language
domain. The goal is to allow for better integration of text
collections with resources such as databases that provide spatial
information about a domain, including gazetteers, physical feature
databases and mapping services.
The ACE (Automatic Content Extraction) Program seeks to develop
extraction technology to support automatic processing of source
language data (in the form of natural text, and as text derived from
automatic speech recognition and optical character
recognition). This includes classification, filtering, and selection
based on the language content of the source data, i.e., based on the
meaning conveyed by the data. Thus the ACE program requires the
development of technologies that automatically detect and
characterize this meaning. The ACE program's annotation efforts
support the development of automatic content extraction technology
for processing human language in text form. The information
recognized and extracted from text includes entities, values,
temporal expressions, relations, and events.
The SpatialML annotation scheme is intended to emulate earlier
progress on time expressions, such as TIMEX2, TimeML, and the 2005
ACE guidelines. The main SpatialML tag is the PLACE tag, which
encodes information about location. The central goal of SpatialML is
to map location information in text to data from gazetteers and
other databases, to the extent possible, by defining attributes in the
PLACE tag. Therefore, semantic attributes such as country
abbreviations, country subdivision and dependent area abbreviations
(e.g., US states), and geo-coordinates are used to help establish
such a mapping. LINK and PATH tags express relations between places,
such as inclusion relations and trajectories of various
kinds. Information in the tag along with the tagged location string
should be sufficient to uniquely determine the mapping, when such a
mapping is possible. This also means that redundant information is
not included in the tag. To the extent possible, SpatialML leverages
ISO and other standards towards the goal of making the scheme
compatible with existing and future corpora. The SpatialML
guidelines are compatible with existing guidelines for spatial
annotation and existing corpora within the ACE research program.
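To make the division of labor between PLACE and LINK tags concrete, the following is a minimal Python sketch that reads a SpatialML-style fragment with the standard-library `xml.etree.ElementTree` module, collects each PLACE tag's attributes, and resolves LINK relations by id. The sample markup, the attribute names (`type`, `country`, `latLong`, `source`, `target`, `linkType`), the `IN` link value, and the helper `extract_places` are illustrative assumptions, not quotations from the version 2.3 guidelines; the guidelines in this release's documentation define the authoritative tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical SpatialML-style fragment ("Beijing is the capital of China").
# Tag and attribute names are illustrative assumptions, not copied from
# the v2.3 guidelines.
SAMPLE = """<DOC>
<PLACE id="loc1" type="PPL" country="CN" latLong="39.93N 116.38E">北京</PLACE>是
<PLACE id="loc2" type="COUNTRY" country="CN">中国</PLACE>的首都。
<LINK id="link1" source="loc1" target="loc2" linkType="IN"/>
</DOC>"""


def extract_places(xml_text):
    """Return (places, links) from a SpatialML-style document string.

    places maps each PLACE id to its tagged string plus its remaining
    attributes; links is a list of (source id, link type, target id).
    """
    root = ET.fromstring(xml_text)
    places = {}
    for el in root.iter("PLACE"):
        attrs = {k: v for k, v in el.attrib.items() if k != "id"}
        attrs["text"] = el.text  # the tagged location string
        places[el.get("id")] = attrs
    links = [
        (link.get("source"), link.get("linkType"), link.get("target"))
        for link in root.iter("LINK")
    ]
    return places, links


if __name__ == "__main__":
    places, links = extract_places(SAMPLE)
    for src, rel, tgt in links:
        # Resolve LINK endpoints by id instead of repeating PLACE attributes.
        print(places[src]["text"], rel, places[tgt]["text"])
```

Keying PLACE attributes by `id` lets a LINK refer to places without duplicating their attributes, in line with the scheme's aim of keeping redundant information out of the tags.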
Data
This corpus consists of a 298-document subset of broadcast material
from the ACE 2005 Multilingual Training Corpus (LDC2006T06) that has
been tagged by a native Mandarin speaker according to version 2.3 of
the SpatialML annotation guidelines, which are included in the
documentation for this release.
Updates
No updates have been issued at this time.
Content Copyright
Portions © 2000-2001 China Broadcasting System, © 2000-2001 China Central TV, © 2000-2001 China National Radio, © 2000-2001 China Television System, © 2008-2009 The MITRE Corporation, © 2005, 2006, 2010 Trustees of the University of Pennsylvania