Introduction
ModeS TimeBank 1.0 was developed by researchers at the Technical
University of Madrid and Barcelona Media. It is a corpus of Modern Spanish
(17th and 18th centuries) annotated with temporal and event information
according to the TimeML mark-up scheme and with spatial information
following the SpatialML scheme.
TimeML (Pustejovsky et al., 2005) is a specification language for annotating
eventualities and time expressions in natural language as well as the temporal
relations among them, thus facilitating the task of extraction, representation
and exchange of temporal information. SpatialML (Mani et al., 2008) is a specification
language for annotating and normalizing spatial expressions by means of geographic
coordinates.
LDC has released the following corpora incorporating TimeML or SpatialML annotation: TimeBank 1.2 (LDC2006T08),
FactBank 1.0 (LDC2009T23), ACE 2005 English SpatialML
Annotations Version 2 (LDC2011T02) and ACE 2005 Mandarin SpatialML Annotations (LDC2010T09).
Data
ModeS TimeBank 1.0 contains 102 documents reporting a sea-crossing cruise by
a ship called La Princesa, which took place from December 1768 to April
1769. There exist copious logbooks from that period that not only provide information
about shipping routes, but also contain valuable data concerning information
flows, commercial agents and social networks. The original corpus manuscript
is preserved in the Archivo General de Indias ("General Archive
of the Indies") and is available online at the Portal
de Archivos Españoles. This corpus was created within the framework of the
DynCoopNet project (Dynamic Compatibility
of Cooperation-Based Self-Organizing Networks in the First Global Age) which
is focused on the study of trade network cooperation during the 15th-19th centuries
and incorporates into its work maps, charts, databases and natural language
documents.
All text is encoded in UTF-8. The data in ModeS TimeBank 1.0 has been tokenized,
POS-tagged, and annotated with space, time and event information according to
the TimeML and SpatialML specification schemes. More specifically, the entities
annotated in the corpus are the following:
- Events (tag EVENT, from TimeML). These include finite and non-finite verbal constructions,
nominalizations, nouns, adjectives and prepositional phrases.
- Temporal expressions (tag TIMEX3, from TimeML). These include expressions
of dates, times, durations and frequencies, both precise and vague.
- Spatial expressions (tag PLACE, from SpatialML). These include proper
and common nouns, adjectives, adverbs and spatial coordinates.
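
As a rough illustration of how the three entity types might be read from an annotated document, the following Python sketch collects EVENT, TIMEX3 and PLACE elements from an inline-XML string. It assumes the annotations are stored as inline XML tags, which is common for TimeML and SpatialML but is not confirmed here for this release. The sample sentence, the TEXT wrapper element, and the attribute names and values are invented for illustration and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the sentence, the TEXT wrapper and all attribute
# values are illustrative only and do not come from ModeS TimeBank 1.0.
SAMPLE = (
    '<TEXT>'
    '<TIMEX3 tid="t1" type="DATE">Ayer</TIMEX3> '
    '<EVENT eid="e1" class="OCCURRENCE">salimos</EVENT> del '
    '<PLACE id="p1">puerto</PLACE>.'
    '</TEXT>'
)

# The three entity tags described in the list above.
TAGS = ("EVENT", "TIMEX3", "PLACE")


def extract_annotations(xml_text):
    """Return (tag, text, attributes) tuples in document order."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        if elem.tag in TAGS:
            # itertext() also gathers text from any nested elements.
            text = "".join(elem.itertext()).strip()
            found.append((elem.tag, text, dict(elem.attrib)))
    return found


annotations = extract_annotations(SAMPLE)
counts = Counter(tag for tag, _, _ in annotations)

for tag, text, attrs in annotations:
    print(tag, repr(text), attrs)
```

If the distributed files use a different layout, such as stand-off annotation or one token per line, the same idea applies but the parsing step would need to change accordingly.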
Samples
Please see the following links for examples of
annotated and
original texts.
Updates
None at this time.
Content Copyright
Portions © 2012 Marta Guerrero Nieto, Roser Sauri, © 2012 Trustees
of the University of Pennsylvania