The objective of the Automatic Content Extraction (ACE) Program was to develop extraction technology to support automatic processing of source language data, both natural text and text derived from automatic speech recognition (ASR) and optical character recognition (OCR). Automatic processing, as defined at the time, included classification, filtering, and selection based on the language content of the source data, i.e., on the meaning conveyed by the data. The ACE program therefore required technologies that automatically detect and characterize this meaning. The ACE research objectives were the detection and characterization of Entities, Relations, and Events.
LDC developed annotation guidelines, corpora and other linguistic resources to support the ACE Program. Some of these resources were developed in cooperation with the TIDES Program in support of TIDES Extraction evaluations.
ACE annotators tagged broadcast transcripts, newswire and newspaper data in English, Chinese and Arabic, producing both training and test data for common research task evaluations. There were three primary ACE annotation tasks corresponding to the three research objectives: Entity Detection and Tracking (EDT), Relation Detection and Characterization (RDC) and Event Detection and Characterization (EDC). A fourth annotation task, Entity Linking (LNK), grouped all references to a single entity and all its properties together into a Composite Entity.
Entity Detection and Tracking (EDT) was the core annotation task, providing the foundation for all remaining tasks. Later ACE tasks identified seven types of entities: Person, Organization, Location, Facility, Weapon, Vehicle and Geo-Political Entity (GPE). Each type was further divided into subtypes (for instance, Organization subtypes include Government, Commercial, Educational, Non-profit and Other). Annotators tagged all mentions of each entity within a document, whether named, nominal or pronominal. For every mention, the annotator identified the maximal extent of the string that represents the entity and labeled the head of the mention. Nested mentions were also captured. Each entity was classified according to its type and subtype and was further tagged according to its class: specific, generic, attributive, negatively quantified or underspecified. During the LNK annotation task, annotators reviewed the entire document in order to group mentions of the same entity together; they also labeled cases of metonymy, where the name of one entity is used to refer to another entity (or entities) related to it.
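As a rough illustration of the EDT annotation structure described above, the following Python sketch models an entity as a typed group of mentions, each with a maximal extent and a head. All class and function names (Span, Mention, Entity, find_span) and the example sentence are hypothetical; this is not the ACE file format, only one way such annotations could be represented, assuming character offsets into the document text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EntityType(Enum):
    # The seven entity types named in the text.
    PERSON = "Person"
    ORGANIZATION = "Organization"
    LOCATION = "Location"
    FACILITY = "Facility"
    WEAPON = "Weapon"
    VEHICLE = "Vehicle"
    GPE = "Geo-Political Entity"


class MentionLevel(Enum):
    NAMED = "named"
    NOMINAL = "nominal"
    PRONOMINAL = "pronominal"


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset (an assumption of this sketch)
    end: int    # exclusive character offset

    def contains(self, other: "Span") -> bool:
        return self.start <= other.start and other.end <= self.end

    def text(self, doc: str) -> str:
        return doc[self.start:self.end]


@dataclass
class Mention:
    extent: Span         # maximal extent of the string representing the entity
    head: Span           # head of the mention, inside the extent
    level: MentionLevel

    def __post_init__(self) -> None:
        if not self.extent.contains(self.head):
            raise ValueError("head must lie inside the extent")


@dataclass
class Entity:
    entity_type: EntityType
    entity_class: str                # e.g. "specific", "generic"
    subtype: Optional[str] = None    # e.g. "Commercial" for an Organization
    mentions: List[Mention] = field(default_factory=list)


def find_span(doc: str, phrase: str, begin: int = 0) -> Span:
    """Hypothetical helper: locate the first occurrence of phrase at or after begin."""
    start = doc.index(phrase, begin)
    return Span(start, start + len(phrase))


doc = "The president of Acme Corp. said he would resign."

org = Entity(EntityType.ORGANIZATION, "specific", subtype="Commercial")
name = find_span(doc, "Acme Corp.")
org.mentions.append(Mention(name, name, MentionLevel.NAMED))

person = Entity(EntityType.PERSON, "specific")
person.mentions.append(
    Mention(find_span(doc, "The president of Acme Corp."),
            find_span(doc, "president"),
            MentionLevel.NOMINAL)
)
he = find_span(doc, "he", begin=doc.index("said"))
person.mentions.append(Mention(he, he, MentionLevel.PRONOMINAL))

# The organization mention is nested inside the first person mention.
nested = person.mentions[0].extent.contains(org.mentions[0].extent)
```

Grouping the nominal and pronominal mentions under one Entity object mirrors the tracking step, while the containment check shows how a nested mention could be detected from offsets alone.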
Relation Detection and Characterization (RDC) involved the identification of relations between entities. This task was added in Phase 2 of ACE. RDC targeted physical relations including Located, Near and Part-Whole; social/personal relations including Business, Family and Other; a range of employment or membership relations; relations between artifacts and agents (including ownership); affiliation-type relations like ethnicity; relationships between persons and GPEs like citizenship; and finally discourse relations. For every relation, annotators identified two primary arguments (namely, the two ACE entities that are linked) as well as the relation's temporal attributes. Relations that were supported by explicit textual evidence were distinguished from those that depended on contextual inference on the part of the reader.
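A relation annotation of this kind can be pictured as a small record linking two entities. The Python sketch below is illustrative only: the Relation class, its field names, the entity identifiers and the example values are assumptions, not the ACE data format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Relation:
    relation_type: str           # e.g. "Located", "Near", "Family"
    arg1: str                    # identifier of the first ACE entity (hypothetical IDs)
    arg2: str                    # identifier of the second ACE entity
    explicit: bool               # True if supported by explicit textual evidence
    time: Optional[str] = None   # temporal attribute, if any


def split_by_evidence(
    relations: List[Relation],
) -> Tuple[List[Relation], List[Relation]]:
    """Separate explicitly stated relations from contextually inferred ones."""
    explicit = [r for r in relations if r.explicit]
    inferred = [r for r in relations if not r.explicit]
    return explicit, inferred


relations = [
    Relation("Located", "E1", "E2", explicit=True, time="last week"),
    Relation("Family", "E1", "E3", explicit=False),
]
explicit, inferred = split_by_evidence(relations)
```

Keeping the evidence flag on the record itself lets downstream code score or filter explicit and inferred relations separately.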
ACE Phase 3 added a new challenge: Event Detection and Characterization (EDC). In EDC, annotators identified and characterized five types of events in which EDT entities participate: Interaction, Movement, Transfer, Creation and Destruction. Annotators tagged the textual mention, or anchor, for each event and categorized it by type and subtype. They further identified event arguments (agent, object, source and target) and attributes (temporal and locative, as well as others such as instrument or purpose) according to a type-specific template.
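The idea of a type-specific template can be sketched as a lookup from event type to the argument roles it permits. In the Python example below, the five event types and four roles come from the description above, but the TEMPLATES contents, the Event class, the subtype value and the entity identifiers are invented placeholders, since the actual templates are not given here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class EventType(Enum):
    INTERACTION = "Interaction"
    MOVEMENT = "Movement"
    TRANSFER = "Transfer"
    CREATION = "Creation"
    DESTRUCTION = "Destruction"


ARGUMENT_ROLES = {"agent", "object", "source", "target"}

# Placeholder templates, invented for illustration; the real per-type
# templates are defined in the annotation guidelines, not here.
TEMPLATES = {
    EventType.MOVEMENT: {"agent", "object", "source", "target"},
    EventType.DESTRUCTION: {"agent", "object"},
}


@dataclass
class Event:
    event_type: EventType
    subtype: str
    anchor: str  # the textual mention that signals the event
    arguments: Dict[str, str] = field(default_factory=dict)   # role -> entity id
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. temporal, locative


def disallowed_roles(event: Event) -> List[str]:
    """Return argument roles not permitted by the event type's template."""
    allowed = TEMPLATES.get(event.event_type, ARGUMENT_ROLES)
    return sorted(set(event.arguments) - allowed)


event = Event(
    EventType.DESTRUCTION,
    subtype="unspecified",  # placeholder; subtypes are not listed in the text
    anchor="destroyed",
    arguments={"agent": "E1", "object": "E2"},
    attributes={"temporal": "yesterday"},
)
ok = disallowed_roles(event)

event.arguments["source"] = "E3"
bad = disallowed_roles(event)
```

Under these placeholder templates, the first check returns no violations, and adding a source argument to a Destruction event is then flagged.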
In later phases of ACE, annotators identified additional event types and characterized relations between events.