Chaining in WordFreak

How to Chain

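  1. We now have a decent way to annotate split references, a.k.a. discontinuous expressions. These are references whose text is interrupted, typically because part of the name is shared with a neighboring name in a coordination. The examples used on this page are "K-" ... "ras" (K-ras, which shares "ras" with N-ras) and "CYP1A" ... "2" (CYP1A2, which shares "CYP1A" with CYP1A1).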

    Use the chain tool only in entity annotation and only for split references, and never in info-gathering mode. See "What to Chain", below.

    Previously we had to tag the parts of a split reference (in the examples, "K-" and "ras", and "CYP1A" and "2") separately and note their relationship in the Comment field (see archive). The Comment field is not accessible to the machine learning algorithms or the data retrieval software; it serves only as a guide for annotators in later stages. Now we can make the connection explicit in the annotation itself by building a chain of strings and applying a single tag to the whole chain.

  2. Building a chain:
    1. Drag the mouse pointer to select the text for the first (leftmost) link.

      (NOTE: WordFreak will let you select the links in any order, but don't do it! That will mess up some other features that Eric and Jeremy are working on, as well as untagging [#7, below].)

    2. Control-drag to select the next link: drag while holding down the Ctrl key (the Apple/Command key on Macs).
    3. Repeat step 2 until you have selected all the links, control-dragging each one.
    4. Tag the chain, either with a Chooser button or with the right-click menu in the Viewer window.

  3. In most chaining situations, one of the references you need to tag will be an unbroken string; in the examples, "N-ras" and "CYP1A1". You don't need to make a chain of these. Tag like this (in this document, "(+)" separates the links of a chain):

      K- (+) ras     chain
      N-ras          single unbroken string
      CYP1A1         single unbroken string
      CYP1A (+) 2    chain

    It wouldn't hurt anything to tag "N-ras" or "CYP1A1" as a chain, but it's a couple of extra steps. Save yourself the trouble.

  4. Selection, as distinct from highlighting (purple background), applies to one link at a time. Even without chaining, the text of a selected string appears in normal black instead of the highlight color for its entity type.

    After you tag a chain, and while its links are highlighted, look closely: only one link's text is black, and the other links' text is in the entity color. The link with black text is the selected one. Move from link to link, with either the arrow keys or the Chooser buttons, and you'll see the black text move along, although all the links stay highlighted as long as any one of them is selected. The Chooser's "grow" and "shrink" buttons affect the currently selected link.

  5. While you build a chain, the character range in the status line at the bottom of the Viewer window shows the range of the current link; e.g., "1991..2000" for a selection that begins with character #1991 of the file and ends with character #1999. (The second number is one past the last character; all tag ranges are defined this way.) Once you tag the chain, the status line shows the range of the whole chain, with distinctive punctuation: "1991..  ..2014", with two pairs of dots.

  6. WordFreak will let you make a chain with some links in one sentence and some in another, but it shouldn't, just as it doesn't allow any single solid tag to span more than one sentence. Enforcing this restriction is on the to-do list, but not near the top; Eric and Jeremy have more important things to do. Just don't do it.

  7. Untagging:
    To untag an entire chain, select the first link in the chain and untag it. If you untag any other link, only that link is untagged and removed from the chain. "First" here means the link you started the chain with. If you start a chain with any link other than the leftmost one, you will have a hard time if you need to correct it later, and so will any subsequent annotator.
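The rules in items 2, 5, 6, and 7 can be summarized as a small data model. The Python sketch below is hypothetical: the `Link` and `Chain` classes, their methods, and the sample sentence "K- and N-ras" are illustrative assumptions, not WordFreak's internals. It also assumes that the whole-chain range in the status line runs from the start of the first link to the end of the last link.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model for illustration only; these classes are not part of WordFreak.
@dataclass(frozen=True)
class Link:
    start: int  # offset of the first character of the link
    end: int    # offset one past the last character (as in the status line)

    def display(self) -> str:
        # Single-link range, e.g. "1991..2000" for characters 1991 through 1999.
        return f"{self.start}..{self.end}"


@dataclass
class Chain:
    # Boundaries of the sentence the chain belongs to (end is one past the last character).
    sentence_start: int
    sentence_end: int
    links: list[Link] = field(default_factory=list)

    def add_link(self, link: Link) -> None:
        # Item 6: a chain must not cross a sentence boundary.
        if link.start < self.sentence_start or link.end > self.sentence_end:
            raise ValueError("link falls outside the chain's sentence")
        # Item 2: select links left to right, so each new link must start
        # at or after the end of the previous one.
        if self.links and link.start < self.links[-1].end:
            raise ValueError("links must be selected left to right")
        self.links.append(link)

    def display(self) -> str:
        # Whole-chain range with two pairs of dots, e.g. "1991..  ..2014".
        return f"{self.links[0].start}..  ..{self.links[-1].end}"

    def untag(self, index: int) -> bool:
        """Untag one link; return True if the whole chain was removed."""
        if index == 0:
            # Item 7: untagging the first link removes the entire chain.
            self.links.clear()
            return True
        # Untagging any other link removes only that link from the chain.
        del self.links[index]
        return False


# Usage with a hypothetical sentence containing the split reference "K-" (+) "ras".
text = "K- and N-ras"
chain = Chain(sentence_start=0, sentence_end=len(text))
chain.add_link(Link(0, 2))   # "K-"
chain.add_link(Link(9, 12))  # "ras"
print([text[link.start:link.end] for link in chain.links])  # ['K-', 'ras']
print(chain.links[0].display())  # 0..2
print(chain.display())           # 0..  ..12
```

The sketch shows why link order matters for untagging: the first link in the list is the one whose removal clears the whole chain, so a chain started anywhere but the leftmost link is harder to correct.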

What to Chain

[2004-11-12] Unrestricted chaining can create insuperable problems for treebanking. Ann and Seth agree that chaining works well for split coordinations. Otherwise, don't use it at all unless the guidelines make an explicit exception. (At present there is exactly one such exception.) If you see another situation where you think chaining would be appropriate, by all means ask, but don't chain until it has been approved.

This means that we use chaining only in entity annotation: not in POS annotation, let alone paragraph and sentence/section annotation. See archive.

In info-gathering mode, don't chain at all: continue to sweep out the whole text of the entity reference. As of March 2004, this applies only to Malignancy, in the oncology domain.

Embedded abbreviations

A string with an embedded abbreviation may tempt you to use the chaining tool to skip over the abbreviation. But we won't, both for consistency with what we've already done and to avoid creating tags that will cause problems for treebanking and propbanking.

This applies even to an abbreviation embedded in a split coordination, such as one in which "(CYP)" comes between the left-hand part of the name and the coordinated parts "1A1", "1A2", and "1A3".

We do want to chain "1A1", "1A2", and "1A3" with the left-hand part of the name, and that's right. The chaining tool would also let us skip over "(CYP)". However, to stay consistent with existing practice, we treat the abbreviation as embedded in each case and keep it inside the tagged text.

Problems

(These are on Eric & Jeremy's list already.)

  1. When you change the tag on a chain, the text color changes only on the selected link (and you can't see the change until you select something else). The other links keep their old colors until you select them, or until the file is closed and reopened.

  2. You can't extend a chain. If you build a chain of two links and tag it, then realize that you meant to include another link, you have to untag the chain (Chooser '–' button, or Backspace key) and then build it again.



CHANGE NOTES

2004-11-10: Table of contents

2004-11-12