WordFreak how-to for named entity taggers

(draft, 2003-02-09)

Setting up

Files

Decide where you want to keep the files you're annotating. If you're running WordFreak on your own machine, two reasonable places are the directory that holds WordFreak itself and a subdirectory of that directory. If you're running WordFreak on a machine in the IRCS suite, you'll need an account of your own in which to save the files; the local files on these machines are considered temporary and may be wiped at night. When you take a file to work on, check it out (we still have to develop that procedure) and put a copy in your files directory.

Open the project

There's a project view and a text view. Which one you see is controlled by the tabs at the top of the main pane in the WordFreak window. For this step, use the project view.

Then select the project from the display.

Add a file to the project

The file will appear in the project view. WordFreak will ask if you want to create an annotation file; answer Yes.

Load the file

A green checkmark will appear on the icon of the annotation file in the project view.

If you have more than one file loaded, switch to this one with

Pre-tagging

(WordFreak uses the name "tagging" to refer to work done automatically by programs that it calls, and "annotation" to refer to work done by a human annotator. Other contexts don't always need or make that distinction.)

Tag the text for paragraphs, sentences, and tokens, in that order. We intend to automate this task, but for now you will have to do it semi-manually, telling WordFreak to use its taggers for these types of tagging.

First, select the file in the project view if it isn't already selected.

Paragraph tagging

This isn't a complex task, but most of the operation is the same for all three types of tagging, so I'll go into considerable detail here.

Use the > and < buttons to show each tagged paragraph in turn. There should be no problem with the paragraph tagging; it's a pretty straightforward task for the tagging program. The text may include some XML labels in angle brackets, like "<ABSTRACT>", and the highlighting may not include those; that's all right. The highlighting may or may not also include the blank line between paragraphs, and that's all right too.

(When you have more than one file loaded, if you're at the beginning or end of one of them, the Chooser > and < buttons will move you to the previous or next file. You can also move between them directly with Annotation | Go To .)
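If it helps to picture what the paragraph tagger is doing, here is a minimal sketch. It assumes that a paragraph is simply a run of non-blank lines separated by blank lines; that is an assumption made for illustration, not a description of WordFreak's actual tagger, and the function name `paragraph_spans` is hypothetical.

```python
def paragraph_spans(text):
    """Return (start, end) character offsets of blank-line-separated paragraphs.

    Illustration only: the rule "a paragraph is a run of non-blank lines"
    is an assumption, not WordFreak's real paragraph tagger.
    """
    spans = []
    start = None   # offset where the current paragraph began, if any
    end = 0        # offset just past the last non-blank line seen
    offset = 0     # offset of the line being examined
    for line in text.splitlines(keepends=True):
        if line.strip():
            if start is None:
                start = offset
            # Exclude the trailing line break from the paragraph span.
            end = offset + len(line.rstrip("\r\n"))
        elif start is not None:
            # A blank line closes the paragraph in progress.
            spans.append((start, end))
            start = None
        offset += len(line)
    if start is not None:
        spans.append((start, end))
    return spans


sample = "<ABSTRACT>\nFirst paragraph.\n\nSecond paragraph.\n"
spans = paragraph_spans(sample)
```

In this sketch the blank line between paragraphs falls outside both spans, and an XML label on its own line is swept into the paragraph that follows it. The real tagger may treat either case differently, which is why the highlighting variations described above are nothing to worry about.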

What should you do if the tagging is wrong? Suppose two paragraphs are highlighted together as a single paragraph. The easiest way to fix this is in two steps: remove and add. (I'm talking about removing and adding tags in the Chooser, not removing and adding files in the main WordFreak window!)

  1. Remove Tag: With the mistagged section highlighted, click the button in the Chooser. The highlighting will disappear.
  2. Add Tags:
    1. Drag the cursor over the first of the mistakenly combined paragraphs. Be sure to get it all, including the period at the end. You may notice that WordFreak will not let you extend a paragraph selection into a part of the text that is already tagged. (This is also true for sentences and tokens, but not named entities; we'll discuss that in its place.) Again, it's OK if you catch an extra blank line.
      -- If you have trouble getting the selection to work at the beginning or end of a paragraph, start dragging from the second or third character in, and after you've selected most of the paragraph you can use the second row of buttons (shrink and grow) to adjust the ends of the selection.
      -- When the paragraph is selected, click the "paragraph" button in the Chooser. (The + button would also work here, but the situation is more complicated with other types of tagging, so it's best to make a habit of using the labeled button.)
    2. Do the same for the second paragraph.
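The remove-and-add repair above, together with the rule that a new paragraph selection cannot run into text that is already tagged, can be sketched as follows. This is only a model of the behavior as seen from the outside: the names `overlaps` and `add_tag`, and the use of half-open (start, end) character offsets, are assumptions and say nothing about how WordFreak stores its tags.

```python
def overlaps(span, tagged):
    """True if span shares at least one character with any tagged span."""
    start, end = span
    return any(start < t_end and t_start < end for t_start, t_end in tagged)


def add_tag(span, tagged):
    """Add span to tagged unless it is empty or overlaps an existing tag."""
    start, end = span
    if start >= end or overlaps(span, tagged):
        return False
    tagged.append(span)
    tagged.sort()
    return True


# Two paragraphs were wrongly tagged together as one span.
tagged = [(0, 46)]
tagged.remove((0, 46))                 # step 1: remove the bad tag
first_ok = add_tag((0, 27), tagged)    # step 2.1: tag the first paragraph
second_ok = add_tag((29, 46), tagged)  # step 2.2: tag the second paragraph
blocked = add_tag((20, 35), tagged)    # runs into tagged text, so rejected
```

The order matters in this model just as it does in the Chooser: while the combined tag is still present, neither of the two correct paragraph spans could be added, because each would overlap it.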

Check your work by clicking < and > to be sure that the highlighting is correct. If it's off by just a little, you can use shrink and grow; and, as always, you can ignore the space between paragraphs. When the paragraphs are correct, return to the project view and go on to sentences.

Note on clicking vs. dragging

You're probably used to applications, like word processors, in which a mouse click in text sets an insertion cursor so you can start typing or editing at that point. But in WordFreak you can't type or edit, so there is no insertion cursor. Instead, a mouse click selects the nearest tagged entity (of any of the types currently shown in the Chooser window). In order to select any text in WordFreak you have to drag the mouse at least a little bit.
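To make the click behavior concrete, here is a small sketch of "select the nearest tagged entity". It assumes that "nearest" means the smallest distance in characters from the click position to a tagged span, with ties going to the earlier span; WordFreak's real rule may differ, and `nearest_span` is a hypothetical name.

```python
def nearest_span(offset, spans):
    """Return the tagged (start, end) span closest to a clicked offset.

    Assumption: distance is counted in characters, and a click inside
    a span has distance zero. Returns None if nothing is tagged.
    """
    def distance(span):
        start, end = span
        if start <= offset < end:
            return 0
        if offset < start:
            return start - offset
        return offset - (end - 1)  # distance to the span's last character

    return min(spans, key=distance) if spans else None


tagged = [(0, 27), (29, 46)]
inside = nearest_span(5, tagged)    # click inside the first span
between = nearest_span(28, tagged)  # click in the gap, nearer the second
```

Because a click always resolves to some existing tag in this model, there is no way to start a new selection by clicking, which matches the need to drag.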

Sentence tagging

Token tagging

Roughly speaking, a token is a single word, number, or punctuation mark.
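As a rough illustration of that definition, the sketch below splits text into words, numbers, and single punctuation marks with a regular expression. The pattern is an assumption chosen to match the one-sentence definition above; it is not the tokenizer WordFreak calls, and biomedical text has many cases (hyphenated gene names, for instance) where a real tokenizer may decide differently.

```python
import re

# Assumed token pattern: a number (optionally with a decimal part),
# or a run of word characters, or one punctuation character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")


def token_spans(text):
    """Return (start, end, token_text) for each token found in text."""
    return [(m.start(), m.end(), m.group()) for m in TOKEN.finditer(text)]


tokens = [tok for _, _, tok in token_spans("IL-2 rose 3.5%.")]
# tokens == ["IL", "-", "2", "rose", "3.5", "%", "."]
```

Note that this pattern breaks "IL-2" into three tokens. Whether that is right for a given annotation project is exactly the kind of thing to check when you step through the token highlighting.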

And that finishes the pre-tagging. Now you can get to the meat of your work, the named entity annotation.

Biomedical named entity annotation

We don't have a tagger for this yet, which is why we need your work.
