(draft, 2003-02-09)
Decide where you want to keep the files you're annotating. If you're running WordFreak on your own machine, two reasonable options are the directory that holds WordFreak itself or a subdirectory of it. If you're running WordFreak on a machine in the IRCS suite, you'll need an account of your own in which to save the files; local files on these machines are considered temporary and may be wiped at night. When you take a file to work on, check it out (we still have to develop that procedure) and put a copy in your files directory.
There's a project view and a text view. Which one you see is controlled by the tabs on top of the main pane in the WordFreak window. For this, use the project view.
A green checkmark will appear on the icon of the annotation file in the project view.
If you have more than one file loaded, switch to this one with
(WordFreak uses the name "tagging" to refer to work done automatically by programs that it calls, and "annotation" to refer to work done by a human annotator. That distinction isn't always necessary or made in other contexts.)
Tag the text for paragraphs, sentences, and tokens, in that order. We intend to automate this task, but for now you will have to do it semi-manually, telling WordFreak to use its taggers for these types of tagging.
First, select the file in the project view if it isn't already selected.
This isn't a complex task, but because most of the operation is the same for all three types of tagging, I'll go into considerable detail here.
(NOTE: As you switch between WF and other applications, you may find that the main WF window is on top but the Chooser window is hidden behind other application windows. You can bring it forward with Window | Bring all to front.)
| icon | function | tooltip |
| first row | ||
| < | previous tagged entity | left |
| > | next tagged entity | right |
| + | tag selection as entity | add |
| – | untag selected entity | remove |
| second row | ||
| <=| | extend beginning of selection | grow left |
| >=| | contract beginning of selection | shrink left |
| |=< | contract end of selection leftwards | shrink right |
| |=> | extend end of selection rightwards | grow right |
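As a rough illustration of what the four second-row buttons do, a selection can be modelled as a pair of character offsets. This is only a sketch, not WordFreak code: the `Span` class, its method names, and the one-character default step are all assumptions made for the example (WordFreak may move the boundary by some other unit).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A selection as half-open character offsets [start, end). Hypothetical model."""
    start: int
    end: int

    def extend_start(self, step=1):
        # Extend the beginning of the selection: start moves left, never below 0.
        return Span(max(0, self.start - step), self.end)

    def contract_start(self, step=1):
        # Contract the beginning of the selection: start moves right, never past end.
        return Span(min(self.end, self.start + step), self.end)

    def contract_end(self, step=1):
        # Contract the end of the selection leftwards, never past start.
        return Span(self.start, max(self.start, self.end - step))

    def extend_end(self, step=1, limit=None):
        # Extend the end of the selection rightwards, optionally capped at `limit`
        # (for example, the length of the text).
        new_end = self.end + step
        if limit is not None:
            new_end = min(limit, new_end)
        return Span(self.start, new_end)
```

Each method returns a new `Span` rather than changing the old one, so a sequence of button presses can be read as a chain such as `Span(5, 10).extend_start().contract_end()`.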
Use the > and < buttons to show each tagged paragraph in turn. There should be no problem with the paragraph tagging; it's a pretty straightforward task for the tagging program. The text may include some XML labels in angle brackets, like "<ABSTRACT>", and the highlighting may not include those; that's all right. The highlighting may or may not also include the blank line between paragraphs, and that's all right too.
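To see why paragraph tagging is an easy job for a program, here is a minimal sketch that finds paragraphs by looking for blank lines. It is not WordFreak's own paragraph tagger; the function name `paragraph_spans` and the rule "a blank line separates paragraphs" are assumptions for illustration. In this sketch the blank line is left out of the span, which is one of the two acceptable outcomes mentioned above.

```python
import re


def paragraph_spans(text):
    """Return (start, end) character offsets of blank-line-separated paragraphs."""
    spans = []
    start = 0
    # A separator is a newline, optional whitespace, and another newline.
    for sep in re.finditer(r"\n\s*\n", text):
        if sep.start() > start:
            spans.append((start, sep.start()))
        start = sep.end()
    # Whatever follows the last separator is the final paragraph.
    if start < len(text):
        spans.append((start, len(text)))
    return spans


sample = "First paragraph.\nStill first.\n\nSecond paragraph."
paragraphs = [sample[s:e] for s, e in paragraph_spans(sample)]
```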
(When you have more than one file loaded, if you're at the beginning or end of one of them, the Chooser > and < buttons will move you to the previous or next file. You can also move between them directly with Annotation | Go To.)
What to do if the tagging is wrong? Suppose two paragraphs are highlighted together as a single paragraph. The easiest way to fix this is in two steps: remove and add. (I'm talking about removing and adding tags in the Chooser, not removing and adding files in the main WordFreak window!)
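The remove-and-add fix can be pictured as a small operation on the list of tagged spans: take out the one oversized span, then put two spans in its place. A minimal sketch, assuming spans are stored as (start, end) character offsets; `split_span` and its parameters are hypothetical names, not anything WordFreak exposes.

```python
def split_span(spans, bad, first_end, second_start):
    """Replace one wrongly merged span with two correct ones.

    spans: list of (start, end) offsets; bad: the merged span to remove.
    first_end and second_start mark where the first paragraph ends and the
    second begins; anything between them (the blank line) is left untagged.
    """
    start, end = bad
    if not (start < first_end <= second_start < end):
        raise ValueError("split points must fall inside the span")
    result = [s for s in spans if s != bad]              # step 1: remove
    result += [(start, first_end), (second_start, end)]  # step 2: add
    return sorted(result)
```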
Check your work by clicking < and > to be sure that the highlighting is correct. If it's off by just a little, you can use shrink and grow; and, as always, you can ignore the space between paragraphs. When the paragraphs are correct, return to the project view and go on to sentences.
Note on clicking vs. dragging
You're probably used to applications, like word processors, in which a mouse click in text sets an insertion cursor so you can start typing or editing at that point. But in WordFreak you can't type or edit, so there is no insertion cursor. Instead, a mouse click selects the nearest tagged entity (of any of the types currently shown in the Chooser window). In order to select any text in WordFreak you have to drag the mouse at least a little bit.
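The click behaviour can be sketched as a nearest-span lookup. This is only an illustration: how WordFreak actually measures "nearest" is not stated here, so the character-distance rule below, along with the name `nearest_entity`, is an assumption.

```python
def nearest_entity(spans, offset):
    """Return the (start, end) span closest to a clicked character offset.

    Spans are half-open [start, end). Returns None if nothing is tagged.
    """
    def distance(span):
        start, end = span
        if start <= offset < end:
            return 0                  # the click landed inside this span
        if offset < start:
            return start - offset     # the span lies to the right of the click
        return offset - (end - 1)     # the span lies to the left of the click

    return min(spans, key=distance) if spans else None
```

A click never produces an empty selection under this model: as long as at least one entity is tagged, some span is always returned.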
Roughly speaking, a token is a single word, number, or punctuation mark.
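That rough definition can be made concrete with a small regular expression. The sketch below is not the tokenizer WordFreak calls; the pattern and the name `tokenize` are assumptions chosen to match "a single word, number, or punctuation mark", and real tokenizers often make different choices (for instance, about contractions).

```python
import re

# Numbers (with internal . or ,), words (with internal ' or -),
# or any single character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+(?:['-]\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_RE.findall(text)


tokens = tokenize("WordFreak tagged 3 files, didn't it?")
```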
And that finishes the pre-tagging. Now you can get to the meat of your work, the named entity annotation.