Lexicon

Because the lexicon is edited by hand in a general-purpose text editor, typographical and formatting errors are hard to avoid. One way to find them is to sort the lexicon by each column in turn and inspect it with simple checking commands, alone or in combination. For example, the command awk 'BEGIN {FS = "\t"} NF != 7 {print}' prints every entry that does not have exactly seven tab-separated columns. Perl and Awk are useful for rearranging the data in various ways: they can, for example, reorder the columns, display only certain columns, or display the rows whose column X matches pattern Y.

When several people work on a lexicon, it can be very difficult to keep entry creation and categorization consistent. It is therefore important to write down every principle the team agrees on and to keep everyone informed of changes and additions. Any case in doubt should be left to the chief lexicographer to decide. Ideally, a lexicographer should have some linguistic background as well as first-hand knowledge of the language.

It often happens that the changes one lexicographer makes are not properly reflected in the files other people are working on, or that each lexicographer ends up working on a different version of the lexicon. Revision control programs such as RCS (Revision Control System) help to avoid this. Another strategy is to keep finished parts in separate files, so that the lexicographers can focus on unfinished material. Name older lexicon files in a way that lets everyone keep track of the history of the work.
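The column-count check and the sort-and-filter inspections described above can also be scripted. The following Python sketch is only an illustration, not LDC's own tooling: it assumes a tab-separated file with seven columns per entry, and the function names, the column meanings, and the four-entry sample lexicon are all hypothetical. Unlike the awk one-liner, it also reports the line number of each faulty entry.

```python
import re
from typing import List, Sequence, Tuple

# Assumed layout: seven tab-separated columns per entry, as in the awk
# example. The meaning of each column below is invented for illustration.
EXPECTED_COLUMNS = 7

# Hypothetical sample lexicon; the fourth entry is missing its last column.
SAMPLE = [
    "cat\tk ae t\tN\tsg\t-\tanimal\t001",
    "dogs\td ao g z\tN\tpl\tdog\tanimal\t002",
    "run\tr ah n\tV\tbase\t-\tmotion\t003",
    "ran\tr ae n\tV\tpast\trun\tmotion",
]


def wrong_column_count(
    lines: Sequence[str], expected: int = EXPECTED_COLUMNS
) -> List[Tuple[int, str]]:
    """Return (line number, line) for each line whose tab-separated field
    count differs from `expected`: a Python counterpart of the awk NF check."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != expected:
            flagged.append((number, line.rstrip("\n")))
    return flagged


def sort_by_column(rows: Sequence[List[str]], column: int) -> List[List[str]]:
    """Sort rows on one column; rows too short to have that column are
    treated as having an empty value there."""
    return sorted(rows, key=lambda row: row[column] if column < len(row) else "")


def select_columns(rows: Sequence[List[str]], order: Sequence[int]) -> List[List[str]]:
    """Keep only the listed columns, in the listed order (e.g. [2, 0])."""
    return [[row[i] for i in order if i < len(row)] for row in rows]


def rows_matching(rows: Sequence[List[str]], column: int, pattern: str) -> List[List[str]]:
    """Return the rows whose given column matches the regular expression."""
    regex = re.compile(pattern)
    return [row for row in rows if column < len(row) and regex.search(row[column])]


if __name__ == "__main__":
    for number, line in wrong_column_count(SAMPLE):
        print(f"line {number} has the wrong number of columns: {line!r}")
    rows = [line.split("\t") for line in SAMPLE]
    # Sort on column 2 (a hypothetical part-of-speech column) for inspection.
    for row in sort_by_column(rows, 2):
        print("\t".join(row))
```

Sorting on a column such as the hypothetical part-of-speech field groups identical values together, so an odd value (say, a lowercase "n" among the "N" entries) is easy to spot when the sorted output is scanned.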
Contact ldc@ldc.upenn.edu