Online dictionary project for endangered languages of the Southwest

Michael Hammond, Sonya Bird, Melody Jeffcoat
University of Arizona

With the threat of extinction facing so many Native American languages, a push has begun in recent years to create electronic resources for these languages, both as a tool for documenting and maintaining them and as a means of ensuring their compatibility with the modern technological world. We propose to discuss a project currently underway at the University of Arizona to create online dictionaries for endangered languages of the Southwest. We focus on the problem of encoding Navajo verbs and making their internal structure accessible to the interested user.

So far, the dictionary project includes the following languages: Tohono O'odham, Navajo, and Hiaki. Using XML as the markup language, we have come up with a general template (DTD) consisting of hierarchies of tags, which provides the structure needed to encode the relevant grammatical and lexical information for any given language. Using Java as a programming environment, we are developing multiple interfaces to our dictionaries to accommodate researchers, native speakers, and language students.

Two related questions have arisen: (a) how to encode morphologically complex words, such as Navajo verbs, in the lexicon, and (b) how to construct the XML scheme so that the differing needs of our various user populations can be met with a single dictionary object underlying the different interfaces.

In Navajo, verbs consist of a verb root preceded by up to approximately ten prefixes. For example, the verb dabidishn translates to "I say it to them individually" and is composed of the verb stem -n- and four prefixes: da- ('them, individually'), bi- ('them'), di- (adverbial prefix), and sh- ('I', present tense).
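To make the stem-plus-prefixes structure concrete, here is a minimal Java sketch of how one such verb could be held in memory. The class, field, and method names (VerbSketch, Prefix, concatenated) are hypothetical illustrations, not the project's actual objects, and the forms and glosses are simply those quoted in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one verb entry; all names are illustrative.
class VerbSketch {
    // One prefix together with the gloss given for it in the text.
    static final class Prefix {
        final String form;
        final String gloss;
        Prefix(String form, String gloss) {
            this.form = form;
            this.gloss = gloss;
        }
    }

    final String surfaceForm;   // complete verb as a user would look it up
    final String translation;   // free translation of the whole verb
    final String stem;          // verb stem, written without hyphens
    final List<Prefix> prefixes = new ArrayList<>();  // ordered left to right

    VerbSketch(String surfaceForm, String translation, String stem) {
        this.surfaceForm = surfaceForm;
        this.translation = translation;
        this.stem = stem;
    }

    VerbSketch addPrefix(String form, String gloss) {
        prefixes.add(new Prefix(form, gloss));
        return this;
    }

    // Joins the prefixes and the stem in order. Assumption: plain concatenation;
    // in general the pieces of a verb need not add up to the surface form.
    String concatenated() {
        StringBuilder sb = new StringBuilder();
        for (Prefix p : prefixes) {
            sb.append(p.form);
        }
        return sb.append(stem).toString();
    }

    // The example verb from the text, spelled exactly as it appears there.
    static VerbSketch example() {
        return new VerbSketch("dabidishn", "I say it to them individually", "n")
                .addPrefix("da", "them, individually")
                .addPrefix("bi", "them")
                .addPrefix("di", "adverbial prefix")
                .addPrefix("sh", "I, present tense");
    }

    public static void main(String[] args) {
        VerbSketch v = example();
        System.out.println(v.concatenated() + ": " + v.translation);
        for (Prefix p : v.prefixes) {
            System.out.println("  " + p.form + "-  " + p.gloss);
        }
    }
}
```

Keeping the prefixes in an ordered list, separate from the complete form, is one way to let the same entry answer both whole-word and morpheme-level lookups.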
Because our users differ in their linguistic awareness and in their interest in dissecting verbs, the database must allow searching both for complete verb forms and for individual morphemes. We have used the following method for encoding verbs (only the relevant pieces are included):

dabidishn    I say it to them individually
    da    distributive
    bi    3rd person
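As a rough illustration of how a single encoded entry could serve both kinds of search, the following Java sketch parses a small XML fragment with the standard DOM API (javax.xml.parsers and org.w3c.dom) and looks up either a complete form or a component morpheme. The tag names (dictionary, entry, form, translation, morpheme, gloss) and the class and method names are assumptions made for this example; they are not the project's DTD or its parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch only: tag names below are hypothetical, not the project's actual DTD.
class DictionarySearchSketch {
    static final String SAMPLE =
          "<dictionary>"
        + "<entry>"
        + "<form>dabidishn</form>"
        + "<translation>I say it to them individually</translation>"
        + "<morpheme><form>da</form><gloss>distributive</gloss></morpheme>"
        + "<morpheme><form>bi</form><gloss>3rd person</gloss></morpheme>"
        + "</entry>"
        + "</dictionary>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Text of the first direct child element with the given tag name, or null.
    // Direct children only, so an entry's own form is not confused with the
    // forms nested inside its morphemes.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(tag)) {
                return n.getTextContent();
            }
        }
        return null;
    }

    // Translations of all entries whose complete form equals the query.
    static List<String> searchWholeForm(Document doc, String query) {
        List<String> hits = new ArrayList<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (query.equals(childText(entry, "form"))) {
                hits.add(childText(entry, "translation"));
            }
        }
        return hits;
    }

    // Complete forms of all entries containing a morpheme with the given form.
    static List<String> searchMorpheme(Document doc, String query) {
        List<String> hits = new ArrayList<>();
        NodeList morphemes = doc.getElementsByTagName("morpheme");
        for (int i = 0; i < morphemes.getLength(); i++) {
            Element morpheme = (Element) morphemes.item(i);
            if (query.equals(childText(morpheme, "form"))) {
                hits.add(childText((Element) morpheme.getParentNode(), "form"));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(searchWholeForm(doc, "dabidishn"));
        System.out.println(searchMorpheme(doc, "bi"));
    }
}
```

In this sketch a whole-word query never touches the morpheme elements, while a morpheme query walks back up to the entry that contains it, so one underlying document can sit behind interfaces aimed at different users.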
We use an XML parser that constructs appropriate programming objects based on the tags in each entry. Thanks to this parser, users interested in dissecting verbs can search for component morphemes by accessing the objects created from the tag that marks each morpheme. Those interested only in periphrastic translations can search using only the objects created from the tag that holds the translation.

Sonya Bird
Department of Linguistics
University of Arizona
Douglass 200E
Tucson, AZ 85721
phone: (520) 621-6897
sbird@u.arizona.edu