Online dictionary project for endangered languages of the Southwest
Michael Hammond, Sonya Bird, Melody Jeffcoat
University of Arizona
With the threat of extinction facing so many Native American languages, a
push has begun in recent years to create electronic resources for these
languages as a tool for documenting and maintaining them, and also as a
means of ensuring their compatibility with the modern technological world.
We propose to discuss a project currently underway at the University of
Arizona to create online dictionaries for endangered languages of the
Southwest. We focus on the problem of encoding Navajo verbs and making
their internal structure accessible to interested users.
So far, the dictionary project includes three languages: Tohono
O'odham, Navajo, and Hiaki. Using XML as the markup language, we have
developed a general template (a DTD) consisting of a hierarchy of tags that
provides the structure needed to encode the relevant grammatical and
lexical information for any given language. Using Java as a programming
environment, we are developing multiple interfaces to our dictionaries to
accommodate researchers, native speakers, and language students.
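As a minimal sketch of how one dictionary object might serve several interfaces, the Java fragment below gives each user population its own view over a shared entry. The object model (`Entry`, `Morpheme`, `EntryView`, `LearnerView`, `ResearcherView`) is hypothetical and is not the project's actual code; it only illustrates the idea that the interfaces differ in what they render, not in what they store.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical object model, not the project's actual classes.
record Morpheme(String form, String gloss) {}

// One entry object shared by every interface.
record Entry(String headword, String translation, List<Morpheme> morphemes) {}

// Each interface is a different rendering of the same Entry.
interface EntryView {
    String render(Entry e);
}

// Learners and native speakers: the whole word and its translation only.
class LearnerView implements EntryView {
    public String render(Entry e) {
        return e.headword() + ": " + e.translation();
    }
}

// Researchers: the whole word followed by a morpheme-by-morpheme breakdown.
class ResearcherView implements EntryView {
    public String render(Entry e) {
        String breakdown = e.morphemes().stream()
                .map(m -> m.form() + " '" + m.gloss() + "'")
                .collect(Collectors.joining(" + "));
        return e.headword() + " = " + breakdown;
    }
}
```

A learner-facing interface would call `new LearnerView().render(entry)` and see only the word and its translation, while a researcher-facing interface would pass the very same object to `ResearcherView`; adding a new interface means adding a view, not a second copy of the dictionary.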
Two related questions have arisen: (a) how to encode morphologically
complex words, such as Navajo verbs, in the lexicon, and (b) how to
construct the XML scheme so that the differing needs of our various user
populations can be met by a single dictionary object underlying the
different interfaces. In Navajo, a verb consists of a verb root preceded by
up to approximately 10 prefixes. For example, the verb dabidishn
translates as "I say it to them individually" and is composed of the verb
stem -n- and four prefixes: da- ('them, individually'), bi- ('them'), di-
(adverbial prefix), and sh- ('I', present tense). Because users differ in
their linguistic awareness and in their interest in dissecting verbs, the
database must allow searches both for complete verb forms and for
individual morphemes. We have used the following method for encoding
verbs (only the relevant pieces are included):
dabidishn
I say it to them individually
distributive
3rd person
We use an XML parser that constructs appropriate programming objects based
on the tags above. Thanks to this parser, users interested in dissecting
verbs can search for component morphemes by accessing the objects created
from the tag. Those interested only in periphrastic
translations can search using only the objects created from the
tag.
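The parser-to-objects step and the two kinds of search can be sketched in Java with the JDK's built-in DOM parser. This is a minimal illustration under stated assumptions, not the project's implementation: the element names `entry`, `form`, `translation`, and `morpheme`, and the classes `VerbEntry` and `DictionaryIndex`, are hypothetical placeholders rather than the project's actual DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical: one object per <entry> element (tag names are placeholders).
record VerbEntry(String form, String translation, List<String> morphemes) {}

class DictionaryIndex {
    private final List<VerbEntry> entries = new ArrayList<>();

    // Parse the XML text with the JDK DOM parser and build a VerbEntry
    // for every <entry> element found anywhere in the document.
    void load(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse dictionary XML", ex);
        }
        NodeList entryNodes = doc.getElementsByTagName("entry");
        for (int i = 0; i < entryNodes.getLength(); i++) {
            Element entry = (Element) entryNodes.item(i);
            List<String> morphemes = new ArrayList<>();
            NodeList morphemeNodes = entry.getElementsByTagName("morpheme");
            for (int j = 0; j < morphemeNodes.getLength(); j++) {
                morphemes.add(morphemeNodes.item(j).getTextContent().trim());
            }
            entries.add(new VerbEntry(firstText(entry, "form"),
                    firstText(entry, "translation"), morphemes));
        }
    }

    // Text of the first descendant with this tag; assumes the tag is present.
    private static String firstText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Morpheme search: forms of all entries containing this morpheme.
    List<String> byMorpheme(String morpheme) {
        return entries.stream()
                .filter(e -> e.morphemes().contains(morpheme))
                .map(VerbEntry::form)
                .toList();
    }

    // Translation search: case-insensitive substring match on the translation only.
    List<String> byTranslation(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        return entries.stream()
                .filter(e -> e.translation().toLowerCase(Locale.ROOT).contains(q))
                .map(VerbEntry::form)
                .toList();
    }
}
```

If an entry for dabidishn lists da-, bi-, di-, sh-, and -n- as its morphemes, `byMorpheme("bi-")` returns that verb form, while `byTranslation("individually")` finds the same entry without consulting its morphemes at all. Both searches run over the same list of objects, so the two user groups share one underlying dictionary.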
Sonya Bird
Department of Linguistics
University of Arizona
Douglass 200E
Tucson, AZ 85721
phone: (520)621-6897
sbird@u.arizona.edu