Kirrkirr: Experiences with a flexible
software interface to indigenous dictionaries

Christopher D. Manning, Stanford University

Depts of Computer Science and Linguistics, Stanford University
Gates Building 2A, 353 Serra Mall, Stanford CA 94305-9020, USA
manning@cs.stanford.edu | www.stanford.edu/~manning/


This presentation discusses work done together with:

Nitin Indurkhya, School of Applied Science, Nanyang Technological University, nitin@cs.usyd.edu.au
Kevin Jansz, Dept of Computer Science, University of Sydney, kjansz@cs.usyd.edu.au
Jane Simpson, Dept of Linguistics, University of Sydney, jhs@mail.usyd.edu.au


This is an overview of the goals, architecture, and usability of Kirrkirr, a Java-based visualization tool for XML dictionaries, currently being used with a dictionary for Warlpiri, an Australian Aboriginal language.

The goal of this work differs somewhat from that of most other projects at this workshop, in that a leading aim was to provide software usable by people other than tertiary-educated linguists. Within the Australian context, indigenous dictionary structure and usability have usually been dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met. Not only is such an approach rapidly coming to be seen as unacceptable, but we believe that it is also undesirable as a field linguistics methodology. As in some of the famous American structuralist work, the best results will come from actively enlisting native speakers, and this requires appropriate tools. This is especially true when dealing with something as large as the lexicon of a language, or as subtle as its semantics.

A second goal was to make better use of computers for visualization, hypertext linking and multimedia in order to provide a richer experience of dictionary content. As potential users, we particularly focussed on school children, who are the largest group of native speakers with some literacy skills. Our goal is to provide a fun dictionary tool that is effective for browsing and incidental language learning, as well as for serious research, in part because indications are that current interfaces are unlikely to have much direct educational benefit for students (Kegl 1995). From this viewpoint, the low level of literacy in the region and the inherently captivating nature of computers suggest that an e-dictionary is potentially more useful than a paper edition. Among other benefits, we can provide an interface that depends less on good knowledge of spelling and alphabetical order, by providing supports for the user.

A third background goal was to promote standards-based computing within descriptive linguistics. A leading reason for the lack of adequate tools for field linguistics has been the reliance on homespun software tools, which do not interoperate effectively with mainstream software. This has greatly restricted people's ability to get functionality for free.

Our software design is general, but so far we have worked with Warlpiri, a language of Central Australia, for which there has been an extensive ongoing project for the compilation of semantically rich lexical materials since the 1950s (Laughren and Nash 1983, Hale and Laughren to appear). We converted this data from a non-standard format into a richly structured XML version. Because the data had been maintained by hand in text editors for decades, the conversion proved surprisingly complex, owing to an accumulation of structural errors, inconsistencies, typos, etc. The current version uses ad hoc indexing of the XML version for efficient access, but we expect to move to XQL as that standard matures. Our system is written in Java, using the Swing API, and runs on all major platforms (Windows, Mac, Unix).
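The idea of an ad hoc index over the XML can be illustrated with a minimal sketch. It is not the Kirrkirr implementation: the class name HeadwordIndex, the element names ENTRY, HW and GL, and the sample headwords are all hypothetical placeholders, and the index here is built over an in-memory string rather than a file on disk. The sketch records where each entry starts and ends, keyed by headword, so that a single entry can be pulled out without processing the whole dictionary again.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only: ENTRY, HW and GL are assumed element names,
// not the schema of the actual Warlpiri dictionary.
final class HeadwordIndex {
    // Matches one complete entry and captures the text of its headword.
    // Assumes every entry contains an HW element and entries are not nested.
    private static final Pattern ENTRY = Pattern.compile(
        "<ENTRY>.*?<HW>(.*?)</HW>.*?</ENTRY>", Pattern.DOTALL);

    private final String xml;
    // headword -> {start offset, end offset} of the entry inside xml
    private final Map<String, int[]> offsets = new TreeMap<>();

    HeadwordIndex(String xml) {
        this.xml = xml;
        Matcher m = ENTRY.matcher(xml);
        while (m.find()) {
            // Keep the first entry seen for a headword; a fuller index
            // would also have to handle homophones.
            offsets.putIfAbsent(m.group(1).trim(),
                                new int[] {m.start(), m.end()});
        }
    }

    /** Returns the raw XML of the entry for this headword, or null if absent. */
    String lookup(String headword) {
        int[] span = offsets.get(headword);
        return span == null ? null : xml.substring(span[0], span[1]);
    }

    /** Number of distinct headwords indexed. */
    int size() {
        return offsets.size();
    }

    public static void main(String[] args) {
        // Placeholder data, not real dictionary content.
        String xml = "<DICTIONARY>"
            + "<ENTRY><HW>alpha</HW><GL>first gloss</GL></ENTRY>"
            + "<ENTRY><HW>beta</HW><GL>second gloss</GL></ENTRY>"
            + "</DICTIONARY>";
        HeadwordIndex index = new HeadwordIndex(xml);
        System.out.println(index.lookup("beta"));
    }
}
```

Storing offsets rather than parsed entries keeps the index small. An on-disk variant would presumably store byte offsets and seek into the file, which this sketch does not attempt.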

We exploit the structured XML representation by allowing our program to mediate between the lexical data and the user. The interface can select which information to present and how to present it, in ways customized to a user's preferences and abilities. As well as presenting varying amounts of detail in textual views of entries (achieved dynamically using XSL), we are able to provide alternate views of dictionary information such as a color-coded network display of semantic links between words, which can be explored, manipulated and customized interactively by the user (Jansz et al. 1999). To augment traditional semantic relations in the dictionary, we also provide linkages derived automatically from collocational analysis (of the limited amount of online Warlpiri text), and present an interface derived from semantic domains. Beyond that, the dictionary includes multimedia elements, such as pictures and speech files, a note taking facility, and an advanced search pane (with regular expressions for hacker linguists!).
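As a rough illustration of how varying amounts of detail might be produced from one XML entry with XSL, the following sketch uses the standard javax.xml.transform API with a stylesheet parameter. It is an assumption-laden example rather than Kirrkirr's actual stylesheets: the class name EntryRenderer, the parameter name detail, the element names ENTRY, HW, GL and POS, and the sample entry are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch only: element names and the "detail" parameter
// are assumptions, not the actual Kirrkirr stylesheets.
final class EntryRenderer {
    // One stylesheet serves both views; the part of speech is emitted
    // only when the caller asks for the "full" level of detail.
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='detail' select=\"'brief'\"/>"
        + "<xsl:template match='/ENTRY'>"
        + "<xsl:value-of select='HW'/>"
        + "<xsl:text>: </xsl:text>"
        + "<xsl:value-of select='GL'/>"
        + "<xsl:if test=\"$detail = 'full'\">"
        + "<xsl:text> [</xsl:text>"
        + "<xsl:value-of select='POS'/>"
        + "<xsl:text>]</xsl:text>"
        + "</xsl:if>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Renders one entry as plain text at the given detail level. */
    static String render(String entryXml, String detail) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // Passed to the stylesheet as the value of <xsl:param name='detail'>.
            transformer.setParameter("detail", detail);
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(entryXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not render entry", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder data, not real dictionary content.
        String entry =
            "<ENTRY><HW>alpha</HW><GL>first gloss</GL><POS>noun</POS></ENTRY>";
        System.out.println(render(entry, "brief"));
        System.out.println(render(entry, "full"));
    }
}
```

Driving the level of detail through a parameter means a user preference can change the view without touching the underlying data, which is the kind of mediation between lexical data and user described above.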

We have performed some preliminary trialing of the e-dictionary through visits by Mim Corris to Yuendumu and Willowra, and Jane Simpson to Lajamanu, and have done additional trialing on the usability of paper dictionaries of Aboriginal languages (Corris et al. 1999). This has involved dictionary use tasks, observation of use with primary and lower secondary students and trainee Warlpiri literacy workers, and comments from teachers and other adults. Reactions have been quite enthusiastic, and the dictionary appears to succeed in creating and maintaining interest, and in encouraging self-learning. We have received various suggestions on how to develop it further, which we hope to incorporate in future versions.

A picture of the system can be seen by clicking on the icon at right. More information about Kirrkirr, including further screen shots, is available from http://www.sultry.arts.usyd.edu.au/kirrkirr/.

References


Linguistic Exploration Workshop