An Integrated Multimedia Dictionary and Text Processor
for the Documentation of Endangered Languages

Wallace Hooper, Indiana University

American Indian Studies Research Institute, Indiana University
422 North Indiana Avenue, Bloomington, IN 47408
whooper@indiana.edu | www.indiana.edu/~aisri


Since 1995, working groups at the American Indian Studies Research Institute (AISRI) have carried out an active program of computer-based linguistic research and software development. In this report, I describe software designed to manage large collections of lexical and text data, together with closely associated sound, video, and image resources, for analysis and publication.

Our programming efforts have been organized around research projects led by Douglas R. Parks and Raymond J. DeMallie and their colleagues at Indiana University and elsewhere. The central projects, conducted from 1975 to the present, have involved speakers of Pawnee and Arikara, both Caddoan languages, and speakers of Yanktonai, Sisseton, Teton, and Assiniboine, which belong to the Siouan family. The work is particularly urgent for the Caddoan languages, which are, quite frankly, on the verge of practical extinction. All of our software development has been driven, and shaped in detail, by the needs and experiences of these ongoing fieldwork projects.

Parks and DeMallie worked with native speakers to elicit lexical data and narratives, which they captured in sound recordings, manuscript notes, and transcriptions. These resources were supplemented by historical dictionaries, grammars, and collections of texts. All of this material had to be organized and consistently catalogued for analysis. The sheer size of the projects and the complexity of the problems they posed, familiar to most linguists, argued for the use and development of appropriate computer tools. An important thread throughout this report, therefore, is how our tools were designed to address the very real problems of data collection and analysis that Parks, DeMallie, their colleagues, and their students faced in various aspects of their work.

This report chiefly describes our dictionary processor, the Indiana Dictionary Database (IDD), documented on the AISRI website under Research Projects > Dictionaries > IDD. IDD has been in use in projects at AISRI and elsewhere for two years. It allows our linguists to organize and store large amounts of lexical data in close association with sound and video field recordings and with illustrations of relevant cultural objects. Here we take a look "under the hood." IDD is an extended relational database application developed in Visual FoxPro for Windows laptop and desktop PCs. As a research tool, it lets researchers pose open-ended, systematic, and complex queries of their lexical resources across several basic linguistic categories. It also lets them work in a fully bilingual context, thinking about and analyzing their data from the perspective of both languages (e.g., Pawnee-English and English-Pawnee), with tools that do more than construct a simple English index. Its primary purpose, however, is to produce professional, camera-ready and web-ready bilingual dictionaries, along with publishable reports of linguistic data.
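To make the relational approach concrete, here is a minimal sketch in Python using the standard sqlite3 module. It shows two kinds of query mentioned above: a reverse English-to-native lookup that runs through individual senses rather than a flat index, and a query that crosses categories (part of speech and linked media). The schema, function names, and sample data are hypothetical. The forms form_1 to form_3 are placeholders, not Pawnee words, and IDD itself is a Visual FoxPro application whose actual tables are not shown here.

```python
import sqlite3

# Hypothetical schema for illustration only; not IDD's actual table layout.
SCHEMA = """
CREATE TABLE entry   (id INTEGER PRIMARY KEY, headword TEXT NOT NULL, pos TEXT);
CREATE TABLE sense   (id INTEGER PRIMARY KEY,
                      entry_id INTEGER NOT NULL REFERENCES entry(id),
                      gloss TEXT NOT NULL);
CREATE TABLE keyword (sense_id INTEGER NOT NULL REFERENCES sense(id),
                      word TEXT NOT NULL);
CREATE TABLE media   (entry_id INTEGER NOT NULL REFERENCES entry(id),
                      kind TEXT NOT NULL, path TEXT NOT NULL);
"""


def build_demo_lexicon() -> sqlite3.Connection:
    """Create an in-memory lexicon filled with placeholder (not real) data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", [
        (1, "form_1", "N"),
        (2, "form_2", "VI"),
        (3, "form_3", "N"),
    ])
    conn.executemany("INSERT INTO sense VALUES (?, ?, ?)", [
        (10, 1, "horse"),
        (11, 2, "to ride (a horse)"),
        (12, 3, "saddle"),
    ])
    # English keywords attach to senses, not to whole entries.
    conn.executemany("INSERT INTO keyword VALUES (?, ?)", [
        (10, "horse"), (11, "ride"), (11, "horse"), (12, "saddle"),
    ])
    conn.executemany("INSERT INTO media VALUES (?, ?, ?)", [
        (1, "sound", "audio/form_1.wav"),
        (2, "sound", "audio/form_2.wav"),
    ])
    return conn


def english_to_native(conn: sqlite3.Connection, word: str) -> list:
    """Reverse lookup: native-language entries with a sense tagged by `word`."""
    rows = conn.execute(
        """SELECT DISTINCT e.headword, e.pos, s.gloss
           FROM keyword k
           JOIN sense s ON s.id = k.sense_id
           JOIN entry e ON e.id = s.entry_id
           WHERE k.word = ?
           ORDER BY e.headword""",
        (word,),
    )
    return rows.fetchall()


def entries_with_media(conn: sqlite3.Connection, pos: str, kind: str = "sound") -> list:
    """Cross-category query: entries of one part of speech with linked media."""
    rows = conn.execute(
        """SELECT e.headword, m.path
           FROM entry e JOIN media m ON m.entry_id = e.id
           WHERE e.pos = ? AND m.kind = ?
           ORDER BY e.headword""",
        (pos, kind),
    )
    return rows.fetchall()


if __name__ == "__main__":
    lex = build_demo_lexicon()
    print(english_to_native(lex, "horse"))
    # [('form_1', 'N', 'horse'), ('form_2', 'VI', 'to ride (a horse)')]
    print(entries_with_media(lex, "N"))
    # [('form_1', 'audio/form_1.wav')]
```

Because English keywords attach to individual senses, an English query returns the specific sense that matched, including senses where the English word is only part of the gloss. This is one way a bilingual database can go beyond a simple English index.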

A new annotated text processor (ATP), under development since July 1999, will be integrated with the dictionary processor and will draw on its resources when analyzing and incorporating text data. ATP is being developed in Borland C++ Builder and Visual FoxPro. It will allow the linguist to associate sound and video recordings directly with texts and with their analyzed and annotated forms, and it will be used to prepare camera-ready or web-ready documents in interlinear and facing-page formats. Together, IDD and ATP will provide an integrated working environment for the researcher.
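As a minimal sketch of the kind of data model this implies, the following Python code gives each text segment start and end offsets into a recording, plus word forms with glosses. The same object can then drive both playback lookup and interlinear output. The class and function names and the sample forms and glosses are hypothetical placeholders. This is not ATP's design, which the report describes only as being built in Borland C++ Builder and Visual FoxPro.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    form: str   # transcribed word (placeholder data in the example)
    gloss: str  # morpheme-by-morpheme gloss


@dataclass
class Segment:
    start: float  # offset into the recording, in seconds
    end: float    # end offset, exclusive
    words: List[Word] = field(default_factory=list)
    translation: str = ""  # free English translation


def interlinear(seg: Segment) -> str:
    """Render a segment as column-aligned form and gloss lines plus a free translation."""
    widths = [max(len(w.form), len(w.gloss)) for w in seg.words]
    forms = "  ".join(w.form.ljust(n) for w, n in zip(seg.words, widths))
    glosses = "  ".join(w.gloss.ljust(n) for w, n in zip(seg.words, widths))
    return "\n".join([forms.rstrip(), glosses.rstrip(), f"'{seg.translation}'"])


def segment_at(segments: List[Segment], t: float) -> Optional[Segment]:
    """Return the segment whose time span contains t, e.g. to highlight text during playback."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg
    return None


if __name__ == "__main__":
    text = [
        Segment(0.0, 2.5, [Word("form_a", "1SG-see"), Word("form_b", "horse")], "I saw a horse."),
        Segment(2.5, 5.0, [Word("form_c", "go")], "He went."),
    ]
    print(interlinear(text[0]))
    print(segment_at(text, 3.1).translation)  # He went.
```

Keeping time offsets on each segment, rather than in a separate index, lets one structure serve both playback and print layout. A facing-page format could be rendered from the same segments by setting the native-language lines beside the translations.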

This report also briefly covers AISRI's language-instruction software and curriculum materials, now in use in school systems in North Dakota and Oklahoma, and our sound processing lab at the Center for the Documentation of Endangered Languages (CDEL). Our language-instruction software is being developed in Macromedia Authorware and Director and in ColdFusion, following principles drawn from current research in second-language instruction, psycholinguistics, and instructional systems technology. At CDEL, we have built a state-of-the-art sound processing lab capable of digitizing, restoring, and archiving recordings, and of producing CD-quality sound files from a wide range of field recording technologies: our sound processing group works with reproductions of cylinder recordings and with reel-to-reel tape, wire, cassette, DAT, and video recordings. CDEL and its growing archives have been a central resource for all of our projects, and they should prove invaluable long after we are gone, both to the communities themselves and to future linguists working with tools and theories that have yet to be imagined.
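"CD quality" has a precise meaning: 44,100 samples per second, 16-bit samples, and two channels. The following minimal sketch assumes that digitized files are stored as WAV and uses Python's standard wave module to test whether a file's header matches those parameters, which is one simple archiving check. The function names are hypothetical, and the report does not describe CDEL's actual tools or workflow.

```python
import io
import wave

# Red Book CD audio parameters: 44,100 Hz sampling, 16-bit (2-byte) samples, stereo.
CD_RATE = 44_100
CD_SAMPLE_BYTES = 2
CD_CHANNELS = 2


def is_cd_quality(wav_file) -> bool:
    """Return True if a WAV file's header matches CD-audio parameters.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as w:
        return (w.getframerate() == CD_RATE
                and w.getsampwidth() == CD_SAMPLE_BYTES
                and w.getnchannels() == CD_CHANNELS)


def silent_wav(seconds: float, rate: int, sample_bytes: int, channels: int) -> io.BytesIO:
    """Build an in-memory WAV of zero-valued samples (demonstration data only)."""
    n_frames = int(seconds * rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_bytes)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (n_frames * sample_bytes * channels))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(is_cd_quality(silent_wav(0.5, 44_100, 2, 2)))  # True
    print(is_cd_quality(silent_wav(0.5, 22_050, 2, 2)))  # False
```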


Linguistic Exploration Workshop