Recently there has been a remarkable convergence between contemporary issues in Australianist linguistics and advances in information technology. Alongside increased interest in language maintenance and revival, especially within indigenous communities, we now have the opportunities provided by multimedia and Internet technologies. I will show prototype tools that facilitate the development of structured multimedia and hypertextual linguistic resources that are platform-independent and use a standard format (XML).
With colleagues, I have been researching and implementing tools to meet two major requirements for contemporary work with endangered languages: richer documentation of linguistic knowledge and events; and stronger support for producing materials for language maintenance in the communities.
In the past, our projects have focussed on modelling data and processes, and implementing them in structured formats such as databases [3][4]. Following the growth of electronic networks, the value of such data orientation has been highlighted, because it allows data to be restructured, repurposed, compared, or combined with other data. What has been missing, however, is:
In the presentation I will discuss new approaches in the context of software tools for presenting sound, text, lexicon and linguistic analysis. A platform for producing and presenting linked audio, video, text, and linguistic description, originally developed in collaboration with Dr Eva Csató at Tokyo University of Foreign Studies for the endangered Turkic language Karaim, provides a generalised template architecture and has proved adaptable for other languages of similar typologies [2]. Transparent import formats allow it to support collaborative, iterative development of resources. We have demonstrated the use of the system for several languages including Sasak and Yolngu-Matha.
Fig 1: Karaim platform, main screen features
This platform can be regarded as a multimedia browser for richly linked linguistic data. The next step is to build a complementary tool for transparently authoring this kind of material. This phase has begun with the development of a system for annotating realtime media (sound, video). When completed, this system will allow sound and video to be linked to transcriptions, a lexicon, morphological analysis, or any other user-specified description, and then output to structured XML files. The new resources can then be archived, printed, viewed using a multimedia browser such as a web/XML browser or the Karaim platform, or published so that others can restructure and re-use them for different purposes. The collaborative process has been illustrated by accessing an XML-encoded dictionary via the Internet, extracting text and references from it, and inserting them into interlinear annotations that combine the lexicographer's data, the recorded linguistic event data, and the linguistic description/analysis. The resulting annotations are exported in an explicit XML format, which allows the linguistic description and analysis to refer back to the original linguistic performance.
Fig 2: Annotation tool, accessing remote dictionary
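The annotation workflow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the annotation tool's actual implementation or file format. The element and attribute names (dictionary, entry, annotation, segment, word, lexref) are hypothetical, the dictionary is embedded inline rather than retrieved over the Internet, and the word forms and glosses are placeholders rather than data from any language.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-encoded dictionary. In the workflow described above this
# would be retrieved over the Internet; it is inlined here for illustration.
DICTIONARY_XML = """<dictionary>
  <entry id="e1"><form>form-a</form><gloss>GLOSS-A</gloss></entry>
  <entry id="e2"><form>form-b</form><gloss>GLOSS-B</gloss></entry>
</dictionary>"""


def load_lexicon(xml_text):
    """Map each headword form to its (entry id, gloss) pair."""
    root = ET.fromstring(xml_text)
    return {
        entry.findtext("form"): (entry.get("id"), entry.findtext("gloss"))
        for entry in root.iter("entry")
    }


def build_annotation(media_href, segments, lexicon):
    """Build a time-aligned, interlinear annotation as an XML element.

    segments is a list of (start_seconds, end_seconds, tokens) tuples.
    Words found in the lexicon receive a gloss and a reference to the
    dictionary entry; unknown words are kept without a reference.
    """
    root = ET.Element("annotation", {"media": media_href})
    for start, end, tokens in segments:
        seg = ET.SubElement(
            root, "segment", {"start": f"{start:.2f}", "end": f"{end:.2f}"}
        )
        for token in tokens:
            word = ET.SubElement(seg, "word")
            ET.SubElement(word, "form").text = token
            if token in lexicon:
                entry_id, gloss = lexicon[token]
                word.set("lexref", entry_id)  # link back to the dictionary entry
                ET.SubElement(word, "gloss").text = gloss
    return root


lexicon = load_lexicon(DICTIONARY_XML)
annotation = build_annotation(
    "recording.wav",  # placeholder media file name
    [(0.0, 1.5, ["form-a", "form-b"]), (1.5, 2.75, ["form-c"])],
    lexicon,
)
xml_output = ET.tostring(annotation, encoding="unicode")
print(xml_output)
```

Because each word keeps both its time-aligned segment and a reference to its dictionary entry, the exported file ties the analysis to the recording and to the lexicographer's data, and either side can be revised independently.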
Such approaches have methodological implications, as they make linguistic description and analysis (cf [1]):