PAX - an annotation based concordancing toolkit

Dafydd Gibbon & Thorsten Trippel
{gibbon,trippel}@spectrum.uni-bielefeld.de

We demonstrate PAX, the "Portable Audio Concordance System", a proof-of-concept prototype of a multilingual audio concordance toolkit designed to support the efficient documentation of endangered languages (here mainly the Ivory Coast languages Ega, Kwa, Koulango). The main requirements were (1) Standardisation: coding should be standardised to ensure archive exchangeability and to avoid incompatibilities, e.g. with proprietary fonts; (2) Interoperability: the implementation should be interoperable across the main platforms (Linux, Windows, Mac); and (3) Low-cost, low-end: the toolkit should not depend on recent software versions or high-end hardware, in order to ensure usability under local conditions, and should offer offline capability, to reduce networking expense, as well as online capability. Full specification documents and a release version can be found at: http://www.spectrum.uni-bielefeld.de/LangDoc/EGA/

PAX concordance design is based on a function f: CORPUS -> <KWIC, SIGTRANS>, where CORPUS is a set of annotated signals partitioned by language, KWIC is the keyword-in-context concordance, and SIGTRANS is a set of representations of signal transformations (audio, waveform, F0, spectrogram). The corpus consists of digital signals, mainly of spoken narratives, which were annotated at word level. The user interface has three panels, which are called sequentially: (1) corpus options, (2) word options, (3) KWIC output with options for hearing the audio and for viewing the waveform, F0 and spectrogram.

The input data are time-aligned annotations in SAMPA, the standard ASCII coding of the IPA (slightly modified for the efficient coding of tone languages), created with Praat, Transcriber or esps/waves+ software. These operational formats are translated (with Perl format converter modules) to an XML archive format using the TASX DTD (see Milde, this Workshop), retaining the SAMPA coding.
TASX-XML and SAMPA were selected in order to fulfil the archive exchange requirement.

The PAX concordance architecture is modular, with wordlist extraction and KWIC concordance construction modules (Perl), and signal extraction and processing modules (C and Praat scripts). These modules feed two independent GUI server modules: (1) a Perl CGI application with HTML and WAV output for online use with a web server, and (2) a Perl/Tk application for offline use without a local CGI server. This hybrid implementation strategy was chosen in order to fulfil the low-cost, low-end, on/offline and interoperability requirements.

The PAX toolkit is currently being extended to time-aligned multimodal data. After the conclusion of the project, the tools will be made generally available as GPL (or similar) free software.
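
The wordlist extraction and KWIC construction steps described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the PAX modules themselves are implemented in Perl, and it is not the PAX code. The names `Word`, `KwicHit`, `wordlist` and `kwic`, the context width parameter, and the idea of returning the keyword's time interval for later signal extraction are all assumptions made for this illustration; the word labels in the demonstration are invented placeholders, not corpus data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One word-level annotation: a label plus its time interval (seconds)."""
    label: str    # word label, e.g. in SAMPA coding
    start: float  # interval start time
    end: float    # interval end time


@dataclass
class KwicHit:
    """One concordance line, with the keyword's time interval attached."""
    left: str     # left context, space-separated labels
    keyword: str  # the matched word label
    right: str    # right context, space-separated labels
    start: float  # start time of the keyword in the signal
    end: float    # end time of the keyword in the signal


def wordlist(words: List[Word]) -> List[str]:
    """Return the sorted list of distinct word labels in an annotation tier."""
    return sorted({w.label for w in words})


def kwic(words: List[Word], keyword: str, width: int = 2) -> List[KwicHit]:
    """Collect every occurrence of `keyword` with up to `width` words of
    context on each side (hypothetical parameter, not from the PAX spec)."""
    hits = []
    for i, w in enumerate(words):
        if w.label == keyword:
            left = " ".join(x.label for x in words[max(0, i - width):i])
            right = " ".join(x.label for x in words[i + 1:i + 1 + width])
            hits.append(KwicHit(left, w.label, right, w.start, w.end))
    return hits


if __name__ == "__main__":
    # Invented placeholder labels and times, for demonstration only.
    tier = [
        Word("mi", 0.00, 0.21),
        Word("ko", 0.21, 0.48),
        Word("ba", 0.48, 0.90),
        Word("ko", 0.90, 1.12),
    ]
    print(wordlist(tier))
    for hit in kwic(tier, "ko", width=1):
        print(f"{hit.left:>10} [{hit.keyword}] {hit.right:<10} "
              f"{hit.start:.2f}-{hit.end:.2f} s")
```

Keeping the time interval with each concordance line is one plausible way to connect the KWIC output to signal processing: a separate module could use the start and end times to cut the corresponding stretch of audio and derive the waveform, F0 or spectrogram display from it.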