Creating a database and query-tools for a large, multi-speaker linguistic corpus

Jeff Good and Ronald Sprouse
The Turkish Electronic Living Lexicon (TELL)
University of California, Berkeley

The Turkish Electronic Living Lexicon (TELL) represents the first large-scale effort to collect recordings of a very large number of morphological paradigms (more than 17,000) from several speakers of Turkish. The nature of this data presents several challenges for the creation of the database itself and the query tools that access it. The primary dimensions along which the database is organized are (1) speaker, (2) paradigm, and (3) utterance. Each is discussed in turn.

Speaker: Specifying that an uttered paradigm is associated with a particular speaker is straightforward. However, to allow comparison between speakers, the database must also include a way of stating that paradigms from different speakers belong to the same abstract "word", since different speakers may have different phonological representations for a given word.

Paradigm: A paradigm is a complex object that cannot simply be taken to be a set of utterances. Instead, it is a set of utterances, all having the same part of speech, each of which is labelled for a particular morphological category. In addition, TELL associates orthographic, semantic, and etymological information with a paradigm, and marks paradigms for a specific set of phonological irregularities which may be of interest to those researching Turkish phonology.

Utterance: Utterances are also complex objects, consisting of both a transcription and the recorded utterance itself. One of the goals of TELL is to make both the transcriptions and the digitized recordings available online.

TELL has implemented an SQL relational database which solves these representational issues.
Using a system of lexical data tables, which contain the actual paradigms, together with a system of pointers and relational tables, it is possible not only to represent complex internal relationships, like those in a paradigm, but also to link related paradigms from different speakers and to link a given paradigm to orthographic and etymological information.

An additional challenge for TELL is the design of an online user interface. Possible user-specified search variables include speaker, phonological string, morphological form, etymology, and phonological irregularity. Since a detailed user interface would be unnecessarily complex for the occasional user of the database, we have implemented both simple and advanced interfaces. We expect the majority of casual users to prefer the simple search interface, while serious researchers will appreciate the enhancements of the advanced interface, which allows full access to linguistically important variables, including phonemic representations, syllabic templates, speaker, and etymology.
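
To make the organization described above more concrete, the following is a minimal sketch of how speaker, paradigm, and utterance might be related in an SQL database, and how a search with optional criteria might be built on top of it. It is not TELL's actual schema or query code: every table name, column name, function name (build_demo, search), file path, and sample row here is a hypothetical chosen for illustration, and SQLite (via Python's sqlite3 module) merely stands in for whatever SQL system the project uses.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions, not TELL's.
# REFERENCES clauses document the links between tables; SQLite enforces them
# only when PRAGMA foreign_keys = ON is set.
SCHEMA = """
CREATE TABLE speaker (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The abstract "word" shared across speakers, with paradigm-level metadata.
CREATE TABLE lexeme (
    id INTEGER PRIMARY KEY,
    orthography TEXT NOT NULL,
    part_of_speech TEXT NOT NULL,
    gloss TEXT,
    etymology TEXT
);
-- One speaker's paradigm for one abstract word.
CREATE TABLE paradigm (
    id INTEGER PRIMARY KEY,
    lexeme_id INTEGER NOT NULL REFERENCES lexeme(id),
    speaker_id INTEGER NOT NULL REFERENCES speaker(id),
    UNIQUE (lexeme_id, speaker_id)
);
-- An utterance pairs a transcription with a pointer to its recording.
CREATE TABLE utterance (
    id INTEGER PRIMARY KEY,
    paradigm_id INTEGER NOT NULL REFERENCES paradigm(id),
    morph_category TEXT NOT NULL,
    transcription TEXT NOT NULL,
    audio_path TEXT
);
"""


def build_demo():
    """Create an in-memory database holding a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO speaker VALUES (?, ?)", [(1, "A"), (2, "B")])
    conn.executemany(
        "INSERT INTO lexeme VALUES (?, ?, ?, ?, ?)",
        [(1, "kitap", "noun", "book", "Arabic"),
         (2, "at", "noun", "horse", "Turkic")],
    )
    conn.executemany(
        "INSERT INTO paradigm VALUES (?, ?, ?)",
        [(1, 1, 1), (2, 1, 2), (3, 2, 1)],
    )
    conn.executemany(
        "INSERT INTO utterance VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "nominative", "kitap", "A/kitap_nom.wav"),
         (2, 1, "accusative", "kitabı", "A/kitap_acc.wav"),
         (3, 2, "nominative", "kitap", "B/kitap_nom.wav"),
         (4, 2, "accusative", "kitabı", "B/kitap_acc.wav"),
         (5, 3, "nominative", "at", "A/at_nom.wav"),
         (6, 3, "accusative", "atı", "A/at_acc.wav")],
    )
    return conn


def search(conn, speaker=None, substring=None, morph_category=None, etymology=None):
    """Return (speaker, orthography, category, transcription) rows.

    Every criterion is optional; those supplied are combined with AND.
    """
    sql = (
        "SELECT s.name, l.orthography, u.morph_category, u.transcription "
        "FROM utterance u "
        "JOIN paradigm p ON p.id = u.paradigm_id "
        "JOIN lexeme l ON l.id = p.lexeme_id "
        "JOIN speaker s ON s.id = p.speaker_id"
    )
    clauses, params = [], []
    if speaker is not None:
        clauses.append("s.name = ?")
        params.append(speaker)
    if substring is not None:
        # instr() returns a 1-based position, or 0 when the substring is absent.
        clauses.append("instr(u.transcription, ?) > 0")
        params.append(substring)
    if morph_category is not None:
        clauses.append("u.morph_category = ?")
        params.append(morph_category)
    if etymology is not None:
        clauses.append("l.etymology = ?")
        params.append(etymology)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY u.id"
    return conn.execute(sql, params).fetchall()


conn = build_demo()
# A "simple" search supplies one criterion; an "advanced" one combines several.
simple = search(conn, substring="kitab")
advanced = search(conn, speaker="A", morph_category="accusative", etymology="Turkic")
```

In this sketch the paradigm table links a speaker to an abstract word, which is how one speaker's paradigm can be compared with another speaker's for the same word. A simple interface could expose only one or two of the search parameters, while an advanced interface could expose all of them over the same underlying query.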