The Use of a Relational Database in the Development and Maintenance of Linguistic Resources for Statistical Japanese Morphological Analysis
Masayuki Asahara, Ryuichi Yoneda and Yuji Matsumoto
Nara Institute of Science and Technology, Japan
In this paper, we present the use of a relational database for developing and maintaining linguistic resources that improve the statistical models of a Japanese morphological analyzer. We maintain two kinds of data in the database: tagged texts and lexicons. As in many previous POS (part-of-speech) taggers and morphological analyzers, the tagged texts are used to estimate n-gram statistics. The lexicons provide not only grammatical information but also the length and construction of the words. In languages whose texts do not mark word boundaries (e.g. Chinese and Japanese), a consistent definition of words in the lexicon is critical to the accuracy of morphological analyzers. However, there is no agreement on the definition of word delimitation in languages of this type, because of the various types of prefixes, suffixes, and compound constructions. Consequently, it is often necessary to convert data from one definition to another. In particular, when we change the definition of word delimitation in the lexicon, we need to modify the tagged corpora to keep them consistent with the lexicon. We propose the use of a relational database schema to perform these modifications in tandem.
In the Japanese language, there are several standards for defining word delimitation. To accommodate more than one definition, we store several lexicons in the database and define the relations between them as follows. The lexicon with the most fine-grained word delimitation definition is taken as the base lexicon, and the other lexicons are derived from it as lists of compounds of the base words. For this purpose, we define rules that concatenate items from the base lexicon in order to generate the items of each derived lexicon.
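The relation between the base lexicon and a derived lexicon can be illustrated with a minimal sketch. The abstract does not give the actual schema, so the table names (`base_lexicon`, `compound_rule`, `compound_part`), the lexicon name `coarse`, the POS labels, and the sample words below are all hypothetical, chosen only to show how concatenation rules over base items might generate derived entries.

```python
import sqlite3

# Hypothetical schema: not the authors' actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_lexicon (
    id      INTEGER PRIMARY KEY,
    surface TEXT NOT NULL,
    pos     TEXT NOT NULL
);
-- One row per compound in a derived lexicon.
CREATE TABLE compound_rule (
    id      INTEGER PRIMARY KEY,
    lexicon TEXT NOT NULL,   -- name of the derived lexicon
    pos     TEXT NOT NULL    -- POS assigned to the compound
);
-- Ordered base words that a rule concatenates.
CREATE TABLE compound_part (
    rule_id  INTEGER NOT NULL REFERENCES compound_rule(id),
    position INTEGER NOT NULL,
    base_id  INTEGER NOT NULL REFERENCES base_lexicon(id),
    PRIMARY KEY (rule_id, position)
);
""")

# Illustrative data only.
conn.executemany(
    "INSERT INTO base_lexicon VALUES (?, ?, ?)",
    [(1, "東京", "noun"), (2, "都", "suffix"),
     (3, "国際", "noun"), (4, "化", "suffix")],
)
conn.executemany(
    "INSERT INTO compound_rule VALUES (?, ?, ?)",
    [(1, "coarse", "noun"), (2, "coarse", "noun")],
)
conn.executemany(
    "INSERT INTO compound_part VALUES (?, ?, ?)",
    [(1, 0, 1), (1, 1, 2), (2, 0, 3), (2, 1, 4)],
)


def derived_entries(conn, lexicon):
    """Generate (surface, pos) entries of a derived lexicon by
    concatenating base words in the order given by each rule."""
    rows = conn.execute(
        """
        SELECT r.id, r.pos, b.surface
        FROM compound_rule r
        JOIN compound_part p ON p.rule_id = r.id
        JOIN base_lexicon b ON b.id = p.base_id
        WHERE r.lexicon = ?
        ORDER BY r.id, p.position
        """,
        (lexicon,),
    ).fetchall()
    entries = {}
    for rule_id, pos, surface in rows:
        joined, _ = entries.get(rule_id, ("", pos))
        entries[rule_id] = (joined + surface, pos)
    return list(entries.values())


print(derived_entries(conn, "coarse"))
# → [('東京都', 'noun'), ('国際化', 'noun')]
```

Because derived entries are stored only as references to base items in this sketch, a change to a base word would propagate to every compound built from it, which is one way the consistency described above might be obtained.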
Using these methods and schemata, we can maintain the tagged texts and the several lexicons together in the database. The statistical models can be estimated by issuing queries to the database. As a result, our Japanese morphological analyzer achieves consistency in its word definitions. During the talk, we will demonstrate the database and the morphological analyzer.
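As an illustration of estimating statistics by querying the database, the following sketch counts POS bigrams with a self-join and turns them into maximum-likelihood transition probabilities. The `token` table, its columns, the sample sentences, and the unsmoothed estimator are assumptions made for this example; the abstract does not specify the model or the schema actually used.

```python
import sqlite3

# Hypothetical tagged-text table: one row per token.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (
    sentence_id INTEGER NOT NULL,
    position    INTEGER NOT NULL,
    surface     TEXT NOT NULL,
    pos         TEXT NOT NULL,
    PRIMARY KEY (sentence_id, position)
);
""")

# Illustrative data only.
conn.executemany(
    "INSERT INTO token VALUES (?, ?, ?, ?)",
    [
        (1, 0, "東京", "noun"), (1, 1, "に", "particle"), (1, 2, "行く", "verb"),
        (2, 0, "本", "noun"), (2, 1, "を", "particle"), (2, 2, "読む", "verb"),
        (3, 0, "東京", "noun"), (3, 1, "都", "suffix"),
    ],
)

# Count adjacent POS pairs within each sentence.
BIGRAM_SQL = """
SELECT a.pos, b.pos, COUNT(*)
FROM token a
JOIN token b
  ON b.sentence_id = a.sentence_id
 AND b.position = a.position + 1
GROUP BY a.pos, b.pos
"""


def bigram_probabilities(conn):
    """Return P(next_pos | prev_pos) as relative frequencies (no smoothing)."""
    counts = {(prev, nxt): c for prev, nxt, c in conn.execute(BIGRAM_SQL)}
    totals = {}
    for (prev, _), c in counts.items():
        totals[prev] = totals.get(prev, 0) + c
    return {(prev, nxt): c / totals[prev] for (prev, nxt), c in counts.items()}


probs = bigram_probabilities(conn)
for (prev, nxt), p in sorted(probs.items()):
    print(f"P({nxt} | {prev}) = {p:.3f}")
```

Since the counts are recomputed from the stored tokens each time the query runs, any re-segmentation of the tagged texts would be reflected in the estimates without a separate export step.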