An Architecture for a Federation of Highly Heterogeneous Lexical Information Sources

Artola, X.
Dept. of Computer Languages and Systems
University of The Basque Country
649 p.k., 20080 Donostia (Basque Country)
jiparzux@si.ehu.es

Soroa, A.
Dept. of Comp. Science and Artificial Intelligence
University of The Basque Country
649 p.k., 20080 Donostia (Basque Country)
ccpsoeta@si.ehu.es

The purpose of this paper is to present an integrating architecture for a federation of highly heterogeneous lexical information sources. The term "lexical information source" is used here in its broadest sense. It covers very different lexical and dictionary databases, heterogeneously structured and coded electronic dictionaries, language-processing lexica, and even language-processing programs such as lemmatisers or part-of-speech taggers. Our proposal is that lexical and dictionary stores can be brought together in an integrated federation of databases (again in the broadest sense of the term), so that they can all be queried through a single language, regardless of the particular resource to which a query is ultimately targeted. Desirable as it might be, it is unrealistic to expect the great variety of available lexical information resources to be converted into a single, standard representation schema in the near future. Hence, our approach aims at a solution that requires no conversion of the sources and that draws on the emerging "wrapper" technologies that enhance external communication with legacy systems. From the point of view of information integration, we address the problem of querying very different sources of lexical information through a single, common query language. The so-called local-as-view paradigm is used to describe each lexical source as a view over a general conceptual model (GCM).
It is therefore necessary to specify three things: a GCM that describes lexical knowledge; a source conceptual model (SCM) for each lexical source, expressed in much the same way as the GCM; and a set of source content descriptions, which describe the content of each source in terms of the classes and relationships of the GCM. The GCM and some of the SCMs have been specified and implemented in the description logic language NeoClassic, and a query-answering algorithm that translates queries from the general model into each particular schema has also been implemented. Currently, two lexical resources have been integrated into our system. The first is a TEI-conformant, general-purpose monolingual Basque dictionary, stored locally as a collection of SGML documents. The second is a purely lexical database called EDBL, which is used in language-processing tasks and stored in a conventional relational DBMS.
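The division of labour between the GCM, the source content descriptions and the query-answering step can be illustrated with a deliberately simplified Python sketch. It is not the authors' NeoClassic implementation. It performs only a source-selection step in the spirit of local-as-view query answering. Each source is described by the GCM attributes it can supply, using a hypothetical `SourceDescription` with an attribute-to-field mapping. A query over the GCM is then split into per-source requests. All class names, GCM attributes and source field paths below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SourceDescription:
    """Hypothetical LAV-style source description: the GCM attributes a source
    can supply, each mapped to the source's own (SCM-level) field name."""
    name: str
    mapping: dict  # GCM attribute -> source-specific field (illustrative)

    def translate(self, lemma, attrs):
        # Stand-in for a wrapper-generated query (e.g. an SGML path query or
        # an SQL statement); here it is just a plain dict.
        return {"source": self.name, "lookup": lemma,
                "fields": [self.mapping[a] for a in attrs]}


def plan_query(lemma, attrs, sources):
    """Send each source only the requested GCM attributes it describes;
    raise if some attribute is described by no source at all."""
    missing = [a for a in attrs if not any(a in s.mapping for s in sources)]
    if missing:
        raise ValueError(f"no source describes: {missing}")
    plan = []
    for s in sources:
        wanted = [a for a in attrs if a in s.mapping]
        if wanted:
            plan.append(s.translate(lemma, wanted))
    return plan


# Illustrative descriptions for the two integrated resources; all field
# names are assumptions, not the actual TEI markup or EDBL schema.
dictionary = SourceDescription(
    "basque_dictionary_tei",
    {"definition": "entry/sense/def", "pos": "entry/gramGrp/pos"})
edbl = SourceDescription(
    "edbl_rdbms",
    {"pos": "lemmas.category", "morphology": "lemmas.paradigm"})

if __name__ == "__main__":
    for request in plan_query("etxe", ["definition", "morphology"],
                              [dictionary, edbl]):
        print(request)
```

A query for both a definition and morphological information about "etxe" is split so that the dictionary is asked only for the definition and EDBL only for the morphology. A query for `pos` alone would be routed to both sources, because both descriptions mention it. A full local-as-view answering algorithm must also decide how to combine overlapping or partial answers, and must handle source views richer than flat attribute lists. This sketch ignores both concerns.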