EDBL: a general lexical basis for the automatic processing of Basque

I. Aldezabal, O. Ansa, B. Arrieta, X. Artola, A. Ezeiza, G. Hernández, M. Lersundi
Dept. of Computer Languages and Systems, University of the Basque Country
649 p.k., 20080 Donostia (Basque Country)
jiparzux@si.ehu.es

EDBL (Euskararen Datu-Base Lexikala) is a general-purpose lexical database used in Basque text-processing tasks. It is a large repository of lexical knowledge (it currently has more than 75,000 entries) that serves as the basis and support for a number of NLP tasks, providing lexical information to several language analysis tools: morphological analysis, spell checking and correction, lemmatization and tagging, syntactic analysis, and so on. The lexical database has been designed to be neutral with respect to the different linguistic formalisms, and flexible and open enough to accept new types of information. A recently implemented browser-based user interface makes it easy for the lexicographer to consult the database, correct and update entries, add new ones, etc.

The paper presents the conceptual schema and the main features of this database, and discusses some problems encountered in its design and its implementation in a commercial DBMS. Extended Entity Relationship diagrams are used to explain the conceptual schema of the database. Given the diversity of the lexical entities and the complex relationships existing among them, three total specializations have been defined under the main class of the hierarchy. The first one divides all the entries in EDBL into standard and non-standard entries: since Basque is a language still in the course of standardization, processes such as spell checking and correction, non-standard language analysis, etc. require information about non-standard entries and their relationships, and this information must be stored in the lexical database.
The second one divides the units in the database into dictionary entries (classified into the different parts of speech) and other entries (mainly non-independent morphemes and irregularly inflected forms). Finally, another total specialization has been established between single-word entries and multiword lexical units; this makes it possible to describe the morphotactics of single-word entries, and the constitution and surface realization schemas of multiword lexical units.

A hierarchy of typed feature structures (FS) has been designed to map the entities and relationships in the database conceptual schema. The FSs are coded in TEI-conformant SGML, and Feature Structure Declarations (FSD) have been made for all the types of the hierarchy. Feature inheritance is employed in these FSDs in order to make the definition of each type more consistent and more convenient to write. The FSs are used as a delivery format to export the lexical information from the database. The information coded in this way is subsequently used as input by the different language analysis tools.
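
As an informal illustration of the two ideas above, the following Python sketch models the three total specializations as three mandatory classification axes on every entry, and resolves inherited features in a small type hierarchy. It is only a sketch under stated assumptions: the class names, the feature names, the sample hierarchy, and the helper `inherited_features` are hypothetical and are not taken from the actual EDBL schema or its FSDs.

```python
from dataclasses import dataclass
from enum import Enum


# Each total specialization is modelled as one required axis:
# every entry must take exactly one value on each of the three axes.
class Standardness(Enum):
    STANDARD = "standard"
    NON_STANDARD = "non-standard"


class EntryKind(Enum):
    DICTIONARY_ENTRY = "dictionary entry"
    OTHER_ENTRY = "other entry"  # e.g. non-independent morphemes


class Wordness(Enum):
    SINGLE_WORD = "single-word entry"
    MULTIWORD = "multiword lexical unit"


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry record; field names are illustrative only."""
    form: str
    standardness: Standardness
    kind: EntryKind
    wordness: Wordness


# Hypothetical feature declarations: each type lists only the features it
# adds; everything else is inherited from its parent type.
FEATURE_DECLARATIONS = {
    "entry": {"parent": None, "features": ["form", "standardness"]},
    "dictionary-entry": {"parent": "entry", "features": ["pos"]},
    "noun": {"parent": "dictionary-entry", "features": ["subcategory"]},
}


def inherited_features(type_name, declarations):
    """Collect the features of a type, most general ancestor first."""
    features = []
    while type_name is not None:
        declaration = declarations[type_name]
        features = declaration["features"] + features
        type_name = declaration["parent"]
    return features


# Illustrative entry: the Basque noun "etxe" ("house").
etxe = Entry("etxe", Standardness.STANDARD,
             EntryKind.DICTIONARY_ENTRY, Wordness.SINGLE_WORD)
noun_features = inherited_features("noun", FEATURE_DECLARATIONS)
```

With these sample declarations, `noun_features` contains the features declared on `entry`, then those of `dictionary-entry`, then those of `noun`, so each type needs to state only what it adds to its parent.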