A Preliminary Study of the Structure of Lexicon Entries

John Bell and Steven Bird
University of Pennsylvania
jmbell@ling.upenn.edu, sb@ldc.upenn.edu

Paper presented at the workshop on
Web-Based Language Documentation and Description
12-15 December 2000, Philadelphia, USA.


Abstract. Perhaps the most widespread type of formal language resource is the lexicon. Languages with a long written tradition have numerous, richly structured dictionaries. For newly described languages, a lexicon is often the first published language resource. Detailed examination of lexical entries, especially across lexicons, reveals great diversity in the organization and representation of information. As we look to a future in which lexical data is increasingly deployed online, this diversity presents problems for exchanging data and for developing general purpose tools. The extensive work that has been done on markup for lexical entries, by means of both community-wide initiatives and the development of individual systems used by dictionary publishers, provides ways of associating semantic tags with parts of a lexical entry. However, that work does not address the core problem: there is no general purpose data model for lexical entries. This paper reports on some tentative steps we have taken towards the development of such a model. First, we have collected a sample of entries from fifty-five lexicons and analyzed their structure, abstracting away from details of presentation and layout. Second, we have developed a general model, and shown how the individual lexical entries can be expressed. Third, we have expressed the model as an XML DTD, and the sample entries as XML files. This paper reports on the methodology and the results of this research. We also discuss the issues that arose while we tried to balance the coverage and complexity of the model, and the shortcomings of our current solution. Finally, we list some desiderata for any general purpose data model for lexical entries coming out of this research.


Contents

1. Introduction
2. Methodology
3. A General Model
    Theoretical Aspects
    Details of Data
4. Issues and Problems
    Recursion
    Comment or Body?
    The Head-Body Intersection
5. Desiderata
6. Conclusion
Acknowledgements
Footnotes
Appendix

1. Introduction

To illustrate the diversity which may occur in lexicons and our approach to it, consider the following three entries. In the first case (Javanese), sense definitions are grouped into a single entry, while in the second case (Orokolo), the entry is split into three, one for each sense. In the third case (Urdu), there are two entries but one sense.
Javanese [ full page ]
Two sense definitions are grouped into a single entry.
Orokolo [ full page ]
One entry for each sense definition.
Urdu [ full page ]
Two entries with a single sense definition.

In terms of our data modelling work, this choice between clumping and splitting lexical entries will be treated as a question of rendering, not of underlying structure.

Another area of diversity is in linear ordering. In the following examples, pronunciation information is ordered either before or after the sense definition.
French [ full page ]
Pronunciation information precedes sense definition.
Tsimshian [ full page ]
Sense definition precedes pronunciation.

As before, we abstract away from this rendering difference, and focus on the presumed underlying structure.

We have just seen two examples where hierarchical grouping and linear ordering will be ignored by the data model. This does not mean that the distinctions are unimportant, but only that they play no semantic role. Of course, there are aspects of hierarchical and sequential structure that are still significant. Consider the following example, from a Waskia dictionary.

Waskia [ full page ]

We would obviously want to assume that the hierarchical structure of the second entry is significant, as is the linear order of the two sense definitions for karar-. The sequence of the three entries is probably also significant, being ordered by some frequency or salience judgement.

Therefore, it is clear that some aspects of the visual form of a lexical entry are significant in informing us of the underlying structure, while others are less significant or irrelevant (e.g. the choice of typeface). In the present study, we will focus on identifying a common underlying structure, abstracting away from these details of presentation.

2. Methodology

Our source of data was information from approximately 55 dictionaries and lexicons (see Appendix). We took one or two pages from each. The selection of lexicons was based on several criteria:

This method is admittedly informal. However, because the research required looking through printed dictionaries for distinctive features of format and organization, no more thorough method presented itself that would not have taken a great deal more time and care.

From the dictionaries chosen by the above method, we copied from one to four pages (usually one or two; occasionally more if a single entry extended over several pages). Had we examined each dictionary extensively, the number of dictionaries needed to provide at least one token of each feature we discovered could clearly have been reduced significantly. Since our goal was inclusiveness, however, the number of dictionaries used was more or less irrelevant so long as the same features were found. A matter of more concern is that we may have left out certain features of importance. We hope that following the first two criteria listed above served to minimize this problem.

We then wrote context-free grammars (hereafter CFGs) for approximately 50 of the lexicon samples. The remainder were left out either because all features in the sample had already been included in another grammar, or because they were found to be of a type of lexicon for which this model is not attempting to account (e.g. comparative wordlists). After the CFGs had been written, they were combined into a single general grammar. The general grammar covered the large majority of cases, although certain difficult cases were not included. Thus, the model we offer was based on working out the necessary structures according to data from numerous dictionaries; as the generalized grammar was written, the various context-free grammars were rewritten a number of times in a fairly simple course of hypothesis refinement.

3. A General Model

3.1. Theoretical Aspects

Lexical entries typically comprise an orthographic ``headword'' by which the lexicon is sorted, and pronunciation information, morphosyntactic information (MSI) and sense definitions. Entries may be enriched with a great variety of other information: related forms, affixes, variants, notes, cross references, abbreviations, glossed examples, etymologies and so forth. Within a single lexicon, there may be considerable variation in the nature and quantity of information supplied for entries. Looking within and across lexicons, we find that entries almost always supply either a sense-definition or a cross-reference to another entry with a sense-definition, as in the following Russian example:

Russian [ NOT YET POSTED ]

Slightly less frequently, MSI is provided, and somewhat less frequently than that, pronunciation information is provided, although in lexicons derived directly from field data, not infrequently the headword itself provides pronunciation information, as in the Gangulida lexicon:

Gangulida [ full page ]

Across lexicons, most entries contain all three forms of information.

In searching for a common structure involving these three constituents, one is immediately confronted by the variation in what counts as a lexical entry. In the Orokolo example above, three words having the same orthographic form and part of speech are represented as three independent entries. Contrast the Orokolo example with the following example from Sango, which combines multiple sense definitions into a single entry.

Sango [ full page ]

It appears that the criterion for setting up independent lexical entries is semantic for the Orokolo lexicon, but phonological for the Sango lexicon. On closer inspection, we see that the Orokolo entries are not completely independent, for the orthographic variant and the pronunciation for all three entries are given only in the first one. Thus it might be more accurate to view the three entries as a realization of a single underlying lexeme. If we do this, the dichotomy is eliminated, and we can begin to look for significant structural similarities across superficially divergent lexical entries. This raises an interesting question: can we represent enough structure in a single lexeme so that the full range of realizations into one or more entries is possible? We believe this possibility is worth exploring, and we build our model on this assumption to see how far it can take us.

In general then, we take a lexeme to consist of three sets: one enumerating the pronunciations, one containing MSI specifications, and one containing sense definitions. Each of these sets is flat, and combines information which may be spread across multiple entries of the same orthographic headword. Structure is lost when entries are combined like this: the fact that a particular sense corresponds to a particular part of speech, for example. To represent this information, we add a ternary relation over the sets. The following diagrams illustrate the model for our Sango and Orokolo examples:

The common structure is now immediately apparent. In both cases, the pronunciation set contains a single entry (here represented as an orthographic form), as does the MSI. In the Sango example, there are links to two sense definitions, while in the Orokolo example there are three. Assuming that elements of the three sets are identified P1, P2, ..., M1, M2, ..., S1, S2, ..., then the ternary relation specified for Sango would be: {<P1,M1,S1>, <P1,M1,S2>}.

The style of a particular lexicon will specify which of these three sets determines the division into separate entries. For Sango it will be the pronunciation set, while for Orokolo it will be the sense set. Given this data structure and this style specification, it ought to be straightforward to render entries in the appropriate format. The approach will be applied to more complex cases below.
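The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind the model: the names `split_entries` and `AXES` are hypothetical, the Sango relation is the one given above, and the Orokolo relation is inferred from the description (one pronunciation, one MSI, three senses).

```python
from collections import defaultdict

# Ternary relation for the Sango example, as given in the text.
SANGO = [("P1", "M1", "S1"), ("P1", "M1", "S2")]
# Orokolo: inferred from the description (one pronunciation, one MSI, three senses).
OROKOLO = [("P1", "M1", "S1"), ("P1", "M1", "S2"), ("P1", "M1", "S3")]

# Position of each set within a (pronunciation, MSI, sense) triple.
AXES = {"pron": 0, "msi": 1, "sense": 2}


def split_entries(relation, axis):
    """Group the triples of one lexeme by the set that the lexicon's style
    uses to divide the lexeme into separate rendered entries."""
    index = AXES[axis]
    groups = defaultdict(list)
    for triple in relation:
        groups[triple[index]].append(triple)
    # One rendered entry per distinct element of the chosen set,
    # in order of first appearance.
    return list(groups.values())


# Sango style divides by pronunciation; Orokolo style divides by sense.
print(len(split_entries(SANGO, "pron")))
print(len(split_entries(OROKOLO, "sense")))
```

Under these assumptions the same data structure yields a single Sango entry and three Orokolo entries; only the style parameter differs.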

In the DTD, these sets and the mapping are represented as follows. The Body of a lexeme contains zero or more Pron elements, zero or more MSI elements, and one or more Sense elements, plus Mapping elements which represent the correspondences.

<!ELEMENT Body ( Pron*, MSI*, Sense+, Mapping* )>

<!ELEMENT Mapping EMPTY>
<!ATTLIST Mapping
    pron        IDREF             #IMPLIED
    msi         IDREF             #IMPLIED
    sense       IDREF             #IMPLIED
>

(Note that the full DTD actually adds a fourth set Aux. This will be explained later. It has been omitted here since the case for it is not as clear.)
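As a rough illustration of how a consumer of such a file might follow the Mapping references, the Python sketch below parses a small Body instance and resolves each IDREF to the element carrying the matching id. The instance itself is hypothetical and simplified (the element contents are placeholders, and the id attribute on Sense is an assumption); only the element and attribute names come from the DTD fragment above. The function name `resolve_mappings` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified Body instance: contents are placeholders.
BODY_XML = """
<Body>
  <Pron id="P1">placeholder pronunciation</Pron>
  <MSI id="M1"><POS type="noun"/></MSI>
  <Sense id="S1">first sense</Sense>
  <Sense id="S2">second sense</Sense>
  <Mapping pron="P1" msi="M1" sense="S1"/>
  <Mapping pron="P1" msi="M1" sense="S2"/>
</Body>
"""


def resolve_mappings(body):
    """Replace the IDREF attributes of each Mapping by the elements they name."""
    by_id = {el.get("id"): el for el in body if el.get("id") is not None}
    rows = []
    for mapping in body.findall("Mapping"):
        row = {}
        for attr in ("pron", "msi", "sense"):
            ref = mapping.get(attr)  # None when the #IMPLIED attribute is omitted
            row[attr] = by_id[ref] if ref is not None else None
        rows.append(row)
    return rows


body = ET.fromstring(BODY_XML)
rows = resolve_mappings(body)
for row in rows:
    print(row["pron"].text, "|", row["msi"].find("POS").get("type"), "|", row["sense"].text)
```

Note that ElementTree does not validate against a DTD, so this sketch does not check that every IDREF actually has a target; a dangling reference would raise a KeyError here.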

The full DTD and XML files are available in the xml folder which accompanies this paper [xml.zip].

3.2. Details of Data

In this section we describe the provenance and some of the characteristics of our general model. After having written the context-free grammars based on the various pages we used as samples, we combined them, attempting to preserve maximum possibilities of variation and to minimize inconsistencies. The amount of conflict between these two goals is unfortunately greater than we would like, so that occasionally our model sacrifices the possibility of reproducing some part of a format exactly as it appears in our data set. Consider, for instance, our example from the Philippine language Batad Ifugao:

Batad Ifugao [ NOT YET POSTED ]

Note that the last comment mentions several other words found in the dictionary. Clearly it would be preferable to be able to include cross-references within comments; but we judge the complexity to be, at least for this model, unacceptably great. On the other hand, we have occasionally sacrificed what appeared to us a much more elegant structure in order to make our model more inclusive.(1) In any case, the process of converting the CFG to XML and of elaborating the general model occurred simultaneously. Thus, the model was built from the bottom up.

It became clear at an early stage that there were certain characteristics of the data which we wanted the model to represent formally. One of these characteristics is the amount of structural complexity possible in a given entry. We found 3 kinds of entries:

Entries of the first type have no structure in the body deeper than the first level. (A subelement of Lexeme called SimpleBody allows this.) Entries of the second (and third) type may have complicated internal structures and the information which they contain may be broken up in several ways.

Entries consist of two general parts, a head and a body. Distinguishing which information may occur in which area turns out to be rather difficult.(3) In essence, the head element is the index by which the dictionary or lexicon is ordered. It may or may not include certain elements beyond the headword. Some elements, such as affixal information (either that there is a possibility of affixes, as indicated by the hyphens in Atayal -m-, or that certain affixes are included in the entry, as in Kikuyu mwigio)(4), seem to clearly belong to the head. Other elements which may appear in the head also occur in the body element in different entries. This is particularly true of pronunciation information. We examine the entry for the word tuntomik from Kröger's Buli dictionary:

Buli [ full page ]

The IPA spelling of the headword and the tone structure is given after the headword, and then a variant is listed, also with IPA and tone information.

The body has at least one subpart and as many as five.(5)

  1. There may be pronunciation information (under Pron), including such things as IPA transcription and/or different methods of transcription, tone information, abbreviations, phonology of affixes, and so forth. It also includes an ID tag.
  2. There may be morphosyntactic information (MSI) included, ranging from a single declined form of the headword to complete paradigms (see the Navajo word taal for an example of the latter in terms of aspect and mood):

    Navajo [ full page ]

    (We ought to note that this table carries pronunciation information as well: variation in vowel length is represented, as is the quality of the l.) Because such distinctions as number, gender, mood, etc. are easily enumerated, they can also be made easily into features attached to parts of speech. The parts of speech are listed under POS, a required subelement of MSI. Only nouns, verbs, affixes, and ideophones (cf. the peculiar entry for Shona -pwirika) seem to have features of their own; the other parts of speech, when they appear in entries, appear without further description. It is possible for there to be multiple MSI listings in a single entry: see, for instance, the Buli entry tulimbaziik, in which `def. tulimbaziika, pl. tulimbaziisa' is modelled by:

    <MSI id="M2">
      <POS type="noun">
        <OrthographicForm>tulimbaziika</OrthographicForm>
        <Feature name="definiteness">definite</Feature>
      </POS>
    </MSI>
    <MSI id="M3">
      <POS type="noun">
        <OrthographicForm>tulimbaziisa</OrthographicForm>
        <Feature name="number">pl</Feature>
      </POS>
    </MSI>
    
  3. The Sense element may include an attribute Realm (indicating the general area of discourse to which the headword belongs, such as botany, politics, etc.), glosses, example phrases or sentences, miscellaneous information (such as a note, as in Tsimshian náhoon), and/or compounds. All subfields are optional, including Gloss. Having Gloss be non-required may seem counterintuitive; but in certain cases, there seems to be nothing there but a comment and an example under Sense (as in our Atayal example above). There may of course be multiple glosses and multiple examples.(6) Our model will be integrated with a model of interlinear text being developed by Kazuaki Maeda and Steven Bird, permitting multiple levels between the orthographic representation of a sentence and its phrasal translation.
  4. Aux contains the various types of miscellaneous information which may be included in an entry. This includes such things as etymology, obsolescence, cross-references, register, informant identity, and so forth. Some, such as obsolete, are marked by a binary attribute; others, such as Etymology, will need to allow prose within, and hence are sub-elements of Aux. The model allows more than one type of cross-reference. We call CrossRef those references which are to another entry in the dictionary, i.e. ``document-internal'' references. Such a reference may be either prose or an MSI entry, allowing reference to the tags in MSI (so that the cross-reference may be `past participle of X', for instance). We use XRef for those references which are to external sources, such as grammars, other dictionaries, literary works, etc. Register may have a type attribute, so that multiple types of register such as gender, social, and so on, can be distinguished.
  5. The other possible element of Body is Mapping. Its purpose is to provide the model with information so that the lexicon can be laid out in various manners while maintaining the proper relationships among data elements. If it is not included, only one entry is created for the lexeme, with elements in the order and format dictated by the default parameters of the stylesheet. If a mapping field is included, then the lexeme elements are collected and ordered accordingly. When there is no field for a particular element, a null element is inserted. The mapping is (Pron,MSI,Sense,Aux), with this form listed as many times as there are separate elements numbered. Thus it is quite possible for only one field to vary.
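The padding of mappings with null elements can be illustrated with a short Python sketch. It is a minimal illustration only, assuming that each mapping is read as a set of optional references and that None stands for the null element; the names `normalize` and `varying_fields` and the ids P1 and S1 are hypothetical (M2 and M3 echo the Buli fragment above).

```python
# Field order of a mapping, as described in the text: (Pron, MSI, Sense, Aux).
FIELDS = ("pron", "msi", "sense", "aux")


def normalize(mapping):
    """Pad a partial mapping to a full (Pron, MSI, Sense, Aux) tuple,
    inserting None as the null element for any missing field."""
    return tuple(mapping.get(name) for name in FIELDS)


def varying_fields(rows):
    """Name the fields whose value differs between the normalized rows."""
    return [name for i, name in enumerate(FIELDS)
            if len({row[i] for row in rows}) > 1]


# Hypothetical mappings for one lexeme: two MSI elements share one
# pronunciation and one sense, and no Aux element is referenced.
mappings = [
    {"pron": "P1", "msi": "M2", "sense": "S1"},
    {"pron": "P1", "msi": "M3", "sense": "S1"},
]
rows = [normalize(m) for m in mappings]
print(rows)
print(varying_fields(rows))
```

Here only the MSI field varies between the two rows, which is the situation described at the end of item 5.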

4. Issues and Problems

4.1. Recursion

It seems intuitively clear that a complete model of dictionaries and lexicons should not need to include recursion of entries. That is, while sub-entries certainly occur, a survey of some 75 dictionaries and lexicons showed no evidence of sub-sub-entries.(7) At worst, we have such entries as in the Quechua dictionary, where manca-pantalun is a sub-entry of pantalun with exactly the same format as its parent. Thus, if we were attempting to give a complete description of how dictionaries are written according to what we have seen, we might have these ``rewrite rules'':

Lexeme -> Head ( Body | SimpleBody )
Lexeme -> Head Body Head Body.(8)

Our goal, however, is to create a useful and efficient archiving tool. We assume that the person archiving the data is rational, and hence that ``abuse'' of the tool's recursive ability will not occur in the creation of normal lexicons, since it would make the data less readable. We also assume that heavily recursive structures might be useful under certain circumstances (for instance, in the course of analyzing a polysynthetic or heavily agglutinative language). Thus, we do not formally block recursion, but we assume it will be finite and ``reasonable'', because the users are interested in making it so.
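Since recursion is not blocked formally, a tool built on the model could still report how deeply a lexeme is nested. The Python sketch below is one hypothetical way to do so; the `Lexeme` class, its `sub_entries` field, and the function `nesting_depth` are assumptions made for illustration and do not reflect the actual structure of the DTD.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lexeme:
    """Hypothetical in-memory lexeme: a headword plus nested sub-entries."""
    head: str
    sub_entries: List["Lexeme"] = field(default_factory=list)


def nesting_depth(lexeme):
    """Return 0 for a lexeme without sub-entries, 1 for sub-entries only,
    2 for sub-sub-entries, and so on."""
    if not lexeme.sub_entries:
        return 0
    return 1 + max(nesting_depth(sub) for sub in lexeme.sub_entries)


# The Quechua case from the text: manca-pantalun is a sub-entry of pantalun.
pantalun = Lexeme("pantalun", [Lexeme("manca-pantalun")])
print(nesting_depth(pantalun))
```

A depth greater than 1 would correspond to the sub-sub-entries that our survey did not find, so an archiving tool might flag such lexemes for review rather than reject them.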

4.2. Comment or Body?

The second Arabic-English dictionary in our bestiary, the extremely erudite Lane dictionary, exemplifies another issue which we have not attempted to resolve. We have considered the entire entry of `eat'(9) to be in prose, and hence rendered the body as a comment.(10) A close study will show that much of it could be mapped into MSI, Sense, etc., with a much smaller percentage of the whole string being included as comment. This would involve, however, turning such items as `and', `with', `as above', and so forth, into things generated by the stylesheet. The level of complexity required of the stylesheet in order to create the kind of structure we see in this entry, however, would be, in a word, extreme as well as artificial. We do expect entries in the general format of the Lane dictionary to occur in some circumstances; for instance, a lexicon compiler in the first stages of studying a language might find it useful to make entries in an undifferentiated fashion, with the notion that the data could be sorted out and corrected later. We expect such to be the general usage of entries in the `SimpleBody -> Comment' format.

These issues relating to the format of entries can be generalized, however, to the question: how much freedom in terms of ordering and formatting must the model allow? In the above case, we have decided that the structure is too complex to be replicated, and hence the entry is treated as a nondivisible chunk of data. But before we generalize, let us examine another general case in which this issue is prominent.

4.3. The Head-Body Intersection

We not infrequently see cases in which at least some information which would seem to belong to the Body element or one of its children is associated with the Head element. Consider, for example, the Balinese word cangkem. In the Balinese dictionary, the difference between Head and Body seems to be indicated by a colon. A marker for high or low usage may optionally appear before the colon. (We should add that the marker for intransitivity may also appear before the colon, but it also appears after (cf. canggih, capala).)
Balinese [ full page ]


We take the liberty of assuming that as morpho-syntactic information, the intransitive marker ought to appear in the body. Also, as we discussed above in 3.2, there are a number of entries in our dictionaries in which material we would expect to find in the body of entries occurs before material we expect to find in the head (that is, the entry has the order head-material body-material head-material body-material.) While ordering within an element is nonproblematic, there is no way of mixing the order of materials internal to two different elements. So, in the Urdu dictionary, the etymological information is listed before the headword.

Urdu [ full page ]

In order to minimize the complexity of the Head element, we want to have as much information as possible in Body; but we also cannot have the Body element split by the Head element. Thus, we have a field in the head which allows some small note(s) to be made, to cover such cases as the Urdu dictionary's.

The question we posed in 4.2, regarding how much freedom in terms of ordering and formatting the model ought to allow, will in the end be answered in the field, via the actions of the people using the archiving tool. There will always be a certain tension between what is most convenient for the creator of an archival document and what is most convenient for the reader and later users (à la Zipf's economies). We discuss the balance of these two issues below.

5. Desiderata

We believe that our data provides a good argument that a model not completely dissimilar to ours, one combining freedom in representation with relatively few structures, is necessary to provide a level of variety in the possible representation of entries approaching that found in dictionaries and lexicons. We believe that a model must

  1. be realizable in a variety of display formats, depending on the requirements of the field linguist, analyst, speaker, etc.,
  2. have an underlying structure which captures as many generalizations as possible,
  3. be expressively adequate.

Issue #1 is the most straightforward to deal with; we rely on stylesheets, which should be easily generable from our model. The fact that #2 and #3 come into conflict is at the root of the compromises visible in our model; for instance, it would clearly be simpler to have the Aux element exist at only one level. But because comments, cross-references, and other things can occur at more than one level, and because it may be necessary to map different Aux elements to different entries, it is necessary to have Aux be both a sibling element and a daughter element of Sense. Similarly, our acceptance of recursion in the model increases its expressive adequacy but ignores the generalization that multiple recursive levels do not seem to occur. We also abandoned attempting to note every generalization in such cases as the Lane Arabic dictionary, but have allowed the existence of entries which are nothing but comments; thus our model can be used as a lexical notebook. On the other hand, we have tried to put as many tags as are practicable into our model, and the open-ended nature of XML allows for expansion in any of those areas. We believe that this could be very useful in maximizing the number of generalizations derivable from data. In short, we have tried to strike as good a balance as possible.

6. Conclusion

The data model we discuss in this paper and have available on our webpage is data-driven, built on the examination of a number of actual lexicons and dictionaries, many of them de facto records of fieldwork. In the general model we have created from this data, we have sought to allow as much freedom as a language researcher is likely to want or need, as well as the ability to manage generalizations. We also recognize that this model could no doubt be significantly improved by a closer study, both of other dictionaries and of those we have already used. We hope that this collection of data and the model we have developed will contribute to the discussion of a common scheme for storing lexical data.

Acknowledgements

We wish to thank Haejoong Lee for his work on the XML examples, and Gary Simons for helpful discussions of lexicon models. This research was funded by an NSF grant ISLE: International Standards in Language Engineering.


Footnotes

      1 The most dramatic example of this is probably our ``garbage'' category, Aux. Because such things as comments, cross-references, and so forth need to occur at more than one level within the model (i.e. at Sense level), Aux must be available at this level as well. But we also want to be able to index the top-level divisions under Body in order to present entries in different manners. Thus there may need to be two forms of Aux, one with an index and the other without.
      2 We debated about whether recursion was necessary or not at great length. We tentatively conclude that it is not necessary for a complete description, but that it is useful in our model. This will be discussed further below.
      3 This is of course due to the fact that the assembler of the dictionary is not solely thinking in terms of a single format but wants to spotlight certain elements of entries; one easy way to do this is to put it near the beginning of an entry. Since it is arbitrary, this behaviour is difficult to model.
      4 Obviously it is possible for these dashes to indicate that an item is a root or an affix, depending on the language and the intention of the author. We feel that it is best not to attempt to formally represent root- or affixhood by means of this, but rather to include it in the morphosyntactic information. This preserves the ``openness'' of real dictionaries.
      5 We very recently discovered that the Longman Dictionary of Contemporary English is in XML format, but not recently enough to thoroughly analyze its similarity to and differences from our own model. Briefly, we note that the primary division seems to be between Head and Sense; pronunciation information and morphosyntactic information are found in the head, although the dialectal information is found under Sense. The definition occurs under Sense, of course; we assume that an example sentence would be, as well. We believe that this structure would be suboptimal for fieldwork because of the assumption of the primacy of sense; it would disallow ordering entries by syntax, for example.
      6 We also note that in monolingual dictionaries, example sentences generally do not have glosses whereas compounds do, for example in the Spanish dictionary. In multilingual dictionaries, both have glosses.
      7 Even such words as the English antidisestablishmentarianism would require no more than 6 sub-entries:

establish:  to set up . . . 
    establishment:  something which has been set up . . . 
        disestablishment:  . . . 
            disestablishmentarian:  . . . 
                disestablishmentarianism:  . . . 
                    antidisestablishmentarianism:  . . . .  

The above is obviously an inefficient format, in that each subentry gives a minimal amount of information and also implies a lack of generality in the meaning of affixes. We hope that there will never be a language archiver so perverse as to use the method exemplified above.
      8 The style sheet would appropriately indent the second Head, etc.
      9 This version may be difficult to read; interested readers may get the entire page in an extremely large format. (Please note, however, that this .gif file is 268K in size and hence may load rather slowly.)
      10 That is, the grammar describing this entry is simply:

Lexeme     ->  Head SimpleBody
Head       ->  Headword 
Headword   ->  OrthographicForm 
SimpleBody ->  Comment


Appendix

By alphabetical order of language:

Ambrym-English:
Parker, G.J. Southeast Ambrym dictionary.
Canberra: The Australian National University, 1970. pp. 5, 11.

Arabic-French-English:
Blachère, Régis. Dictionnaire Arabe-Français-Anglais.
Paris: G. P. Maisonneuve, 1964. pp. 14-16, 149.

Arabic-English:
Lane, E.W. An Arabic-English Lexicon, bk. I.
London, Edinburgh: Williams and Norgate, 1863-93. p. 1853.

Atayal-English:
Egerod, S. Atayal-English Dictionary, v. 1.
London: Curzon Press, 1980. pp. 289, 310, 354.

Balinese-English:
Barber, C.C. A Balinese-English Dictionary.
Aberdeen: University of Aberdeen, 1979. p. 69.

Batad Ifugao-English:
Newell, L.E. Batad Ifugao Dictionary.
Manila: Linguistic Society of the Philippines, 1993. pp. 312, 325, 525.

Buli-English:
Kröger, F. Buli-English dictionary: With an Introductory Grammar and an Index.
Münster: Lit, 1992. p. 363.

Chickasaw-English:
Munro, P. & Willmond, C. Chickasaw Analytical Dictionary.
Norman: University of Oklahoma Press, 1994. pp. 211, 218, 240.

Chinese-English:
Lin, Y. Chinese-English Dictionary of Modern Usage.
Hong Kong: Chinese University of Hong Kong, 1972 (distributed by McGraw-Hill, New York). pp. 533-36.

Mathews, R. H. A Chinese-English Dictionary.
Shanghai: Chinese Inland Mission and Presbyterian Mission Press, 1931. pp. 358-59.

Comanche-English:
Robinson, L.W. Comanche Dictionary and Grammar.
Dallas: Summer Institute of Linguistics; Arlington: University of Texas at Arlington, 1990. p. 77.

English:
Ed. Simpson, J.A., & Weiner, E.S.C. The Oxford English Dictionary, 2nd ed.
New York: Oxford University Press, 1989. p. 767.

English-Arabic:
Badger, G.P. An English-Arabic Lexicon.
Beirut: Librairie du Liban, 1967. p. 533.

English-Japanese:
Shōgakukan Randamu Hausu Ei-Wa Daijiten. v. IV (S-Z).
Japan: Shogakukan & Random House, 1974. p. 468.

English-Korean:
Ed. Kwang, M.K. et al. A New English-Korean Dictionary.
Seoul: Omungak Pub. Co., 1964. p. 1551.

English-Mbukushu:
Wynne, R. C. English-Mbukushu Dictionary.
Amersham, England: Avebury, 1980. p. 393.

English-Russian:
United States. War Dept. Dictionary of Spoken Russian: English-Russian, Russian-English.
Washington D.C.: War Department, 1945. p. 183.

English-Silozi:
O'Sullivan, O. English-Silozi Dictionary.
Lusaka: Zambia Educational Pub. House, 1993. pp. 140-41.

Fox-English:
Bloomfield, L. Leonard Bloomfield's Fox Lexicon. Ed. Ives Goddard.
Winnipeg: Algonquian and Iroquoian Linguistics, 1994. p. 77.

French-English:
Collins-Robert French-English, English-French Dictionary.
Cleveland: Collins, 1978. p. 257.

Gangulida-English:
Holmer, N.M. Notes on Some Queensland Languages.
Canberra: Australian National University, 1988. p. 99.

German-English:
Ed. Drosdowski, Günther. Duden, Das Grosse Wörterbuch der Deutschen Sprache, 2nd ed.
Mannheim: Dudenverlag, 1993-1995. p. 3830.

Hausa-English:
Abraham, R.C. Dictionary of the Hausa Language.
London: University of London Press, 1962. pp. 449-454.

Hawaiian-English:
Pukui, M.K. Hawaiian Dictionary: Hawaiian-English, English-Hawaiian.
Honolulu: University of Hawaii Press, c1986. p. 139.

Iban-English:
Richards, A.J.N. An Iban-English Dictionary.
New York: Oxford University Press, 1981. pp. 5, 11, 23.

Japanese-English:
Ed. Collick, R.M.V., Kazuo, H., & Munekazu, T. Kenkyūsha Shin Wa-Ei Chūjiten. Edition: Dai 4-han.
Tōkyō: Kenkyūsha, 1995. p. 747.

Javanese-English:
Horne, E.M.C. Javanese-English Dictionary.
New Haven: Yale University Press, 1974. p. 410.

Kikuyu-English:
Benson, T. G. Kikuyu-English Dictionary.
Oxford: Clarendon Press, 1964. pp. 197-201.

Kwaio-English:
Keesing, R.M. Kwaio Dictionary.
Canberra: Australian National University, 1975. p. 17.

Latvian-English:
Turkina, E. Latvian-English Dictionary.
New York: French & European Publications, Inc., no date. p. 193.

Mbum-English:
Hino, Shun'ya. The Classified Vocabulary of the Mbum Language in Mbang Mboum.
Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa, 1978. pp. 158-59.

Mehri-English:
Johnstone, T. M. Mehri Lexicon and English-Mehri Word-List.
London: University of London, 1987. p. 59.

Mende-English:
Innes, G. A Mende-English Dictionary.
London: Cambridge University Press, 1969. pp. 82-83.

Navajo-English:
Young, R.W. Analytical Lexicon of Navajo.
Albuquerque: University of New Mexico Press, 1992. pp. 486-7.

Nengone-English:
Tryon, D. T. Nengone Dictionary.
Canberra: Australian National University, 1969. p. 135.

Orokolo-English:
Brown, H.A. A Comparative Dictionary of Orokolo, Gulf of Papua.
Canberra: Australian National University, 1986. p. 99.

Paiwan-English:
Ferrell, R. Paiwan Dictionary, Pacific Linguistics Series no. C-73.
Canberra: Australian National University, 1982. p. 63.

Palauan-English:
McManus, E.G. Palauan-English Dictionary. Ed. Josephs, L.S., Emesiochel, M.
Honolulu: University Press of Hawaii, 1976. pp. 173, 303.

Pawnee-English:
formerly: php.indiana.edu/~aisri/lab/pawnee/pawnee.html
currently: http://www.indiana.edu/~aisri/projects/;
Frame: Education Projects: Pawnee: Student Multimedia Dictionaries. Last referenced 11/19/00.

Pero-English:
Frajzyngier, Z. A Pero-English and English-Pero Vocabulary.
Berlin: D. Riemer, 1985. pp. 50-51.

Persian-English:
Haim, S. Persian-English Dictionary.
New York: Hippocrene Books, 1993. no page numbers.

Polish-English:
Stanisławski, J. Wielki słownik polsko-angielski: The Great Polish-English Dictionary.
Warsaw: Wiedza Powszechna, 1994. pp. 375-77.

Quechua-Spanish-English:
Ed. Weber, D.J. Rimaycuna: Quechua de Huánuco.
Lima: Instituto Lingüístico de Verano, 1998. p. 393.

Russian-English:
Wheeler, M. The Oxford Russian-English Dictionary.
Oxford: Clarendon Press, 1972. p. 479.

Saami-Swedish-English:
Nielsen, K. Lappisk Ordbok.
Cambridge: Harvard University Press, 1932-1962. p. 407.

Sango-French:
Bouquiaux, L. Dictionnaire sango-français; Bàkàrí sangɔ-fàrànzì.
Paris: Société d'études linguistiques et anthropologiques de France, 1978. p. 253.

Scottish-English:
Jamieson, J. An Etymological Dictionary of the Scottish Language.
New York: AMS Press, 1966. pp. 645-46.

Shona-English:
Hannan, M. Standard Shona Dictionary.
New York: St Martin's Press, 1959. pp. 520-23.

Spanish:
Diccionario de la lengua española, 19th Ed.
Madrid: Real Academia Española, 1970. p. 227.

Telugu-English:
Gwynn, J.P.L. A Telugu-English Dictionary.
New York: Oxford University Press, 1991. p. 35.

Tetun-English:
Morris, C. Tetun-English Dictionary.
Canberra: Australian National University, 1984. p. 27.

Tsimshian-English:
Dunn, J.A. Sm'algyax: A Reference Dictionary and Grammar for the Coast Tsimshian Language.
Seattle: University of Washington Press, 1995. p. 77.

Tuscarora-English:
Rudes, B.A. Tuscarora-English/English-Tuscarora Dictionary.
Toronto: University of Toronto Press, 1999. pp. 167, 201.

Urdu-English:
Shakespear, J. Dictionary, Urdu-English and English-Urdu.
Lahore: Sang-e Meel Publications, 1980. pp. 253-4.

Vietnamese-English:
van Hung, L. Vietnamese-English Dictionary.
Paris: Éditions Europe-Asie, 1955. p. 315.

Waskia-English:
Ross, Malcolm. A Waskia Grammar Sketch and Vocabulary.
Canberra: Australian National University, 1978. p. 87.

Yagaria-English:
Renck, G.L. Yagaria Dictionary.
Canberra: Australian National University, 1977. p. 161.