Boas: A Field Linguist in a Box

Ron Zacharski, New Mexico State University

Computing Research Laboratory, New Mexico State University
Box 30001, Las Cruces, NM 88003, USA
raz@crl.nmsu.edu | crl.nmsu.edu/Staff.pages/Technical/raz.htm


Boas is a web-based knowledge acquisition system that guides a user through the process of acquiring knowledge about a user-specified language. It is part of a larger system designed to assist a team of two people, neither a linguist nor a computational linguist, in developing a broad-coverage machine translation system in six months. The Boas system guides a user through a set of knowledge acquisition 'threads'. For example, one thread asks which parts of speech inflect in the language under investigation. The major components of Boas are knowledge acquisition modules for morphology, syntax, productive affixes, closed-class items, and open-class items. Although the design of each module is geared to the specific task at hand, all modules share the design goal of leading a non-linguist through the acquisition process.

For example, the acquisition of morphology starts by guiding the user through a series of questions that elicit the parameters and values required for inflectional morphology. For nouns, the user is asked whether nouns in their language inflect for case, number, person, etc. In addition, the user is asked whether nouns have inherent features (for example, gender). These linguistic terms are explained on HTML pages associated with the acquisition process and in an additional web-based help system. After the acquisition of parameters and values, the user is guided through a process to determine the inflectional paradigms for a particular part of speech. In the final phase, the system attempts to learn the morphological rules of the language by asking the user to provide some examples of the inflected forms in a paradigm and to correct forms generated by the system.
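The final phase described above (learn from a few user-supplied forms, generate forms for new words, let the user correct them) can be illustrated with a minimal sketch. This is not the Boas learning algorithm, which is not specified here; it is a deliberately simple suffix-replacement learner, and the function names, the cell labels such as "1sg", and the Spanish example data are all assumptions made for illustration.

```python
from os.path import commonprefix  # character-wise longest common prefix


def induce_suffix_rules(lemma, examples):
    """Derive one (strip, add) suffix rule per paradigm cell.

    `examples` maps a paradigm cell label (hypothetical, e.g. "1sg")
    to the inflected form the user supplied for `lemma`.
    """
    rules = {}
    for cell, form in examples.items():
        stem = commonprefix([lemma, form])
        # Whatever follows the shared stem is removed from the lemma
        # and replaced by whatever follows it in the inflected form.
        rules[cell] = (lemma[len(stem):], form[len(stem):])
    return rules


def generate(lemma, rules):
    """Apply the induced rules to a new lemma.

    Returns None for a cell whose rule does not apply, signalling
    that the user would have to supply that form by hand.
    """
    forms = {}
    for cell, (strip, add) in rules.items():
        if strip and not lemma.endswith(strip):
            forms[cell] = None
        else:
            forms[cell] = lemma[:len(lemma) - len(strip)] + add
    return forms


# Illustrative data only: three present-tense cells of a Spanish verb.
rules = induce_suffix_rules(
    "hablar", {"1sg": "hablo", "2sg": "hablas", "1pl": "hablamos"}
)
print(generate("cantar", rules))
# {'1sg': 'canto', '2sg': 'cantas', '1pl': 'cantamos'}
```

In a setup like this, a generated form that the user corrects would be fed back as a new example, either refining the rule or indicating that the word belongs to a different paradigm. A production system would more plausibly compile such rules into finite-state transducers rather than apply string operations directly.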

Implementation Details and Availability

The Boas system is implemented as a Java servlet. The current implementation uses a Unix machine as the server, and we are in the process of porting the server to Windows NT. Although Boas runs on a number of client architectures and browsers, it is optimized for Internet Explorer running on Windows NT. This is our target client because it has reasonably good Unicode input methods. Boas requires that several applications be installed on the server: the Java JDK (1.1.7 or higher) (www.javasoft.com), the Apache httpd web server (www.apache.org), and the PostgreSQL database server (www.postgresql.org). The current implementation also requires the Xerox finite state toolkit (www.xrce.xerox.com/research/mltt/fst/home.html), although we are exploring alternatives to the Xerox tools (in particular, the van Noord FSA utilities). The Boas system will be made freely available in the first quarter of 2000.


Linguistic Exploration Workshop