Towards computerized support for empirical linguistics:
Some ideas from computer science

Richmond H. Thomason, University of Michigan

Philosophy, Linguistics, and Computer Science, University of Michigan
2251 Angell Hall, Ann Arbor, MI 48109-1003, USA
rich@thomason.org | www.eecs.umich.edu/~rthomaso/


Science can hardly be done at all these days without computer support. It is hard to find an area of contemporary science that is not heavily dependent on some combination of numerical calculation, visualization and modeling techniques, and database services.

Linguistics is becoming a client of computation in the same general way as other sciences: probably the most significant impacts of computation on the core areas of linguistics are in signal processing and in the developing use of corpora and special-purpose databases. But linguistics is different from most other sciences in the potential impact of sophisticated symbolic computation on the way in which traditional areas of linguistics are carried out. Some of these techniques are made available by computational linguistics (which I am more or less artificially separating in this presentation from the "core areas"); the field of knowledge representation also has many useful ideas to contribute.

I believe that there is a significant and perhaps unique opportunity to use ideas from computer science to develop computational tools that will not only store data for field linguists, but will also partially automate the process of scientific discovery and validation of the sort of hypotheses that form the basis of scientific grammars. But to do this, we need to get away from the query routines that are associated with relational databases, and acquire a sense of what sorts of queries field linguists would truly need.

Although the scale of a useful system of this kind would probably be much smaller than that of an ambitious software development project, I believe that in seeking to develop it we can profit from the lessons that have been learned by software engineers. In particular, the design of a successful system depends crucially on the needs of the user community, which here consists of working field linguists. I would advocate a systematic study of the data collection, data entry, and data validation and maintenance processes, and of how the data is accessed in developing a grammar. The challenge of developing a truly effective computational partner for field linguists, I believe, is less a matter of software development than of gaining an accurate and relatively comprehensive picture of the general needs of field linguists. With the results of such a study, it may well be possible to produce a scientific support system that is unique in the level of support it provides to the creative work of a qualitative science.


Linguistic Exploration Workshop