IRCS Workshop on Linguistic Databases

11-13 December 2001
University of Pennsylvania, Philadelphia, USA

Organized by Steven Bird, Peter Buneman and Mark Liberman
Funded by the National Science Foundation


Workshop Overview

Linguistic databases are digital repositories of structured information intended to document natural language and natural communicative interaction. Over the last decade, linguistic databases have come to stand at the center of empirical research in the language sciences and of the development of new human language technologies. Like genomic databases, linguistic databases are complex, evolving and richly annotated repositories, and they pose interesting challenges for efficient representation, indexing and querying. And like most scientific databases, linguistic databases have made little use of standard database technology.

The goals of the workshop are to take stock of existing research in linguistic databases, to identify the key problems, and to explore applications of current database research to these problems. More broadly, the workshop will help define the research questions of a new "linguistic database community" and initiate the ongoing interchange of relevant problems and results between this community and the database community at large.

The workshop is expected to attract participants from a range of specialties including databases, linguistics, computational linguistics, annotation and markup. There will be tutorial-style presentations on relevant models in each of these areas.

The workshop will address a selection of the following topics:

MODELS

LANGUAGES

OTHER TOPICS


Steven Bird, Peter Buneman, & Mark Liberman (LDC, CIS, & Linguistics)
Email: sb@ldc.upenn.edu, peter@cis.upenn.edu, myl@unagi.cis.upenn.edu