ISLE: International Standards in Language Engineering
Spoken Language Group

The purpose of the ISLE spoken language group is to foster the development of new standards and tools for spoken language data.

Funding

The ISLE project is funded for two years by the NSF and the EC, as part of a joint program, Multilingual Information Access and Management. This program is `intended to further the knowledge required to build information systems that operate in multiple languages; to provide the technologies required for their application in a number of social and organizational contexts; and to demonstrate the validity of the approaches chosen.'

The call describes a research infrastructure agenda with the subheadings: `standards for linguistic data', `multilingual ontologies' and `linguistic data centers'. The first of these is described as follows:

Standards for Linguistic Data: These activities include the definition of coding and interchange standards for multilingual spoken and written language data, with the associated tools and methods, in support of activities within the first theme above.

Structure

The European side of the partnership is an outgrowth of EAGLES. It recently organized a workshop, Meta-Descriptions and Annotation Schemas for Multimodal/Multimedia Language Resources.

US-ISLE consists of three groups, one having three subgroups. The groups and coordinators are as follows:

Spoken Language Group

Annotated speech corpora have been a critical component of research in the speech sciences for some years. As such corpora have proliferated, and have found uses across a rapidly expanding set of languages, disciplines and technologies, the lack of agreed standards has become a critical problem. Of course, the standardization of tagsets is necessarily an open-ended task, and is always subject to revision as the underlying domains change and the theories evolve.

Yet the standardization of the annotation structures themselves is a goal that could be substantially achieved in a 3-5 year timeframe. The lack of such a standard is currently the primary roadblock to the creation of general-purpose tools and formats. A widely adopted annotation standard - building on earlier work reported in the Handbook of Standards and Resources for Spoken Language Systems - would be an important milestone for spoken language research infrastructure.

The members of the US-ISLE Spoken Language Group are: Steven Bird (LDC), David Day (MITRE), John Garofolo (NIST), Nancy Ide (Vassar College), Dan Jurafsky (U Colorado), Brian MacWhinney (CMU), Roni Rosenfeld (CMU), Gary Simons (SIL), Richard Sproat (AT&T).

Meetings

  1. Workshop on Web-Based Language Documentation and Description (December 2000)
  2. IRCS Workshop on Linguistic Databases (December 2001)

Other meetings are in the planning stages.

This material is based upon work supported by the National Science Foundation under Grant No. 9910603.


Steven Bird
sb@ldc.upenn.edu