LINGUISTIC EXPLORATION


New Methods for
Creating, Exploring and Disseminating
Linguistic Field Data


Thursday 6 January 2000, 9am-6pm
Palmer House Hilton, Chicago
Parlor F

Held in conjunction with the
Annual Meeting of the Linguistic Society of America
7pm, 6 January - noon, 9 January 2000





The new NSF TalkBank Project is sponsoring a workshop on computational support for linguistic fieldwork, to be coordinated by Steven Bird (University of Pennsylvania).

The workshop will bring together linguists and computational linguists committed to empirical research on large datasets, through the combination of traditional field methods and new technologies for exploring and visualizing complex datasets. The languages under study may range from the undescribed to the well-studied, and the fieldworker may operate in a village or a laboratory. The focus is the exploratory mode of research, where elicitation, analysis and hypothesis-testing form a tight loop. The workshop will contribute to the evaluation and evolution of methodologies that integrate traditional practices with new technologies, leading to increased accessibility, accountability, and stability of empirical linguistic research.

The workshop will address a selection of the following issues:

  1. Representation - what are good data models for interlinked, heterogeneous, multimodal linguistic field data, including lexicons, (interlinear) texts, field notes, (annotated) recordings, paradigms, grammar sketches, maps, photographs, folios, course notes and problem sets?
  2. Tools - what are the existing and new tools for manipulating linguistic field data, and what are their strengths and weaknesses vis-a-vis creating, browsing, searching, querying and transforming this data? How well do the tools accommodate the fieldworker's continuously evolving conception of the data? What statistical corpus-analysis methods are suitable for datasets whose items number in the hundreds rather than the hundreds of thousands?
  3. Collaborative knowledge discovery - how can a geographically distributed network of linguists and native speakers cooperate on the construction, validation and enrichment of multimodal field data? How do we bridge the gap between the field and the laboratory?
  4. Online repositories - how can a collection of online multimodal field data covering many languages be archived and curated? What are the corpora that people are currently willing to share? What are the confidentiality issues, and what mechanisms exist to protect privacy?
  5. Dissemination and citation - how are datasets to be accessed by researchers, native speakers, language learners, field-methods students, and so on? How can we facilitate durable citations to shared linguistic resources, and track the provenance of a data item from a published transcription, through any intermediate databases, right back to a digitized speech recording?

A sample of ongoing work closely related to the topic of this workshop is available via the Linguistic Exploration page.

To register for the Linguistic Exploration Workshop, and to receive future updates about the workshop and related activities, please contact Steven Bird: sb@ldc.upenn.edu.


PROGRAM

Opening Remarks
9:00 Brian MacWhinney
         Carnegie Mellon University
The NSF TalkBank Project [ abstract ]
9:15 Steven Bird
         University of Pennsylvania
Goals of the Workshop
The Computer in Primary Linguistic Description
9:30 Bill Poser
         Carrier-Sekani Tribal Council, Lheidli T'enneh, and University of British Columbia
Databases for Carrier: Current Status, Desiderata, and Issues [ abstract | paper ]
9:50 Jonathan Amith
         Yale University
What's in a Word? The Why's and What For's of a Nahuatl Dictionary [ abstract | audio: rm, mp3 ]
10:10 Chris Cieri
         University of Pennsylvania
Issues and tools for creating and annotating a corpus of sociolinguistic field data [ abstract | presentation: html, ppt | audio: rm, mp3 ]
10:30 Larry Hayashi
         Summer Institute of Linguistics
Discovering and testing linguistic generalizations using interactive concordances [ abstract | presentation: ppt | audio: rm, mp3 ]
10:50 Break
Disseminating Linguistic Data on the Web
11:00 Ronald Sprouse
         University of California at Berkeley
Two approaches to linguistic field work on the web: The TELL and Ingush projects [ abstract | audio: rm, mp3 ]
11:20 Steven Bird
         University of Pennsylvania
Exploring and disseminating field data using HyperLex [ abstract | presentation: pdf, ps.gz | audio: rm, mp3 ]
11:40 Michel Jacobson
         CNRS/LACITO
XML tools for managing linguistic data: The LACITO Archives Project [ abstract | presentation: html | audio: rm, mp3 ]
12:00 Lev Michael
         University of Texas at Austin
Plans for a worldwide web archive of the indigenous languages of Latin America [ abstract | audio: rm, mp3 ]
12:15 Lunch
Tools and Models for Linguistic Data
2:00 David Nathan
         Australian Institute of Aboriginal and Torres Strait Islander Studies
Data design for endangered languages: increasing the "Linguistic Bandwidth" [ abstract | audio: rm, mp3 ]
2:25 Wallace Hooper
         Indiana University
An integrated multimedia dictionary and text processor for the documentation of endangered languages [ abstract | audio: rm, mp3 ]
2:50 Chris Manning
         Stanford University
Kirrkirr: Experiences with a flexible software interface to indigenous dictionaries [ abstract | presentation: ppt, pdf, ps.gz | audio: rm, mp3 ]
3:15 Ron Zacharski
         New Mexico State University
Boas: A Field Linguist in a Box [ abstract | presentation: ppt | audio: rm, mp3 ]
3:40 Break
4:00 Dafydd Gibbon
         University of Bielefeld
The Bielefeld-Abidjan documentation project: Information types and dissemination media [ abstract | presentation: pdf, ps.gz | audio: rm, mp3 ]
4:25 Robert Neumann
         Association for the Promotion of Yiddish Language and Culture
A New Approach to Exploring the Archive of the Language and Culture Atlas of Ashkenazic Jewry [ abstract | presentation: html | audio: rm, mp3 ]
4:45 David Weber
         Summer Institute of Linguistics
Reference grammars for the computational age: From Gleason files to sci-fi grammar [ abstract | audio: rm, mp3 ]
5:10 Rich Thomason
         University of Michigan
Towards computerized support for empirical linguistics: some ideas from computer science [ abstract | audio: rm, mp3 ]
5:40 Steven Bird
         University of Pennsylvania
Multidimensional exploration of linguistic databases [ abstract | presentation: pdf, ps.gz | audio: rm, mp3 ]
6:00 Close

Steven Bird
sb@ldc.upenn.edu