IRCS Workshop on Linguistic Databases

11-13 December 2001
University of Pennsylvania, Philadelphia, USA

Organized by Steven Bird, Peter Buneman and Mark Liberman
Funded by the National Science Foundation


PROGRAM

Tuesday

8:15 Onwards

Continental breakfast

9:00 OPEN

Welcome

9:10 Examples of Linguistic Databases

Session chair: Steven Bird

Invited talks:

9:10 Brian MacWhinney (Carnegie Mellon University, USA)
       A Survey of Annotated Linguistic Databases in the CHILDES and TalkBank Projects [ppt]

9:30 Bill Labov (University of Pennsylvania, USA)
       Sociolinguistic databases

9:50 Chris Cieri (University of Pennsylvania, USA)
       Linguistic databases at the Linguistic Data Consortium [ppt]

10:10 BREAK

10:40 Examples of Linguistic Databases (cont)

Session chair: Peter Buneman

10:40 William Kretzschmar (University of Georgia, USA)
       Linguistic Databases of the American Linguistic Atlas Project (ALAP) [abstract] [project]

11:00 Jan Hajic, Barbora Hladka, Petr Pajas (Charles University, Czech Republic)
       The Prague Dependency Treebank: Annotation Structure and Support [abstract] [ppt] [ps]

11:20 Paola Monachesi, Alexis Dimitriadis, Anne-Marie Mineur, Rob Goedemans, Manuela Pinto (Utrecht University, Netherlands)
       The Typological Database System [abstract] [pdf]

11:40 D. Terence Langendoen, William Lewis, and Scott Farrar (University of Arizona, USA)
       Building a Knowledge Base of Morphosyntactic Terminology [abstract] [ppt] [ps]

12:00 LUNCH

1:30 Demonstrations: Linguistic Databases

Demonstrations will run in parallel, in the style of a poster session. Presenters are expected to bring their own laptops; Ethernet and high-speed Internet access will be available.

2:30 Models

Session chair: Susan Davidson

Invited talk:

2:30 David Maier (Oregon Graduate Institute, USA)
       An Architecture for Superimposed Information [ppt] [ps]

3:00 Models (cont)

3:00 Brian Roark (AT&T Shannon Labs, USA)
       Storing automatically generated treebanks in lattices of derivations [ps]

3:20 Larry Hayashi and John Hatton (SIL International, USA)
       Combining UML, XML and relational database technologies - the best of all worlds for robust linguistic databases [abstract] [ppt] [ps]

3:40 BREAK

4:00 Models (cont)

Session chair: Val Tannen

4:00 Nancy Ide, Laurent Romary (Vassar College, USA; LORIA/CNRS, France)
       Standards for Language Resources [abstract] [ppt]

4:20 Hennie Brugman, Peter Wittenburg (Max Planck Institute, Netherlands)
       The application of annotation models for the construction of databases and tools: Overview and analysis of MPI work since 1994 [abstract] [ppt] [ps]

4:40 John Thomson (SIL International, USA)
       Representing multilingual and annotated text in memory and in a relational database [abstract] [ppt] [ps]

5:00 Open Discussion

5:30 CLOSE


Wednesday

8:15 Onwards

Continental breakfast

9:00 Models of Standoff Annotation

Session chair: David Maier

Invited talks:

9:00 Henry Thompson (Edinburgh University, UK)
       Linguistic Annotation and Standoff Markup

9:30 Amy Isard (Edinburgh University, UK)
       Tutorial on Standoff Markup in the MATE Model of Linguistic Annotation [ppt] [ps]

10:00 BREAK

10:20 Models of Standoff Annotation (cont)

10:20 Mark Liberman (University of Pennsylvania, USA)
       An approach to linguistic annotation [ppt]

10:40 Steven Bird (University of Pennsylvania, USA)
       Tutorial on annotation graphs [ppt]

11:00 Open Discussion: What is XML Good For?

Facilitator: Peter Buneman

Does XML provide the necessary framework for solving our problems, or is it simply a distraction from the real problems?

12:00 LUNCH

1:30 Demonstrations: Systems

2:30 Query

Session chair: Henry Thompson

Invited talks:

2:30 Dennis Shasha (New York University, USA)
       Searching for and Comparing Trees and Graphs [ppt] [ps]

3:00 Gösta Grahne (Concordia University, Canada)
       Sequence Queries [ppt]

3:30 BREAK

4:00 Query (cont)

4:00 Uwe Mönnich, Frank Morawietz and Stephan Kepser (University of Tübingen, Germany)
       A Regular Query for Context-Sensitive Relations [abstract] [pdf]

4:20 Brad Penoff, Chris Brew (Ohio State University, Sun Microsystems, USA)
       TREX-Q: A query language based on XML Schema [abstract] [pdf]

4:40 Christopher Manning, Kristen Parton (Stanford University, USA)
       What's needed for lexical databases? Experiences with Kirrkirr [abstract] [ppt] [ps]

5:00 Peter Buneman (University of Pennsylvania, USA)
       XML query languages: what they can and cannot do for linguistic databases

5:20 Open Discussion

6:00 CLOSE

6:30 BANQUET


Thursday

8:15 Onwards

Continental breakfast

9:00 Lexical Databases (parallel session)

Session chair: Chris Manning

9:00 Jeff Good, Ronald Sprouse (University of California, Berkeley, USA)
       Creating a database and query-tools for the TELL multi-speaker linguistic corpus [abstract]

9:20 I. Aldezabal, O. Ansa, B. Arrieta, X. Artola, A. Ezeiza, G. Hernández, M. Lersundi (University of the Basque Country, Spain)
       EDBL: a general lexical basis for the automatic processing of Basque [abstract] [ppt] [ps]

9:40 X. Artola, A. Soroa (University of the Basque Country, Spain)
       An Architecture for a Federation of Highly Heterogeneous Lexical Information Sources [abstract] [ppt] [ps]

10:00 Sonya Bird, Melody Jeffcoat, Michael Hammond (University of Arizona, USA)
       Electronic dictionaries for languages of the Southwest [abstract] [ppt] [ps]

10:20 Angelo Dalli (University of Malta, Malta)
       Interoperable Extensible Linguistic Databases [abstract] [ppt] [ps]

9:00 Archives (parallel session)

Session chair: Brian MacWhinney

9:00 Chu-Ren Huang, Feng-Ju Lo, Hui-Jun Hsiao, Chiu-Jung Lu, Chin-chun Hsieh (Academia Sinica and Yuan-Tzu University, Taiwan)
       From Language Archives to Digital Museums: Synergizing Linguistic Databases [abstract] [ppt] [ps]

9:20 Martin Wynne (Oxford Text Archive, UK)
       Writing a Corpus Cookbook [abstract] [ppt] [ps]

9:40 Heidi Johnson (University of Texas at Austin, USA)
       Archive of the Indigenous Languages of Latin America [abstract] [msword]

10:00 Anthony Aristar, Helen Aristar-Dry (Wayne State University, Eastern Michigan University, USA)
       The E-MELD Project [abstract] [ppt] [ps]

10:20 Hans Uszkoreit, Brigitte Joerg, Thierry Declerck, Tillmann Wegst (University of the Saarland, Germany)
       The COLLATE Virtual Information Center [abstract] (cancelled)

10:40 BREAK

11:00 Discussion Groups

Standoff, Query, Lexicons, Archives, Typology, ...

12:00 LUNCH

1:30 Demonstrations: Archives, Metadata, Lexicons

2:30 Panel: Future Directions

  1. David Maier: Reflections on Linguistic Databases [ppt] [ps]
  2. Gösta Grahne
  3. Terry Langendoen

3:30 CLOSE


Steven Bird, Peter Buneman, & Mark Liberman (LDC, CIS, & Linguistics)
Email: sb@ldc.upenn.edu, peter@cis.upenn.edu, myl@unagi.cis.upenn.edu