IRCS Workshop on Linguistic Databases
11-13 December 2001
University of Pennsylvania, Philadelphia, USA
Organized by Steven Bird, Peter Buneman and Mark Liberman
Funded by the National Science Foundation
PROGRAM
Tuesday
8:15 Onwards
Continental breakfast
9:00 OPEN
Welcome
9:10 Examples of Linguistic Databases
Session chair: Steven Bird
Invited talks:
9:10 Brian MacWhinney (Carnegie Mellon University, USA)
A Survey of Annotated Linguistic
Databases in the CHILDES and TalkBank Projects
[ppt]
9:30 Bill Labov (University of Pennsylvania, USA)
Sociolinguistic databases
9:50 Chris Cieri (University of Pennsylvania, USA)
Linguistic databases at the Linguistic
Data Consortium
[ppt]
10:10 BREAK
10:40 Examples of Linguistic Databases (cont)
Session chair: Peter Buneman
10:40 William Kretzschmar (University of Georgia, USA)
Linguistic Databases of the American Linguistic
Atlas Project (ALAP) [abstract]
[project]
11:00 Jan Hajic, Barbora Hladka, Petr Pajas (Charles University,
Czech Republic)
The Prague Dependency Treebank: Annotation
Structure and Support [abstract] [ppt]
[ps]
11:20 Paola Monachesi, Alexis Dimitriadis, Anne-Marie Mineur, Rob
Goedemans, Manuela Pinto (Utrecht University, Netherlands)
The Typological Database System [abstract]
[pdf]
11:40 D. Terence Langendoen, William Lewis, and Scott Farrar
(University of Arizona, USA)
Building a Knowledge Base of Morphosyntactic
Terminology [abstract] [ppt]
[ps]
12:00 LUNCH
1:30 Demonstrations: Linguistic Databases
These will take place in parallel, like a poster session. Participants
will be expected to bring laptops to give the demos. Ethernet and high-speed
internet access will be available.
- B1 Masayuki Asahara, Ryuichi Yoneda and Yuji Matsumoto (Nara Institute
of Science and Technology, Japan)
Use of a Relational Database in the Development and Maintenance of Linguistic
Resources for Statistical Japanese Morphological Analysis [abstract]
[ppt] [handouts]
[ps]
- B2 R.J.J.H. van Son, Louis C.W. Pols (University of Amsterdam, Amsterdam
Center for Language and Communication, Netherlands)
Structure and access of the open source IFA-corpus
[abstract] [demo]
- B3 Heather Bliss, Elizabeth Ritter (University of Calgary, Canada)
Developing a Database of Personal and Demonstrative Pronoun Paradigms: Conceptual
and Technical Challenges [abstract] [msword]
- C5 Eleanor Culley (University of Virginia, USA)
Electronic Implementation of an Early American Linguistic Text Collection: The
Online Version of Harry Hoijer's Chiricahua and Mescalero Texts [abstract]
- C6 Mark Davies (Illinois State University, USA)
Using Relational Databases to Create Unlimited and User-Defined Annotation on
Large Corpora: A 100 Million Word Corpus of Historical and Modern Spanish [html]
- D7 Gurlekian Jorge*, Colantoni Laura*, Torres Humberto*, Rodríguez
Hernán*, Rincón Antonio**, Moreno Asunción** and Mariño
José**. (Laboratorio de Investigaciones Sensoriales CONICET, Buenos
Aires, Argentina; Applied Technologies on Language and Speech, Barcelona,
Spain)
Database for an Automatic Speech Recognition System for Argentine Spanish [abstract]
[ppt] [ps]
- D8 Dunstan Brown (University of Surrey)
Constructing a typological database for inflectional morphology: the SMG database
for syncretism [abstract] [ppt]
[ps]
2:30 Models
Session chair: Susan Davidson
Invited talk:
2:30 David Maier (Oregon Graduate Institute, USA)
An Architecture for Superimposed Information
[ppt] [ps]
3:00 Models (cont)
3:00 Brian Roark (AT&T Shannon Labs, USA)
Storing automatically generated
treebanks in lattices of derivations [ps]
3:20 Larry Hayashi and John Hatton (SIL International, USA)
Combining UML, XML and relational database
technologies - the best of all worlds for robust linguistic databases [abstract]
[ppt] [ps]
3:40 BREAK
4:00 Models (cont)
Session chair: Val Tannen
4:00 Nancy Ide, Laurent Romary (Vassar College, USA; LORIA/CNRS, France)
Standards for Language Resources [abstract]
[ppt]
4:20 Hennie Brugman, Peter Wittenburg (Max Planck Institute,
Netherlands)
The application of annotation models for
the construction of databases and tools: Overview and analysis of MPI work since
1994 [abstract] [ppt]
[ps]
4:40 John Thomson (SIL International, USA)
Representing multilingual and annotated
text in memory and in a relational database [abstract]
[ppt] [ps]
5:00 Open Discussion
5:30 CLOSE
Wednesday
8:15 Onwards
Continental breakfast
9:00 Models of Standoff Annotation
Session chair: David Maier
Invited talks:
9:00 Henry Thompson (Edinburgh University, UK)
Linguistic Annotation and Standoff
Markup
9:30 Amy Isard (Edinburgh University, UK)
Tutorial on Standoff Markup in the MATE
Model of Linguistic Annotation [ppt]
[ps]
10:00 BREAK
10:20 Models of Standoff Annotation (cont)
10:20 Mark Liberman (University of Pennsylvania, USA)
An approach to linguistic annotation [ppt]
10:40 Steven Bird (University of Pennsylvania, USA)
Tutorial on annotation graphs [ppt]
11:00 Open Discussion: What is XML Good For?
Facilitator: Peter Buneman
Does XML provide the necessary framework for solving our problems,
or is it simply a distraction from the real problems?
12:00 LUNCH
1:30 Demonstrations: Systems
- B1 Martin Holub, Pavel Míka (Charles University, Czech Republic)
MATES: An experimental linguistic database system [ps]
- B2 Thomas Schmidt (University of Hamburg, Germany)
The transcription system EXMARaLDA: an application of the annotation graph formalism
as the basis of a database of multilingual spoken discourse [abstract]
[ppt] [ps]
- B3 Jan-Torsten Milde, Ulrike Gut (University of Bielefeld, Germany)
The TASX-environment: an XML-based corpus database for time aligned language
data [abstract]
- B4 Christopher Manning, Kristen Parton (Stanford University, USA)
Kirrkirr: A flexible and approachable software interface to indigenous dictionaries
[abstract] [ppt]
[ps]
- C5 Hennie Brugman, Peter Wittenburg (Max Planck Institute, Netherlands)
MPI tools for linguistic annotation
- D7 Dafydd Gibbon, Thorsten Trippel (University of Bielefeld, Germany)
PAX - an annotation based concordancing toolkit [abstract]
- D8 Steven Bird, Kazuaki Maeda, Xiaoyi Ma (University of Pennsylvania,
USA)
AGTK: the annotation graph toolkit [abstract]
2:30 Query
Session chair: Henry Thompson
Invited talks:
2:30 Dennis Shasha (New York University)
Searching for and Comparing Trees and Graphs
[ppt] [ps]
3:00 Gösta Grahne (Concordia University)
Sequence Queries [ppt]
3:30 BREAK
4:00 Query (cont)
4:00 Uwe Mönnich, Frank Morawietz and Stephan Kepser (University
of Tübingen, Germany)
A Regular Query for Context-Sensitive Relations
[abstract] [pdf]
4:20 Brad Penoff, Chris Brew (Ohio State University, Sun Microsystems, USA)
TREX-Q: A query language based on XML Schema
[abstract] [pdf]
4:40 Christopher Manning, Kristen Parton (Stanford University, USA)
What's needed for lexical databases? Experiences
with Kirrkirr [abstract] [ppt]
[ps]
5:00 Peter Buneman (University of Pennsylvania, USA)
XML query languages: what they
can and cannot do for linguistic databases
5:20 Open Discussion
6:00 CLOSE
6:30 BANQUET
Thursday
8:15 Onwards
Continental breakfast
9:00 Lexical Databases (parallel session)
Session chair: Chris Manning
9:00 Jeff Good, Ronald Sprouse (University of California, Berkeley, USA)
Creating a database and query-tools for
the TELL multi-speaker linguistic corpus [abstract]
9:20 I. Aldezabal, O. Ansa, B. Arrieta, X. Artola, A. Ezeiza, G.
Hernández, M. Lersundi (University of The Basque Country)
EDBL: a general lexical basis for the automatic
processing of Basque [abstract] [ppt]
[ps]
9:40 X. Artola, A. Soroa (University of The Basque Country)
An Architecture for a Federation of Highly
Heterogeneous Lexical Information Sources [abstract]
[ppt] [ps]
10:00 Sonya Bird, Melody Jeffcoat, Michael Hammond (University of Arizona, USA)
Electronic dictionaries for languages of
the Southwest [abstract] [ppt]
[ps]
10:20 Angelo Dalli (University of Malta, Malta)
Interoperable Extensible Linguistic Databases
[abstract] [ppt]
[ps]
9:00 Archives (parallel session)
Session chair: Brian MacWhinney
9:00 Chu-Ren Huang, Feng-Ju Lo, Hui-Jun Hsiao, Chiu-Jung Lu, Chin-chun
Hsieh (Academia Sinica and Yuan-Tzu University, Taiwan)
From Language Archives to Digital Museums:
Synergizing Linguistic Databases [abstract] [ppt]
[ps]
9:20 Martin Wynne (Oxford Text Archive, UK)
Writing a Corpus Cookbook [abstract]
[ppt] [ps]
9:40 Heidi Johnson (University of Texas at Austin, USA)
Archive of the Indigenous Languages of
Latin America [abstract] [msword]
10:00 Anthony Aristar, Helen Aristar-Dry (Wayne State University,
Eastern Michigan University, USA)
The E-MELD Project [abstract]
[ppt] [ps]
10:20 Hans Uszkoreit, Brigitte Joerg, Thierry Declerck, Tillmann
Wegst (University of the Saarland, Germany)
The COLLATE Virtual Information Center
[abstract] Cancelled
10:40 BREAK
11:00 Discussion Groups
Standoff, Query, Lexicons, Archives, Typology, ...
12:00 LUNCH
1:30 Demonstrations: Archives, Metadata, Lexicons
- B1 Daan Broeder, Freddy Offenga, Don Willems, Peter Wittenburg (Max
Planck Institute, Netherlands)
The IMDI Metadata Set, its tools and accessible linguistic databases [abstract]
- B2 Robert Neumann, Ulrike Kiefer (Foundation for Yiddish Language
and Culture, Düsseldorf, Germany)
Archive of the Language and Culture Atlas for Ashkenazic Jewry [abstract]
- C5 Andrew Hippisley, Mariam Tariq, David Cheng (University of Surrey,
UK)
Hierarchical data and the derivational relationship between words [abstract]
[ppt] [ps]
- D7 Jeff Good, Ronald Sprouse (University of California, Berkeley,
USA)
The Turkish Electronic Living Lexicon (TELL)
- D8 Steven Bird, Gary Simons (University of Pennsylvania and SIL
International, USA)
OLAC: The Open Language Archives Community [abstract]
[ppt] [ps]
2:30 Panel: Future Directions
- David Maier Reflections on Linguistic Databases [ppt]
[ps]
- Gösta Grahne
- Terry Langendoen
3:30 CLOSE
Steven Bird, Peter Buneman, & Mark Liberman (LDC, CIS, & Linguistics)
Email: sb@ldc.upenn.edu, peter@cis.upenn.edu, myl@unagi.cis.upenn.edu