Use of LDC Corpora in University Summer Schools
The LDC was pleased to provide access to several LDC corpora for students at the Linguistic Society of America (LSA) 2007 Summer Institute. This year's institute, entitled 'Empirical Foundations for Theories of Language', was hosted by Stanford University. The institute drew researchers and students from across the globe and included a number of courses that gave students hands-on experience in working with linguistic data. The following examples, submitted by course instructors at the LSA 2007 Summer Institute, demonstrate how large-scale databases can be used for teaching purposes:
In "Information Structure and Word Order Variation", taught by Betty J. Birner and Gregory Ward, students used LDC corpora to collect tokens of various constructions displaying non-canonical word order, with the goal of discovering how various categories of information are distributed in these constructions. The corpora used included Switchboard as well as the Treebank and Brown Corpus (including the Wall Street Journal).
In "Pronunciation Variation and Psycholinguistics", taught by Susanne Gahl, students examined pronunciation variants and fluctuations in speaking rate in the Switchboard corpus, with the aim of understanding the mechanisms underlying human language production and comprehension.
For "Paraphrase and Usage", taught by Annie Zaenen, Cathy O'Connor, and Tom Wasow, students were required to initiate a small corpus study in order to receive credit. The class focused on grammatical alternations and the factors that determine their relative frequencies. The project requirement was intended to give students hands-on experience in exploring usage data. Students used a variety of corpora for their projects, including the Treebank and TIPSTER.
The LDC looks forward to collaborating with the LSA on future institutes.