Linguistic Resources
Use of LDC Corpora by Students
Ways LDC corpora have been used for student research and for teaching purposes at university summer school programs.
Use of LDC Data by High School Students -- June 16, 2010
LSA Summer Institute and LDC Corpora -- August 17, 2007
The LDC was pleased to provide access to several LDC corpora for students at the Linguistic Society of America (LSA) 2007 Summer Institute. This year's institute, entitled 'Empirical Foundations for Theories of Language', was hosted by Stanford University. The institute drew researchers and students from across the globe and included a number of courses that gave students hands-on experience in working with linguistic data. The following examples, which demonstrate how large-scale databases can be used for teaching purposes, were submitted by course instructors at the LSA 2007 Summer Institute:
In "Information Structure and Word Order Variation", taught by Betty J. Birner and Gregory Ward, students used LDC corpora to collect tokens of various constructions displaying non-canonical word order, with the goal of discovering how various categories of information are distributed in these non-canonical constructions. Among the corpora used were the Treebank and Brown Corpus (including the Wall Street Journal), and Switchboard.
In "Pronunciation Variation and Psycholinguistics", taught by Susanne Gahl, students examined pronunciation variants and fluctuations in speaking rate in the Switchboard corpus, with the aim of understanding the mechanisms underlying human language production and comprehension.
For "Paraphrase and Usage", taught by Annie Zaenen, Cathy O'Connor, and Tom Wasow, students were required to carry out a small corpus study in order to receive credit. The class focused on grammatical alternations and the factors that determine their relative frequencies. The project requirement was intended to give students hands-on experience in exploring usage data. Students used a variety of corpora for their projects, including the Treebank and TIPSTER.
The LDC looks forward to collaborating with LSA for future institutes.
EMLS Summer School -- July 21, 2006
European Masters in Language and Speech (EMLS) is a network of European universities providing education in natural language processing and speech communication sciences. EMLS organizes regular summer schools, which attract considerable interest from students in both the NLP and speech processing domains.
For this year's summer school in Utrecht (NL), members of the Speech@FIT group (Faculty of Information Technology, Brno University of Technology, Czech Republic) prepared two tutorials making use of LDC corpora:
- In “Speech recognition based on Hidden Markov Models”, given by Jan “Honza” Cernocky, students built a recognizer of connected digits using HTK tools. The recognizer is comparable to the Aurora-ETSI standard and achieves more than 99% word accuracy on clean data. The TIDIGITS database from LDC was used in this tutorial.
- In “Phoneme posterior estimation and acoustic keyword-spotting”, given by Igor Szoke, students got acquainted with the theory and practice of phoneme recognition and posterior estimation by neural networks, and with their use in acoustic keyword spotting. The tutorial was based on one of the LDC classics: TIMIT.
LDC supported these two tutorials with the data. While the use of TIDIGITS was limited to the EMLS, EMLS students were offered in-kind copies of TIMIT, including the documentation, for home use.
EMLS homepage | Utrecht summer school page | Speech@FIT home
Contact ldc@ldc.upenn.edu