CLLRD Workshop | Linguistic Data Consortium

Citizen Linguistics in Language Resource Development Workshop

The third workshop on novel incentives in linguistic data collection was originally scheduled for May 2020 in conjunction with the Twelfth International Conference on Language Resources and Evaluation (LREC 2020), which was cancelled. It was instead held online on December 4, 2020. The published CLLRD proceedings are available in PDF.

Background

Notwithstanding advances in data collection and processing, language-related research, education and technology development continue to suffer from an inadequate supply of Language Resources (LRs). Traditional LR development typically relies upon top-down support from a government or private foundation. To supplement it, Citizen Linguistics (the Citizen Science of Language) changes the incentive model to attract a new workforce, which in turn requires a different kind of workflow. Incentives for Citizen Linguists may include opportunities to learn and develop new skills; to socialize, compete and earn status or recognition; to document their language and promote their culture; and, most importantly, to contribute directly to research and indirectly to a greater cause or social good. By offering human contributors sustained access to appropriate opportunities, activities, and incentives, we can enhance LR development well beyond what traditional direct funding alone can produce. However, these new incentives and workflows also bring new challenges, and their solutions are relevant even to expert (paid) annotation.

The goal of this hybrid workshop/tutorial is twofold. The first is to provide a forum for researchers and practitioners to explore and discuss the issues, advantages and challenges of using Citizen Linguistics to create language resources. The second is to introduce LanguageARC, a new Citizen Linguistics web portal for collecting language data and judgements.

Papers

LanguageARC: Developing Language Resources Through Citizen Linguistics
James Fiumara, Christopher Cieri, Jonathan Wright and Mark Liberman
Paper in PDF

Developing Language Resources with Citizen Linguistics in Austria – A Case Study
Barbara Heinisch
Paper in PDF

Objective Assessment of Subjective Tasks in Crowdsourcing Applications
Giannis Haralabopoulos, Myron Tsikandilakis, Mercedes Torres Torres and Derek McAuley
Paper in PDF

Speaking Outside the Box: Exploring the Benefits of Unconstrained Input in Crowdsourcing and Citizen Science Platforms
Jon Chamberlain, Udo Kruschwitz and Massimo Poesio
Paper in PDF

Leveraging Non-Specialists for Accurate and Time Efficient AMR Annotation
Mary Martin, Cecilia Mauceri, Martha Palmer and Christoffer Heckman
Paper in PDF

The INCOMSLAV Platform: Experimental Website with Integrated Methods for Measuring Linguistic Distances and Asymmetries in Receptive Multilingualism
Irina Stenger, Klara Jagrova and Tania Avgustinova
Paper in PDF

Identifications of Speaker Ethnicity in South-East England: Multicultural London English as a Divisible Perceptual Variety
Amanda Cole
Paper in PDF

LanguageARC - a tutorial
Christopher Cieri and James Fiumara
Paper in PDF

Organizers

Chris Callison-Burch, University of Pennsylvania, USA
Christopher Cieri, Linguistic Data Consortium, University of Pennsylvania, USA
James Fiumara, Linguistic Data Consortium, University of Pennsylvania, USA
Mark Liberman, Linguistic Data Consortium, University of Pennsylvania, USA

Program Committee

Sonja Bosch, University of South Africa, South Africa
Chris Callison-Burch, University of Pennsylvania, USA
Nicoletta Calzolari, Institute for Computational Linguistics, Italy
Khalid Choukri, ELRA/ELDA, France
Christopher Cieri, Linguistic Data Consortium, University of Pennsylvania, USA
John Coleman, Oxford University, UK
Maxine Eskenazi, Carnegie Mellon University, USA
Karën Fort, Sorbonne Université, France
James Fiumara, Linguistic Data Consortium, University of Pennsylvania, USA
Mark Liberman, Linguistic Data Consortium, University of Pennsylvania, USA
Peter Patrick, University of Essex, UK
Massimo Poesio, Queen Mary University of London, UK
Stephanie Strassel, Linguistic Data Consortium, University of Pennsylvania, USA
Jennifer Tracey, Linguistic Data Consortium, University of Pennsylvania, USA