NIEUW – Novel Incentives and Workflows in Linguistic Data Collection and Annotation

Despite more than two decades of intensive effort by government programs, corporations, international data centers and research groups, the supply of linguistic resources (LRs) for Human Language Technology development in most languages is a mere fraction of what is required. The research and technology development communities need infrastructure to develop high-quality LRs continuously without relying on scarce resources such as direct funding.

LDC received a CISE Research Infrastructure (CRI) planning grant (CI-P #1629923) from the National Science Foundation (NSF) to build a framework to dramatically increase the store of LRs in a variety of languages by employing crowdsourcing techniques proven to work in multiple scientific disciplines. Social media, games with a purpose, and citizen science have shown us that human resources are effectively limitless for some activities. By creating an infrastructure that supported the ongoing building of scalable and portable language collection and annotation activities made available to the public on the web and via mobile devices, and by offering contributors appropriate incentives, we aimed to enhance LR development well beyond what project-dependent, direct funding alone can produce.

We proposed to build borderless communities of people who work toward common goals by contributing linguistic data and judgments. Contributors provided these data and judgments by, for example, playing a language-related online game, engaging in a “citizen linguist” annotation-based research project, or uploading stories about their culture or experience of a local event (e.g., the earthquakes in Amatrice, Italy). Inspired by proven successes in other fields, LDC’s plan was to develop a series of web portals, or interfaces, dedicated to a variety of language collection and annotation activities and games that appealed to different potential contributors through targeted alternative incentive and design strategies.

The data resulting directly from this project will be made freely available to researchers and the general public alike.

Additional Information

Christopher Cieri, James Fiumara, Mark Liberman, Chris Callison-Burch, Jonathan Wright
Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference, Miyazaki, May 7-12
Available: Paper in PDF

Christopher Cieri
Novel Incentives in Language Resource Development
LREC 2016 Workshop: Novel Incentives for Collecting Data and Annotation from People: types, implementation, tasking requirements, workflow and results, Portoroz, May 28
Available: Paper in PDF, Slides in PDF

NIEUW Workshop
Hosted by LDC and supported by NSF CISE CRI planning grant #1629923, this workshop was held October 3-4, 2016 at the University of Pennsylvania, Philadelphia, PA.