NIEUW – Novel Incentives and Workflows in Linguistic Data Collection and Annotation
Despite more than two decades of intensive effort by government programs, corporations, international data centers and research groups, the supply of linguistic resources (LRs) for Human Language Technology development in most languages is a mere fraction of what is required. The research and technology development communities need infrastructure to develop high-quality LRs continuously without relying on scarce resources such as direct funding.
LDC has received a CISE Research Infrastructure (CRI) planning grant (CI-P #1629923) from the National Science Foundation (NSF) to build a framework to dramatically increase the store of LRs in a variety of languages by employing crowdsourcing techniques proven to work in multiple scientific disciplines. Social media, games with a purpose, and citizen science have shown us that human resources are effectively limitless for some activities. By creating an infrastructure that will support the ongoing building of scalable and portable language collection and annotation activities made available to the public on the web and via mobile devices, and by offering contributors appropriate incentives, we will enhance LR development well beyond what project-dependent, direct funding alone can produce.
We propose to build borderless communities of people who work toward common goals by contributing linguistic data and judgments. Contributions may take the form of, for example, playing a language-related online game, engaging in a “citizen linguist” annotation-based research project, or uploading stories about one’s culture or experience of a local event (e.g., the earthquakes in Amatrice, Italy). Inspired by proven successes in other fields, LDC plans to develop a series of web portals, or interfaces, dedicated to a variety of language collection and annotation activities and games that appeal to different potential contributors through targeted alternative incentive and design strategies.
The data resulting directly from this project will be made freely available to researchers and the general public alike.
Hosted by LDC and supported by NSF CISE CRI planning grant #1629923, this workshop was held October 3-4, 2016 at the University of Pennsylvania, Philadelphia, PA.
Novel Incentives in Language Resource Development
LREC 2016 Workshop: Novel Incentives for Collecting Data and Annotation from People: types, implementation, tasking requirements, workflow and results, Portoroz, May 28
Available: Paper in PDF, Slides in PDF