LSA 2016 Workshop

Preparing your Corpus for Archival Storage, LSA 2016 Workshop

We are grateful for the funding supplied by the National Science Foundation (BCS #1549994), which makes this special session of LSA 2016 possible. The session will take place on Thursday, January 7, 2016, before the start of the Annual LSA Meeting in Washington, DC.


Malcah Yaeger-Dror, University of Arizona
Christopher Cieri, LDC

A note from the organizers

As you all know, NSF has emphasized that we should prepare our corpora for storage, so that other researchers (or we ourselves, at a later date) can use the older material for comparison with newer studies. This meeting will address critical factors that could not be covered in an earlier NSF-supported workshop, which examined other factors that must be considered when preparing data for comparison and sharing.

The detailed program and list of scheduled presenters are found here.


An NSF-supported satellite workshop on Archival Preparation will be held in conjunction with, and immediately preceding, the annual meeting of the Linguistic Society of America on January 7, 2016. NSF policy now stipulates that investigators are expected to make available to other researchers the primary data created or gathered under NSF grants. However, the metadata presently associated with archived data are often inadequate to permit data (e.g., sound files) from related studies to be compared; without an agreed-upon coding protocol, there can be no effective sharing and comparison across speech corpora. Invited speakers will discuss specific coding conventions for such factors as socioeconomic and educational speaker demographics, language choice, stance and footing. Choosing appropriate metadata for these factors will facilitate the sharing of corpora and research to determine how each factor affects language use.

NSF previously supported a workshop (at LSA 2012) in which leading scholars discussed data protocols, obtaining ethics board approval for human subject research, and ensuring that the information gathered about human subject demographics, attitudes and the situations in which they were recorded provides enough scope and detail to permit meaningful comparison across studies and thus encourage data sharing. Following that model, this workshop will extend the topics covered and provide a training forum in which to develop protocols for sharable data that conform to the spirit of NSF policy. This award will also support the participation of students in the training and discussions.