Implementing DMPs for Language Resources

For research communities working with language resources, it is clear that the NSF DMP covers all raw data and annotations. Raw data refers to any observation of linguistic behavior, whether recorded as text or audiovisual media; annotation includes transcription, translation and any tagging or coding of language form and meaning.

Many in the language research community are already accustomed to depositing their resources at a data center. Since its founding in 1992, LDC has distributed over 120,000 copies of data sets to organizations worldwide. LDC distributes data under membership agreements and corpus licenses. This well-tested model is easily adapted to data management plans.

LDC-published data includes substantive metadata compliant with the standard developed by OLAC (Open Language Archives Community), a standard widely accepted among language researchers.

LDC’s Catalog is already a recognized repository for NSF-funded data and is the logical choice for resources developed under data management plans.