Implementing DMPs for Language Resources

For research communities working with language resources, the NSF DMP clearly covers all raw data and annotations. Raw data refers to any observation of linguistic behavior, whether recorded as text or audiovisual media; annotation includes transcription, translation and any tagging or coding of language form and meaning.

Many in the language research community are already accustomed to depositing their resources at a data center. Since its founding in 1992, LDC has distributed over 175,000 copies of data sets to organizations worldwide. LDC distributes data under membership agreements and corpus licenses. This well-tested model is easily adapted to data management plans.

LDC-published data includes substantive metadata compliant with the standard developed by OLAC (Open Language Archives Community), a standard widely accepted among language researchers.

The LDC Catalog is recognized as a trustworthy data repository under the CoreTrustSeal certification established by the ICSU World Data System and the Data Seal of Approval. This means that the Catalog meets a series of high standards regarding data access, rights management, curation and archival storage. It is also a recognized repository for NSF-funded data and thus the logical choice for resources developed under data management plans.