What's New:
LDC will be closed in observance of the Memorial Day Holiday on Monday, May 30. We will resume normal business hours on Tuesday, May 31.
Stay tuned for upcoming newsletter highlights from the last three decades!
These resources are available in three corpora:
LDC2020T24 LORELEI Ukrainian Representative Language Pack
LDC2020T10 LORELEI Entity Detection and Linking Knowledge Base
LDC Submissions is a platform that provides infrastructure and resources for sharing data through the Catalog. After registering for a user account, corpus submitters can create a submission, upload files, and communicate with LDC’s publications team during the review process. After all reviews are complete, the final, release-ready version of your data set is uploaded to the platform and enters the publications queue.
Sharing your corpus through LDC ensures access to the global research community and the permanent preservation of your data according to best practices for archiving digital language resources. Get started and register for an LDC Submissions user account today.
LDC’s language resources now include a Digital Object Identifier (DOI), an internationally recognized identification standard for online digital material. This means that LDC data sets have four persistent identifiers: a unique LDC number, ISBN, ISLRN and DOI. DOIs are alphanumeric strings that correspond to URLs and metadata for specified resources. They are expressed as links that resolve to the object’s online location. For example, the DOI for Penn Parsed Corpora of Historical English LDC2020T16 is https://doi.org/10.35111/4hzx-5483, which leads users to the LDC catalog entry for this data set. To facilitate its assignment and administration of DOIs, LDC has joined DataCite, a global DOI provider for research data. Adding DOIs is consistent with our aim to follow best practices for archiving and curating digital resources, as evidenced by the CoreTrustSeal certification, which recognizes the LDC Catalog as a trustworthy data repository.
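For readers who handle DOIs in scripts, the relationship between a bare DOI and its link form can be illustrated with a short Python sketch. This is only an illustration, not an LDC tool: the function names (doi_to_url, url_to_doi, split_doi) are hypothetical, and the sketch assumes the public https://doi.org/ resolver shown in the example above. It does not contact the network.

```python
from urllib.parse import quote, unquote, urlsplit

# Assumption: the public DOI resolver used in the example link above.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI such as '10.35111/4hzx-5483' into a resolver link."""
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


def url_to_doi(url: str) -> str:
    """Recover the bare DOI from a doi.org link."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("doi.org", "dx.doi.org"):
        raise ValueError(f"not a DOI resolver link: {url}")
    return unquote(parts.path.lstrip("/"))


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its registrant prefix and item suffix."""
    prefix, _, suffix = doi.partition("/")
    return prefix, suffix


if __name__ == "__main__":
    link = doi_to_url("10.35111/4hzx-5483")
    print(link)
    print(url_to_doi(link))
    print(split_doi("10.35111/4hzx-5483"))
```

Storing the bare DOI and building the link only when needed keeps the identifier independent of any one resolver address.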
Web pages about data management plans (DMPs) describe the Consortium’s capabilities to develop and implement project-specific proposals. To satisfy requirements from funders like the National Science Foundation (NSF) that researchers deposit data in an accessible, trustworthy repository, LDC provides archiving services and makes data publicly available at a reasonable cost while protecting intellectual property rights and addressing privacy concerns.
Browse the pages to learn more about the advantages of data center distribution, the details of NSF DMP requirements and the infrastructures and processes LDC has in place for storing and distributing resources over the long term.