Publication Process

LDC releases approximately 36 corpora annually; data submissions are continuously reviewed and prepared to meet that goal. The yearly publication schedule is developed well in advance. It takes into account the diversity of publications in a given month and over the entire year, with some flexibility to meet the specific needs of data providers. Data providers should expect the following process when contributing data to LDC.

The first step in any submission is for the data provider to review LDC's technical and documentation guidelines. The next step is to create an account on LDC Submissions, a portal that provides infrastructure and resources for describing and uploading data sets and for communicating with LDC's publications team. The team contacts data providers with any questions about the submission and with details about distribution and licensing.

For data collected from human subjects, providers must describe the collection; indicate whether the collection was approved by an Institutional Review Board, Ethics Board, or similar body; describe the procedures for obtaining consent from study participants; and list the elements of consent, including that participants consented to sharing their data in a corpus.  

All providers must sign a distribution agreement with LDC that gives the Consortium the right to store and distribute the submitted resource.  

Selected corpora are scheduled for publication once both parties have signed a distribution agreement and LDC has received and reviewed the release-ready version of the corpus and its documentation. LDC performs extensive quality control checks to ensure that published data is complete, error-free, and ready to use. See the technical and documentation guidelines for more information on LDC's requirements. Every effort is made to accommodate publication scheduling requests, but this may not be possible in some cases.

Approximately one month before publication, LDC contacts the data provider about any required changes or edits to the resource and other related matters. Depending on the scope of those changes, publication of the resource may be delayed.

Corpora are announced and released around the 15th of each month. In most cases, data is distributed via download from LDC's member portal. Very large corpora are delivered on a hard drive or USB flash drive.