Preserving Data

LDC is committed to the long-term accessibility and preservation of all data in the LDC Catalog. Preservation is implicit in every aspect of the operations through which LDC carries out its mission, and it is the ultimate objective of all processes relating to data submission, review, publication, curation and archiving. LDC’s preservation activities cover the following:


  • In-house storage accommodating hundreds of terabytes with the capability to scale to petabytes rapidly and transparently.
  • Copies are stored both locally, separate from the primary copy, and remotely, at a different geographic location.

Updates & Migration

  • Updates to corpora are either released as a new data set or applied in place. In either case, no data is lost and the ability to roll back to previous versions is maintained.
  • Data is migrated to new formats when needed, for example when a format is at risk of obsolescence, when an existing resource is updated or when a new version is produced.


  • A specialized back-up system utilizes daily snapshots, replications on disk, tape robots, cloud storage services and back-up servers.
  • Independent, mirrored systems are physically located in different buildings.

Data Integrity 

  • Storage features include built-in integrity checks and automatic repairs to ensure fixity.
  • Checksums, file sizes, and other file level metadata are maintained to verify integrity.
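
As an illustration of how checksums and file sizes can be used to verify integrity, the following is a minimal sketch in Python. It is not LDC's actual tooling: the function names (file_record, build_manifest, verify), the choice of SHA-256 and the in-memory manifest format are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_record(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Return the size and SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {"size": path.stat().st_size, "sha256": digest.hexdigest()}


def build_manifest(root: Path) -> dict:
    """Map each file path (relative to root) to its size and checksum."""
    return {
        p.relative_to(root).as_posix(): file_record(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict) -> list:
    """Return the sorted relative paths that are missing or have changed.

    A file counts as changed if its size or checksum differs from the
    record stored in the manifest (hypothetical format, see above).
    """
    current = build_manifest(root)
    return sorted(
        name for name, record in manifest.items() if current.get(name) != record
    )
```

In this sketch, a manifest would be built once when a data set is archived and stored alongside it. Running verify later against the same directory returns an empty list if every recorded file is still present and unchanged, and otherwise lists the files that need attention.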


  • Real-time monitoring of data, together with automated alerts, ensures that immediate action can be taken when needed.
  • Access policies prevent accidental or unauthorized changes to data from within LDC.