Advantages of Data Center Distribution

The sharing requirement of data management plans may be satisfied in a number of ways. Individual investigators and researchers can use their personal websites as a distribution medium; the investigator’s institution may provide sharing infrastructure in the form of a website or archive; or researchers can deposit funded data with established repositories such as data centers. Distribution through data centers offers numerous advantages over self-distribution because data centers have in place the infrastructure and processes for reviewing, storing and distributing resources over the long term, a key element of data management plans in general.


Data centers, principal among them the Linguistic Data Consortium (LDC), provide access to most of the resources publicly available for language-related education, research and technology development. Over time they have acquired and institutionalized specialized data management skills that they apply to their holdings. Specifically, LDC performs pre-publication reviews of submitted data and works with contributors to resolve any issues pertaining to content and data integrity. Consortium data releases are typically directed to a broad audience. LDC assists contributors in reaching that audience and also prepares its own data descriptions, which appear in the Catalog and in print communications. All of these steps are taken in order to produce the highest quality corpora possible.


LDC maximizes resource discoverability by adding all contributions to its Catalog, which has optimized search capabilities, and by mirroring the Catalog via OLAC, the Open Language Archives Community, and the ELRA Universal Catalog. Publications and activities of interest are announced on LDC’s website, in its monthly newsletter (with a circulation of over 20,000 researchers worldwide) and on social media sites including LinkedIn, Twitter, Facebook, the LDC Blog and YouTube. LDC researchers frequently attend major conferences to present on contributed and locally produced resources. They also host vendor tables each year at several of the most important conferences in the field to maximize opportunities to engage with the community about Consortium activities and Catalog resources. Senior LDC researchers frequently serve on advisory and oversight boards, conference program committees, planning grant workshops and funding panels to promote greater resource sharing, forge relationships among data creators and users and encourage best practices. The Catalog specifies relations among data sets, and LDC has identified more than 13,000 published papers that rely upon resources published by the Consortium.


Data centers offer stability: resources continue to be available even after the relevant funding is gone and the authors have undertaken new projects. LDC has been in continuous operation since 1992, longer than other language-related data centers, and every corpus contributed to the Consortium is still available. The steady support of Consortium members assures longevity even when soft money is scarce.

Expertise and Innovation

As the first and most active language resource data center, LDC established or adopted many of the practices that the related research communities follow today. LDC augments its operational expertise by maintaining its own research agendas focused on the development of language resources and their impact on research and technology development. The presence of research teams within the Consortium’s management staff gives LDC a unique and sympathetic perspective on the needs of the research communities it serves. As technologies evolve, it is critical for data centers to create new infrastructure in order to maximize quality and effectiveness while minimizing cost and timeline. Recent LDC innovations include infrastructure for data collections from SMS, the WebAnn framework for creating annotation tools, a new interface that implements familiar e-commerce concepts, and distribution through the cloud and service grids. Nearly as important as developing the necessary expertise is recognition among user communities. More than half of LDC’s 800 titles are contributed, and over 6,000 organizations worldwide have come to LDC to license more than 175,000 copies of language resources.