A Closer Look at LDC

The LDC Community

LDC is a non-profit hosted by the University of Pennsylvania with the mission to create and broadly share language resources, sustained by its community under the Consortium model. After more than 30 years, LDC continues to provide the community with large quantities of diverse data, research support and high-quality membership services.

Research

LDC supports sponsored research programs and language-based technology evaluations by providing resources and contributing organizational expertise. Today’s complex research programs present challenges that require specialized skills and capabilities. LDC brings its vast experience to bear by combining manual and automatic processes for data collection, processing and analysis; speech transcription, alignment, labeling and coding; translation and parallel text alignment; image and video labeling; syntactic, morphological and semantic mark-up; lexicography; and document analysis.

LDC maintains a diverse research program to extend big data approaches to related fields, including sociolinguistics, clinical research, and educational applications. In the clinical and education space, LDC works with partners to demonstrate the value of automated data-driven analysis of speech and language material. This has included efforts to collect and analyze speech biomarkers in clinical and non-clinical populations, including persons diagnosed with autism, dementia and schizophrenia.

Data Management

The effectiveness of data-driven research depends in no small measure on practical matters around accessing data, using it, archiving it and preserving it. From providing comprehensive user documentation for its resources, to a time-tested license model, to procedures for submitting and sharing data through the catalog, LDC’s expertise in data management frees researchers to pursue the next challenge with the assurance that they have the necessary tools for the task. Need a data management plan for your next research proposal? LDC can help with that too.

Community Engagement

LDC engages with the community at many levels. It participates in funding panels, editorial boards, scientific committees, conferences and workshops, and it partners, consults and otherwise collaborates with a variety of organizations for a multitude of purposes. Papers, journal articles and books authored by LDC faculty and staff reach key organizations and thought leaders. The LDC Institute provides a forum for a range of speakers across disciplines to discuss issues broadly related to linguistics, computer science, natural language processing and human language technology development. LDC is a frequent participant, sponsor and exhibitor at key conferences in the field, among them ACL, COLING, ICASSP, LREC, and Interspeech.

Data Contributors

Data contributors are critical to LDC’s mission to develop, acquire and distribute language resources. Whether they create data that is of interest as corpus source data or contribute data sets for publication in the LDC Catalog, data contributors benefit multiple research communities.

The process for sharing data through LDC is straightforward. Consult Publication Process and Providing Data for basic information. LDC Submissions furnishes infrastructure and resources for describing and uploading data sets, communicating with LDC’s publications team, and submitting corpora for inclusion in the Catalog.