LDC Corpus Submission

Please fill out the following form as completely as possible. If more than one choice applies, list other choices in the "other/notes" box. Mandatory fields are marked with an asterisk (*).

Primary Contact
Corpus Size
(Fill in all that apply. Provide a best estimate if exact numbers are not available.)
Text Data Details
(for any text, including transcriptions for audio and video corpora, text elements of lexicons, etc.)
(e.g. NITF, NewsML, MDE, TIMEX2; include the specification URL if applicable)
Audio Data Details
(including audio tracks for video corpora)
Video Data Details
Tools, Other
Distribution constraints
Please describe any intended or recommended applications for your corpus, e.g. event detection, machine translation, part-of-speech tagging, speech recognition, language identification, etc. See https://catalog.ldc.upenn.edu/search for a more complete list of recommended applications.
Please describe the nature of the source data and the methods of collection, e.g. broadcast, conversation, news, documentary, prompted/spontaneous, demographics, intended audience, telephone, background noise, field/studio recordings, etc. See https://catalog.ldc.upenn.edu/search for a more complete list of data sources.
Please provide a description of the corpus. Descriptions of annotation (including the annotation specification URL), transcription, post-processing, feature extraction, etc. belong here, not in the "Data sources" field. There may be some overlap between this field and the preceding field. You may use these fields at your discretion to provide the clearest possible description of the corpus contents.