Please fill out the following form as completely as possible. If more than one choice applies, list other choices in the "other/notes" box. Mandatory fields are marked with an asterisk (*).
Primary contact:
Name*:
E-mail*:
Phone:
Title of corpus
(proposed)*:
Version:
Authors
(comma-separated):
Languages
(comma-separated):
Data type*
(select all that apply):
Text
Audio
Video
Lexicon
Tool
Other(s)--please specify:
Estimated delivery date*
(When would you be able to provide complete data and documentation to LDC?):
Corpus size
(Fill in all that apply. Provide best estimate if exact numbers are not available):
Data size (uncompressed)*:
Bytes
Kilobytes(KB)
Megabytes(MB)
Gigabytes(GB)
Terabytes(TB)
Hours of Audio or Video:
Number of Words:
Number of Tokens:
Number of Decisions (e.g. for entity annotations):
Text data details
(for any text, including transcriptions for audio and video corpora, text elements of lexicons, etc.):
Character encoding:
UTF-8
UTF-16
ASCII
unknown
other
other/notes:
Format/Structure:
XML
SGML
plain text
unknown
other
other/notes:
Markup schema (e.g. NITF, NewsML, MDE, TIMEX2); include the specification URL if applicable:
Audio data details
(including audio tracks for video corpora):
Sample rate:
8000 Hz
11025 Hz
22050 Hz
44100 Hz
44056 Hz
unknown/N.A.
other
other/notes:
Audio file extension:
.wav
.aiff
.sph (Sphere)
.mp3
unknown/N.A.
other
other/notes:
Bit depth:
8 bit
16 bit
20 bit
24 bit
unknown/N.A.
other/varied
other/notes:
Sample format:
Linear PCM
u-law
a-law
unknown/N.A.
other/varied
other/notes:
Channel count:
1 (mono)
2 (stereo)
unknown/N.A.
other/varied
other/notes:
Video data details:
Container:
avi
dv
mpeg-ps(vob)
mpeg-ts
mov
mp4
unknown/N.A.
other
other/notes:
Codec:
MPEG-1
MPEG-2
MPEG-4
DivX
MJPEG
DV
Xvid
unknown/N.A.
other
other/notes:
Broadcast standard:
NTSC
PAL
SECAM
unknown/N.A.
other
other/notes:
Frame rate:
29.97 fps
25 fps
unknown/N.A.
other
other/notes:
Frame size:
352x240
352x288
480x480
704x480
720x480
unknown/N.A.
other
other/notes:
Lexicons:
Format:
Traditional
Pronunciation
Translation
unknown/N.A.
other
other/notes:
Tools, Other:
Type:
Named Entity List
N-grams/Language Models
Morphological or other Analyzer
unknown/N.A.
other
other/notes:
Ownership:
Do you own all of the data in this corpus?:
Yes
No
Not sure
If "No" or "Not sure", please further explain your answer:
Distribution constraints:
Do you expect any external constraints on the distribution of this corpus (e.g. release date, price, licensing)?:
Yes
No
Not sure
If "Yes" or "Not sure", please explain your answer:
Data sources:
Please describe the nature of the source data and the methods of collection; e.g. broadcast, conversation, news, documentary, prompted/spontaneous, demographics, intended audience, telephone, background noise, field/studio recordings, etc.
Description*:
Please provide a description of the corpus. Descriptions of annotation (including the annotation specification URL), transcription, post-processing, feature extraction, etc. belong here rather than in the "Data sources" field. There may be some overlap between this field and the preceding one; use the two fields at your discretion to give the clearest possible description of the corpus contents.