Introduction
The CALLHOME German corpus collection includes a lexical component. The
CALLHOME German lexicon consists of 318,807 words. Of these, 315,503 words
are adapted from the CELEX German lexicon produced by the Centre for
Lexical Information, Max Planck Institute for Psycholinguistics in
Nijmegen, and 3,304 additional words come from the 80 training and 20
development test (devtest) transcripts (ten minutes each) of the LDC
German CALLHOME telephone speech corpus.
Data
Each entry in the German lexicon consists of tab-separated fields giving
orthographic, morphological, phonological, stress, source, and
frequency information for the word.
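Because every entry is a single line of tab-separated fields, the lexicon can be read with a simple line splitter. The sketch below is a minimal, hypothetical reader: the field order is assumed to match the order listed above, and the frequency field is assumed to be an integer count. Neither assumption is confirmed by this description, so check both against the actual file (and its character encoding) before relying on it. All names (`LexiconEntry`, `parse_line`, `read_lexicon`) are illustrative.

```python
from dataclasses import dataclass
from typing import Iterator, TextIO

# ASSUMPTION: fields appear in the order the description lists them.
FIELDS = ("orthography", "morphology", "phonology", "stress", "source", "frequency")


@dataclass
class LexiconEntry:
    """One lexicon line split into its (assumed) six fields."""
    orthography: str
    morphology: str
    phonology: str
    stress: str
    source: str
    frequency: int  # ASSUMPTION: an integer count


def parse_line(line: str) -> LexiconEntry:
    """Split one tab-separated lexicon line into a LexiconEntry."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} tab-separated fields, got {len(parts)}")
    # The first five fields are kept as strings; the last is converted to int.
    return LexiconEntry(*parts[:5], frequency=int(parts[5]))


def read_lexicon(stream: TextIO) -> Iterator[LexiconEntry]:
    """Yield one entry per non-blank line of an open text stream."""
    for line in stream:
        if line.strip():
            yield parse_line(line)
```

In practice the stream would come from `open(path, encoding=...)` with the encoding the distribution documents; a line with the wrong number of fields raises `ValueError` rather than being silently misaligned.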
A sample page from the lexicon is available.
The transcripts and documentation (LDC97T15) are available separately,
as is a corpus of telephone speech (LDC97S43).
Updates
There are no updates at this time.