Introduction
The CALLHOME English collection includes a lexical component. The CALLHOME
American English Lexicon was originally distributed under the name
COMLEX Pronouncing Lexicon, or PRONLEX.
Organizations that have already received PRONLEX will not be required
to purchase the CALLHOME American English Lexicon.
Data
The latest version of PRONLEX contains 90,988 lexical entries and
includes coverage of WSJ30K, WSJ64K, Switchboard, and CALLHOME
English. (WSJ30K and WSJ64K are word lists selected from several
years of Wall Street Journal texts used in recent ARPA Continuous
Speech Recognition corpora. Switchboard is a three-million-word
corpus of telephone conversations on a variety of topics.)
This lexicon is available by FTP to organizations that sign a license
agreement, which is also found on the LDC FTP site.
The PRONLEX documentation describes the principles observed for word
transcription. Although predictable variation in pronunciation due to
dialect or variable reduction has not been notated in the lexicon
itself, the documentation notes systematic dialectal variants, which
may be generated by rule. In addition, alternate pronunciations are
given for words whose pronunciation varies by part of speech (e.g.,
abstrAct, Abstract), or in less systematic but salient ways
(especially names). Classes of exceptions to the transcription
principles, such as names, function words, and foreign words, are
tagged.
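The idea of keeping several pronunciations per word, selected by part of speech, can be illustrated with a minimal sketch. This is not the PRONLEX file format or its transcription symbols: the Entry class, its field names, the tags field, and the pronunciations helper are all hypothetical, and stress is shown by capitalization only because the example above does so.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Entry:
    """One lexical entry (hypothetical layout, not the PRONLEX format)."""
    word: str
    pron: str                          # stress shown by capitalization
    pos: Optional[str] = None          # set when pronunciation depends on it
    tags: FrozenSet[str] = frozenset() # hypothetical exception-class tags


# Toy lexicon holding only the example from the text.
LEXICON = [
    Entry("abstract", "abstrAct", pos="verb"),
    Entry("abstract", "Abstract", pos="noun"),
]


def pronunciations(word: str, pos: Optional[str] = None) -> List[str]:
    """Return all pronunciations of a word, narrowed by part of speech
    when one is given and at least one entry matches it."""
    matches = [e for e in LEXICON if e.word == word.lower()]
    if pos is not None:
        by_pos = [e for e in matches if e.pos == pos]
        if by_pos:
            matches = by_pos
    return [e.pron for e in matches]


print(pronunciations("abstract", "verb"))
print(pronunciations("abstract"))
```

When no part of speech is supplied, the helper returns every alternate, which mirrors how a recognizer without tagging information would have to consider all listed variants.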
Here is a sample page.
The transcripts and documentation (LDC97T14) are
available, as well as a corpus of telephone speech (LDC97L20).
Updates
There are no updates at this time.