The OGI Spelled and Spoken Telephone Corpus consists of speech
recordings from over 3,650 telephone calls, each made by a different
speaker to an automated prompting/recording system installed at the
Oregon Graduate Institute. Speakers were asked to say their name,
where they were calling from and where they grew up; they were asked
to answer a couple of yes/no questions and to spell their first and
last names; many were also asked to repeat a few specific words and
to recite the letters of the alphabet.
Each response to a prompt is stored as a separate waveform file and
the files are organized according to prompt (response type); all
responses from a given call include that call's unique caller-index
number in the file name, so that responses can easily be sorted by speaker.
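Because every response file carries its call's caller index, files scattered across the per-prompt directories can be regrouped by speaker. The following is a minimal sketch that assumes, hypothetically, that the caller index is the first run of digits in the file name. The exact naming scheme is not described here, so `CALLER_RE` and the example file names below are placeholders to adapt to the corpus documentation.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical pattern: assumes the caller index is the first run of digits
# in the file name. Adjust to the naming scheme documented with the corpus.
CALLER_RE = re.compile(r"(\d+)")


def group_by_caller(paths):
    """Map each caller index to the sorted list of its response files."""
    groups = defaultdict(list)
    for p in paths:
        match = CALLER_RE.search(Path(p).name)
        if match is None:
            continue  # skip files whose names carry no caller index
        groups[int(match.group(1))].append(str(p))
    # Sort callers numerically and each caller's files lexically.
    return {caller: sorted(files) for caller, files in sorted(groups.items())}
```

For example, `group_by_caller(["name/call12.wav", "yesno/call12.wav", "name/call7.wav"])` collects both call-12 responses under key `12` and the single call-7 response under key `7`.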
Waveform data are stored in compressed form, using the NIST SPHERE 2.0
software package, which is available separately at no charge to users.
SPHERE 2.0 provides the decompression software needed to extract the
waveform data, as well as tools for accessing and modifying file
headers.
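Compression applies to the waveform samples, while the SPHERE header at the start of each file is plain ASCII text: the line `NIST_1A`, a line giving the header length in bytes (commonly 1024), then one `name -type value` field per line (`-i` integer, `-r` real, `-sN` string of length N), terminated by `end_head`. The sketch below reads only that header and performs no decompression, for which the SPHERE tools are still needed. The function name `parse_sphere_header` and the sample field values in the usage note are illustrative assumptions, not part of the SPHERE package.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict.

    Hypothetical helper: reads header fields only and does not touch
    (or decompress) the waveform samples that follow the header.
    """
    lines = data.split(b"\n")
    if not lines or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(lines[1].strip())  # total header length in bytes
    text = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break  # end of header fields
        parts = line.split(None, 2)  # name, -type, value (value may contain spaces)
        if len(parts) < 3:
            continue
        name, typ, value = parts
        if typ == "-i":
            fields[name] = int(value)
        elif typ == "-r":
            fields[name] = float(value)
        else:  # "-sN": string field
            fields[name] = value
    return fields
```

Applied to a header containing `sample_rate -i 8000` and `channel_count -i 1`, the function returns `{"sample_rate": 8000, "channel_count": 1}`; the values here are illustrative rather than taken from the corpus.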
Time-aligned phonetic transcriptions are provided for a subset of
responses, and a complete log of each call (giving speaker sex, quality
judgments and orthographic transcriptions of all responses) is
included in a form suitable for use as a relational database.