LDC98S70 - Speech data
LDC98T27 - Transcripts
Introduction
This release of HUB5 Spanish training data consists of 106 calls
derived from the CALLFRIEND Spanish (Language ID) collection. Each
transcript covers a contiguous 10-30 minute segment of a recorded
conversation lasting up to 30 minutes. These calls were originally
collected by the LDC in support of the project on Language
Recognition, sponsored by the U.S. Department of Defense. All of
these calls have been designated as additional training data for the
project on Large Vocabulary Conversational Speech Recognition (LVCSR)
in Spanish.
Data
Speakers were solicited by the LDC to participate in this telephone
speech collection effort through the internet, published
advertisements, and personal contacts. A total of 200 call
originators took part, each of whom placed a telephone call through a
toll-free robot operator maintained by the LDC. Access to the robot
operator required a unique Personal Identification Number (PIN),
issued by the LDC recruiting staff when the caller enrolled in the
project.
Once recruited, each caller was free to choose whom to call, and no
guidelines were given concerning what to talk about. Most
participants called family members or close friends. All calls
originated in North America and were placed to various locations
within North America, Puerto Rico, or the Dominican Republic. Both
the callers and the call recipients were informed that the call would
be recorded, and a call proceeded only if both parties agreed to
being recorded. Each caller was allowed to talk for up to 30 minutes.
Upon successful completion of the call, the caller was paid $20 (in
addition to making a free long-distance telephone call). Each caller
was allowed to place only one telephone call.
HUB5 Spanish speech and transcript data may be obtained by contacting
the LDC.
Updates
There are no updates at this time.