Introduction
The Switchboard Cellular Part 1 collection focused primarily on
GSM cellular phone technology. The collection commenced on
11/12/1999 and was completed on 05/15/2000. The project's goal was
to recruit 190 subjects, balanced by gender, to participate under
varied environmental conditions in 10 or more five- to six-minute
conversations on GSM cellular phones. The speech data was
collected for research, development, and evaluation of automatic
systems for speech-to-text conversion, talker identification,
language identification, and speech signal detection.
The Switchboard Cellular Part 1 Transcription was produced by
the Linguistic Data Consortium (catalog number LDC2001T14, ISBN
1-58563-214-7). This release contains 250 transcriptions of the
speech data files in Switchboard Cellular Part 1 Transcribed Audio
(LDC2001S15), along with documentation describing speaker
information (sex, age, education, city and state where raised),
call information (date, time, call duration, Personal
Identification Numbers, topic), and audit information (channel
quality, background noise). Switchboard Cellular Part 1 calls were
transcribed using conventions similar to HUB5 English.
During the collection period, the LDC collected a total of 1,309
calls, or 2,618 sides (1,957 GSM), from 254 participants (129 male,
125 female) under varied environmental conditions. Of these calls,
250 were transcribed.
Each speech file consists of a 1,024-byte ASCII-formatted Sphere
header, followed by two-channel interleaved mu-law sample data. The
mu-law samples represent the actual digital data transmission from
the telephone service provider (MCI), as captured separately for
each side of the telephone conversation by the LDC's telephone
collection platform. The header also indicates the caller_pin,
callee_pin, topic_id, cellular service/handset information and
speaker demographic information.
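The file layout described above can be read with a short script. The following is a minimal sketch, not an LDC tool. It assumes the standard NIST SPHERE header layout (a "NIST_1A" line, a header-size line, "name -type value" lines, then "end_head") and that the sample data is stored uncompressed. The function name split_sphere is hypothetical.

```python
def split_sphere(raw: bytes):
    """Parse a SPHERE header and split two-channel interleaved mu-law data.

    Returns (fields, channel_a, channel_b). Assumes uncompressed samples
    with one byte per mu-law sample; this is an illustrative sketch only.
    """
    # The first two lines hold the magic string and the header size in bytes.
    magic, size_line = raw[:16].split(b"\n")[:2]
    if magic != b"NIST_1A":
        raise ValueError("not a SPHERE file")
    header_size = int(size_line.decode("ascii"))  # 1024 for this corpus

    fields = {}
    text = raw[16:header_size].decode("ascii", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, type flag, value
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value

    if fields.get("channel_count", 2) != 2:
        raise ValueError("expected two-channel data")

    # Interleaved layout: bytes alternate between the two conversation sides.
    samples = raw[header_size:]
    return fields, samples[0::2], samples[1::2]
```

To use it, read a speech file in binary mode and pass its bytes to split_sphere. Header fields such as caller_pin, callee_pin, and topic_id are then available in the returned dictionary, and each returned byte string holds the mu-law samples for one side of the call.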
The documentation also contains reports on clipped files.
There are 250 transcribed files in total, amounting to roughly
12 hours of audio data (1,431 Mbytes).
Updates
There are no updates at this time.