Introduction
The Switchboard Cellular Part 1 collection focused primarily on
GSM cellular phone technology. Collection began on 11/12/1999
and was completed on 05/15/2000. The project aimed to recruit 190
subjects, balanced by gender, to participate in at least 10
five- to six-minute conversations on GSM cellular phones under
varied environmental conditions. The speech data was collected for
research, development, and evaluation of automatic systems for
speech-to-text conversion, talker identification, language
identification, and speech signal detection.
The Switchboard Cellular Part 1 Transcribed Audio was produced
by the Linguistic Data Consortium (LDC), catalog number LDC2001S15,
ISBN 1-58563-215-5. This release contains the 250 speech
data files that correspond to the Switchboard Cellular Part 1
Transcription (LDC2001T14), along with documentation describing
speaker information (sex, age, education, city and state where
raised), call information (date, time, call duration, Personal
Identification Numbers, topic), and audit information (channel
quality, background noise). The data files are not compressed.
Data
During the collection period, the LDC collected a total of 1,309
calls, or 2,618 sides (1,957 GSM), from 254 participants (129 male,
125 female) under varied environmental conditions. Of these, 250
calls were transcribed.
Each speech file consists of a 1,024-byte ASCII-formatted SPHERE
header, followed by two-channel interleaved mu-law sample data. The
mu-law samples represent the actual digital data transmission from
the telephone service provider (MCI), as captured separately for
each side of the telephone conversation by the LDC's telephone
collection platform. The header also indicates the caller_pin,
callee_pin, topic_id, cellular service/handset information, and
speaker demographic information.
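
The file layout described above can be illustrated with a minimal
Python sketch. It assumes the usual NIST SPHERE header conventions
(a "NIST_1A" magic line, a header-size line, then "name -type value"
lines terminated by "end_head"). The function names
parse_sphere_header and split_channels are hypothetical, and the
demo header is synthetic: it uses the standard SPHERE fields
channel_count and sample_coding rather than values copied from this
corpus.

```python
import io


def parse_sphere_header(stream):
    """Read a SPHERE header from a binary stream.

    Returns (fields, header_size) and leaves the stream positioned
    at the first byte of sample data.
    """
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(stream.readline().strip())  # 1024 for this corpus
    stream.seek(0)
    text = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in text.splitlines()[2:]:
        parts = line.strip().split(None, 2)
        if parts and parts[0] == "end_head":
            break
        if len(parts) != 3 or parts[0].startswith(";"):
            continue  # skip blank, malformed, or comment lines
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields, header_size


def split_channels(payload, channel_count=2):
    """De-interleave one-byte mu-law samples into one bytes object per side."""
    return [payload[i::channel_count] for i in range(channel_count)]


# Synthetic stand-in for a corpus file: padded header plus six sample bytes.
demo_header = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 2\n"
    b"sample_coding -s4 ulaw\n"
    b"end_head\n"
).ljust(1024, b" ")
demo_file = io.BytesIO(demo_header + bytes([10, 20, 11, 21, 12, 22]))

fields, header_size = parse_sphere_header(demo_file)
side_a, side_b = split_channels(demo_file.read(), fields["channel_count"])
```

With a real file, the same parser should also expose fields such as
caller_pin, callee_pin, and topic_id. The samples stay mu-law
encoded here; converting them to linear PCM is a separate step.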
The documentation also contains reports on clipped files.
There are 250 transcribed files in total, amounting to
approximately 24 hours of audio data (1,365 Mbytes).
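
As a rough consistency check, the stated size and duration can be
related by simple arithmetic. The sketch below assumes an 8,000 Hz
sampling rate and one byte per mu-law sample (typical for telephone
speech but not stated in this description), and reads "Mbytes" as
10^6 bytes. All variable names are illustrative.

```python
TOTAL_BYTES = 1_365 * 10**6   # stated size, assuming decimal Mbytes
FILE_COUNT = 250              # stated number of transcribed files
HEADER_BYTES = 1_024          # stated SPHERE header size per file
CHANNELS = 2                  # stated: two-channel interleaved data
SAMPLE_RATE_HZ = 8_000        # ASSUMPTION: standard telephone rate
BYTES_PER_SAMPLE = 1          # ASSUMPTION: 8-bit mu-law samples

# Remove the per-file headers, then convert bytes to time.
payload_bytes = TOTAL_BYTES - FILE_COUNT * HEADER_BYTES
bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
total_seconds = payload_bytes / bytes_per_second

total_hours = total_seconds / 3600
minutes_per_file = total_seconds / FILE_COUNT / 60

print(f"total audio: {total_hours:.1f} hours")
print(f"average call: {minutes_per_file:.1f} minutes")
```

Under these assumptions the total comes out a little under 24 hours
and the average call falls between five and six minutes, which
agrees with the figures given above.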
Updates
There are no updates at this time.