The recordings on this nine-disc set were
originally made in 1978-79 as part of a British Home Office study into
speaker identification techniques. Subsequently, it was realized that
a large body of unconstrained conversational material might be of
interest to researchers working in other speech processing fields.
The recordings were transcribed and the CD-ROMs prepared during 1993.
The recordings were made at the Police Staff College, Bramshill,
Hampshire, England. The participants were police officers taking part
in the various courses at the college. This provided a wide range of
regional accents and a range of ages from late teens to early fifties.
Each speaker is described by nine demographic attributes.
Three adjacent bedrooms were used. The two participants, each
alone in their rooms, conversed by telephone. The third room
was used as a monitoring and recording station.
In addition to the telephone recordings, reference recordings were
made using a high quality dynamic microphone in each room. It is
these higher quality recordings, not the telephone speech, which are
provided on the BRAMSHILL CD-ROM set.
The recordings were made on a Sony Elcaset EL-7 cassette machine,
chosen at the time because of its good speed stability. The
microphone was a Shure SM-7 cardioid type. The speech data was
sampled at 10 kHz with 16-bit resolution.
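
As a rough illustration of what these figures imply, the sketch below estimates the raw size of one ten-minute recording. It assumes one channel per recording and uncompressed samples, neither of which is stated above; the function name is hypothetical. A 10 kHz sampling rate also limits the usable bandwidth to below 5 kHz.

```python
# Figures taken from the description above.
SAMPLE_RATE_HZ = 10_000   # 10 kHz sampling rate
BYTES_PER_SAMPLE = 2      # 16-bit resolution


def raw_size_bytes(duration_s, channels=1):
    """Estimate the uncompressed size of a recording in bytes.

    channels=1 is an assumption: the description does not say how
    the two sides of a conversation are stored.
    """
    n_samples = int(duration_s * SAMPLE_RATE_HZ)
    return n_samples * BYTES_PER_SAMPLE * channels


# One nominal ten-minute side of a conversation.
ten_minutes = raw_size_bytes(10 * 60)
print(ten_minutes)  # 12000000 bytes, i.e. about 12 MB
```

Recordings from which silence was removed will be correspondingly smaller than this estimate.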
Some attempt was made to control the acoustic environment.
It is evident from listening to the recordings that, while
these measures produced a reasonable recording environment,
the rooms were far from soundproof. A variety of external
noises (engines, aircraft, etc.) can be heard on some of the recordings.
Each speaker was given a pile of photographs. In response to a bleep
signal, each speaker introduced himself by name and read a set of test
sentences. After this, the main part of the conversation took place,
in which participants were asked to determine which of each pair of
photographs had been taken first (if indeed they were related at all).
The conversations continued for 10 minutes until terminated by another bleep signal.
During the digitization process, some periods of silence were
removed, so some recordings now appear to be shorter than the
original ten minutes. Furthermore, this means that recordings
of two sides of a conversation are no longer time-aligned. In
addition, to preserve the anonymity of the speakers, some
passages (mainly the introductions) have been erased by
replacing them with binary zeroes. Finally, the bleep signals have
also been erased with binary zeroes. The transcriptions
indicate where this has occurred.
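
Because erased passages are runs of zero-valued samples, they can in principle be located in the audio as well as in the transcriptions. The following is a minimal sketch, assuming the samples have already been decoded into a sequence of integers (the file format is not described here). The function name and the `min_len` threshold are hypothetical; a threshold is used so that isolated zero samples in ordinary speech are not reported.

```python
def find_zero_runs(samples, min_len=100):
    """Return (start, end) index pairs of runs of zero samples.

    `end` is exclusive. Only runs of at least `min_len` samples are
    reported. `min_len` is an illustrative assumption, not a value
    taken from the corpus documentation.
    """
    runs = []
    start = None
    for i, s in enumerate(samples):
        if s == 0:
            if start is None:
                start = i          # a new run of zeroes begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None           # the run (if any) has ended
    # Handle a run that extends to the end of the recording.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs


# Small synthetic example: two runs meet the threshold of 3 samples.
demo = [1, 2] + [0] * 5 + [3] + [0] * 2 + [4] + [0] * 4
print(find_zero_runs(demo, min_len=3))  # [(2, 7), (11, 15)]
```

At 10 kHz, dividing a sample index by 10,000 gives the position in seconds. The transcriptions remain the authoritative record of where erasures occur.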
The speech was transcribed verbatim. No attempt was made to correct
grammar, fill in missing words, etc. Transcription conventions are
detailed in the documentation.
Every lexical word from
the transcriptions is contained in the dictionary supplied in the INDEX
directory. There are about 6,500 word types in the 600k words of the
transcripts. Contractions, part-words, slang words, hesitation sounds
and non-speech sounds are all treated as words in their own
right in the dictionary.
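
The distinction between word types and running words can be illustrated with a short sketch. It assumes whitespace-separated tokens and treats every token, including hesitation sounds and contractions, as a word in its own right. The actual transcription conventions are given in the documentation, and the function name here is hypothetical.

```python
from collections import Counter


def type_token_counts(text):
    """Return (number of word types, number of running words).

    Tokens are split on whitespace only; this is a simplifying
    assumption and not the corpus's own tokenization.
    """
    tokens = text.split()
    counts = Counter(tokens)
    return len(counts), len(tokens)


# Hesitations ("er") and contractions ("don't") count as words.
types, tokens = type_token_counts("well I I er I don't know")
print(types, tokens)  # 5 7
```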
Portions © 1994 Trustees of the University of Pennsylvania