


BRAMSHILL

Item Name: BRAMSHILL
Authors: Linguistic Data Consortium
LDC Catalog No.: LDC94S20
ISBN: 1-58563-029-2
Release Date: Jan 01, 1994
Data Type: speech
Sample Rate: 10000 Hz
Sampling Format: 1-channel pcm
Data Source(s): microphone speech, telephone speech
Application(s): speaker identification
Language(s): English
Language ID(s): eng
Distribution: 2 DVD
Member fee: $0 for 1994, 1996 members
Non-member Fee: US $750.00
Reduced-License Fee: US $400.00
Extra-Copy Fee: US $400.00
Non-member License: yes
Readme File: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Linguistic Data Consortium. 1994. BRAMSHILL. Linguistic Data Consortium, Philadelphia.

The recordings on this nine-disc set were originally made in 1978-79 as part of a British Home Office study into speaker identification techniques. Subsequently, it was realized that a large body of unconstrained conversational material might be of interest to researchers working in other speech processing fields. The recordings were transcribed and the CD-ROMs prepared during 1993.

The recordings were made at the Police Staff College, Bramshill, Hampshire, England. The participants were police officers taking part in the various courses at the college. This provided a wide range of regional accents and a range of ages from late teens to early fifties. Each speaker is described by nine demographic attributes.

Three adjacent bedrooms were used. The two participants, each alone in their rooms, conversed by telephone. The third room was used as a monitoring and recording station.

In addition to the telephone recordings, reference recordings were made using a high quality dynamic microphone in each room. It is these higher quality recordings, not the telephone speech, which are provided on the BRAMSHILL CD-ROM set.

The recordings were made on a Sony Elcaset EL-7 cassette machine, chosen at the time because of its good speed stability. The microphone was a Shure SM-7 cardioid type. The speech data was sampled at 10 kHz, 16-bit resolution.
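As a rough guide to data volume, the sampling parameters above (10 kHz, 16-bit, one channel) can be turned into a size estimate. The sketch below is illustrative arithmetic only: it assumes uncompressed PCM with no file header, which the description does not state, and the function names are hypothetical.

```python
# Sampling parameters taken from the catalog entry.
SAMPLE_RATE_HZ = 10_000   # 10 kHz
BYTES_PER_SAMPLE = 2      # 16-bit resolution
CHANNELS = 1              # 1-channel PCM


def pcm_size_bytes(duration_s):
    """Size of uncompressed, headerless PCM audio of the given duration (assumption)."""
    return round(duration_s * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_s(size_bytes):
    """Duration in seconds implied by a given number of PCM bytes."""
    return size_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


if __name__ == "__main__":
    # A nominal ten-minute conversation side.
    print(pcm_size_bytes(10 * 60))  # 12000000
```

A nominal ten-minute side therefore occupies about 12 MB under these assumptions; actual files may be shorter where silence was removed.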

Some attempt was made to control the acoustic environment. It is evident from listening to the recordings that, while these measures produced a reasonable recording environment, the rooms were far from soundproof. A variety of external noises (engines, aircraft, etc.) can be heard on some of the recordings.

Each speaker was given a pile of photographs. In response to a bleep signal, each speaker introduced himself by name and read a set of test sentences. After this, the main part of the conversation took place, in which participants were asked to determine which of each pair of photographs had been taken first (if indeed they were related at all). The conversations continued for 10 minutes until terminated by another bleep signal.

During the digitization process, some periods of silence were removed, so some recordings are now shorter than the original ten minutes. This also means that recordings of the two sides of a conversation are no longer time-aligned. In addition, to preserve the anonymity of the speakers, some passages (mainly the introductions) have been erased by replacing them with binary zeroes. Finally, the bleep signals have also been erased with binary zeroes. The transcriptions indicate where this has occurred.
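Because erased passages were overwritten with binary zeroes, they can in principle be located as long runs of exactly zero-valued samples. The following is a minimal sketch of that idea, not a tool supplied with the corpus. It assumes the audio has already been decoded into a sequence of integer samples; the function name and the 0.1-second minimum duration are hypothetical choices for illustration.

```python
SAMPLE_RATE_HZ = 10_000  # sampling rate stated in the catalog entry


def find_zeroed_spans(samples, min_duration_s=0.1, sample_rate=SAMPLE_RATE_HZ):
    """Return (start_s, end_s) spans where every sample is exactly zero.

    Runs shorter than min_duration_s are ignored, since isolated zero
    samples also occur in ordinary speech (threshold is an assumption).
    """
    min_len = round(min_duration_s * sample_rate)
    spans = []
    run_start = None
    for i, value in enumerate(samples):
        if value == 0:
            if run_start is None:
                run_start = i          # a new run of zeroes begins here
        elif run_start is not None:
            if i - run_start >= min_len:
                spans.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a run that extends to the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_len:
        spans.append((run_start / sample_rate, len(samples) / sample_rate))
    return spans
```

Spans found this way could be cross-checked against the erasure marks in the transcriptions, which remain the authoritative record of what was removed.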

The speech was transcribed verbatim. No attempt was made to correct grammar or fill in missing words. Transcription conventions are detailed in the documentation. Every lexical word from the transcriptions is contained in the dictionary supplied in the INDEX directory. There are about 6,500 word types among the roughly 600,000 words of the transcripts. Contractions, part-words, slang words, hesitation sounds and non-speech sounds are all treated as words in their own right in the dictionary.
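The distinction between word types (distinct entries) and running words (tokens) can be illustrated with a short count. This is a sketch only: it assumes a transcript is available as plain text with whitespace-separated words, which may not match the corpus's actual transcription format, and the function name is hypothetical. Contractions and hesitation sounds are counted as words in their own right, as in the dictionary.

```python
from collections import Counter


def type_token_counts(text):
    """Return (number of word types, number of word tokens) for a transcript string.

    Words are split on whitespace and lower-cased; no other normalization
    is applied, so "don't" and "er" each count as a word.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    return len(counts), len(tokens)


if __name__ == "__main__":
    sample = "yeah er yeah I don't don't know"  # invented example, not corpus data
    n_types, n_tokens = type_token_counts(sample)
    print(n_types, n_tokens)  # 5 7
```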

Content Copyright

Portions © 1994 Trustees of the University of Pennsylvania



Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.