Obtaining DataUsing DataProviding DataCreating Data
About LDCMembersCatalogProjectsPapersLDC OnlineSearchContact UsUPennHome

LDC Catalog | By Type and Source | By Year | Top Ten | Projects | Catalog Search



TI 46-Word

Item Name: TI 46-Word
Authors: Mark Liberman, Robert Amsler, Ken Church, Ed Fox, Carole Hafner, Judy Klavans, Mitch Marcus, Bob Mercer, Jan Pedersen, Paul Roossin, Don Walker, Susan Warwick and Antonio Zampolli
LDC Catalog No.: LDC93S9
NIST Catalog No.: 7-1.1
ISBN: 1-58563-017-9
Data Type: speech
Sample Rate: 12500 Hz
Sampling Format: 1-channel 12-bit pcm
Data Source(s): microphone speech
Application(s): speech recognition
Language(s): English
Language ID(s): eng
Distribution: 1 CD
Member fee: $0 for 1993 members
Non-member Fee: US $300.00
Reduced-License Fee: US $150.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Readme File: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Mark Liberman, et al.
1993
TI 46-Word
Linguistic Data Consortium, Philadelphia

This CD-ROM contains a corpus of speech which was originally designed and collected at Texas Instruments, Inc. (TI) in 1980 and used initially in performance assessment tests of isolated-word speaker-dependent technology. (See "Speech Recognition: Turning Theory to Practice" by G. R. Doddington and T. B. Schalk, in IEEE Spectrum, Vol. 18, No. 9, September 1981.)

The 46-word vocabulary consists of two sub-vocabularies: (1) the TI 20-word vocabulary (consisting of the digits zero through nine plus the words "enter," "erase," "go," "help," "no," "rubout," "repeat," "stop," "start," and "yes" as well as (2) the TI 26-word "alphabet set" (consisting of the letters "a" through "z").

The corpus contains read utterances from 16 speakers (eight males and eight females) each speaking 26 utterances of the 46-word vocabulary: 16 tokens designated as training and ten as test.

The corpus was collected at Texas Instruments in a quiet acoustic enclosure using an Electro-Voice RE-16 Dynamic Cardiod microphone at 12.5kHz sample rate with 12-bit quantization. The files are in NIST SPHERE format and have a ".wav" filename extension.


About LDC | Members | Catalog | Projects | Papers | LDC Online | Search / Help | Contact Us | UPenn | Home | Obtaining Data | Creating Data | Using Data | Providing Data

Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.