JEIDA/JCSD-Channel 1 Isolated Digits

Item Name: JEIDA/JCSD-Channel 1 Isolated Digits
Authors: Jonathan Hamaker, Richard J. Duncan, Joe Picone, and Shuichi Itahashi. The original data was provided by the Japan Electronic Industry Development Association (JEIDA).
LDC Catalog No.: LDC96S65-3
ISBN: 1-58563-102-7
Data Type: speech
Sample Rate: 16000 Hz
Sampling Format: 1-channel pcm
Data Source(s): microphone speech
Application(s): speech recognition
Language(s): Japanese
Language ID(s): JPN
Distribution: 1 CD
Member fee: $0 for 1996, 1997 members
Non-member Fee: US $100.00
Reduced-License Fee: US $100.00
Extra-Copy Fee: US $100.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Jonathan Hamaker, et al. 1996. JEIDA/JCSD-Channel 1 Isolated Digits. Linguistic Data Consortium, Philadelphia.

Introduction

The Japan Electronic Industry Development Association (JEIDA) Common Speech Data Corpus (JCSD) was prepared by Jonathan Hamaker, Richard J. Duncan and Joe Picone of the Institute for Signal and Information Processing at Mississippi State University.

Data

This collection consists of high-fidelity recordings of 150 native speakers of Japanese; each speaker produces four repetitions of 323 short prompts, including city names, control words, monosyllabic words, isolated digits and strings of four digits. Each reading session was recorded with two microphones, yielding two channels that differ in audio quality for each utterance. Channel 0 (LDC96S64) contains data recorded with a standard dynamic microphone, a Sanken MU-2C. Channel 1 (LDC96S65) contains data recorded simultaneously with a condenser microphone, which presumably varied from site to site. Each channel is available separately.

A summary of the size and content of the corpus is given below:

    number of speakers              150 speakers
        males                       75
        females                     75
    range of speaker age            10 yrs. to 70 yrs.

    number of items per speaker     323 items
        isolated digits             15
        four digit sequences        35
        city names                  100
        monosyllables               110
        control words (set A)       13
        control words (set B)       24
        control words (set C)       26

    number of repetitions per item  4 repetitions
    total number of utterances      193,763 utterances (per channel)

    sample frequency                16 kHz
    sample type                     16-bit linear
    number of microphones           2 (dynamic and condenser)
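
The figures in the summary can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and every number is copied from the summary above. It shows that the per-category counts add up to 323 items and that the nominal utterance count (every speaker recording every repetition of every item) is slightly higher than the reported total. The summary does not explain the difference, so the sketch only reports it.

```python
# Figures copied from the corpus summary; names are illustrative only.
SPEAKERS = 150
REPETITIONS = 4
ITEMS_PER_SPEAKER = {
    "isolated digits": 15,
    "four digit sequences": 35,
    "city names": 100,
    "monosyllables": 110,
    "control words (set A)": 13,
    "control words (set B)": 24,
    "control words (set C)": 26,
}
REPORTED_UTTERANCES = 193_763  # per channel, as stated in the summary

SAMPLE_RATE_HZ = 16_000  # 16 kHz
BYTES_PER_SAMPLE = 2     # 16-bit linear samples


def nominal_utterances(speakers, items, repetitions):
    """Utterance count if every speaker recorded every repetition of every item."""
    return speakers * items * repetitions


total_items = sum(ITEMS_PER_SPEAKER.values())
nominal = nominal_utterances(SPEAKERS, total_items, REPETITIONS)
shortfall = nominal - REPORTED_UTTERANCES

# Raw audio data rate for one channel, ignoring any file headers.
bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

print(f"items per speaker:   {total_items}")
print(f"nominal utterances:  {nominal}")
print(f"reported utterances: {REPORTED_UTTERANCES} (shortfall {shortfall})")
print(f"bytes per second:    {bytes_per_second}")
```

The same helper gives a nominal upper bound for a single prompt category, for example `nominal_utterances(SPEAKERS, ITEMS_PER_SPEAKER["isolated digits"], REPETITIONS)` for the isolated digits. The actual per-category counts are not stated in the summary.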
    

For purposes of publication by the LDC, the corpus has been organized onto 40 CD-ROMs; the data files have been partitioned primarily by channel (20 CD-ROMs each for channel 0 and channel 1) and secondarily by category of prompts. The prompt categories are:

    Description                                     Number of items

    Control Words:
        Banking Services                            13
        Word Processors                             24
        Home Electronic Equipment                   26

    Digits:
        Isolated Digits                             15
        Four Digit Sequences                        35

    City Names:                                     100
        a phonetically rich subset
        of common Japanese city names

    Monosyllables:                                  110
        all Japanese monosyllables plus
        several used to pronounce
        foreign words
    

JEIDA/JCSD-Channel 0 and JEIDA/JCSD-Channel 1 can each be ordered as a complete set. Components of the corpus can also be purchased as outlined below:

    Price (US $)  Set-of  Description                             Catalog ID

    2000          20      JEIDA/JCSD-Channel 0 (Complete)         LDC96S64
    600           6       JEIDA/JCSD-Channel 0 City Names         LDC96S64-1
    400           4       JEIDA/JCSD-Channel 0 Control Words      LDC96S64-2
    100           1       JEIDA/JCSD-Channel 0 Isolated Digits    LDC96S64-3
    300           3       JEIDA/JCSD-Channel 0 Four Digit Seq.    LDC96S64-4
    600           6       JEIDA/JCSD-Channel 0 Monosyllables      LDC96S64-5

    2000          20      JEIDA/JCSD-Channel 1 (Complete)         LDC96S65
    600           6       JEIDA/JCSD-Channel 1 City Names         LDC96S65-1
    400           4       JEIDA/JCSD-Channel 1 Control Words      LDC96S65-2
    100           1       JEIDA/JCSD-Channel 1 Isolated Digits    LDC96S65-3
    300           3       JEIDA/JCSD-Channel 1 Four Digit Seq.    LDC96S65-4
    600           6       JEIDA/JCSD-Channel 1 Monosyllables      LDC96S65-5
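
As a quick consistency check on the partitioning described above, the component sets for each channel should add up to the 20 CD-ROMs of the complete set, and the two channels together to 40. The following is a minimal sketch using only the "Set-of" column and the catalog IDs from the table; the dictionary layout and names are hypothetical.

```python
# Number of CD-ROMs per component, keyed by catalog ID suffix
# (copied from the "Set-of" column; the layout is illustrative only).
COMPONENT_CDS = {
    "LDC96S64": {"-1": 6, "-2": 4, "-3": 1, "-4": 3, "-5": 6},  # Channel 0
    "LDC96S65": {"-1": 6, "-2": 4, "-3": 1, "-4": 3, "-5": 6},  # Channel 1
}
CDS_PER_COMPLETE_SET = 20  # "Set-of" value for each complete channel

cds_per_channel = {
    catalog_id: sum(components.values())
    for catalog_id, components in COMPONENT_CDS.items()
}
total_cds = sum(cds_per_channel.values())

for catalog_id, count in cds_per_channel.items():
    status = "matches" if count == CDS_PER_COMPLETE_SET else "differs from"
    print(f"{catalog_id}: {count} CD-ROMs ({status} the complete set)")
print(f"total: {total_cds} CD-ROMs")
```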
    

Updates

There are no updates at this time.


Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.