Introduction
The Japanese Electronic Industry Development Association's (JEIDA)
Common Speech Data Corpus (JCSD) was prepared by Jonathan Hamaker,
Richard J. Duncan and Joe Picone of the Institute for Signal and
Information Processing at Mississippi State University.
Data
This collection consists of high-fidelity recordings of 150 native
speakers of Japanese; each speaker produces four repetitions of 323
short prompts, including city names, control words, monosyllabic
words, isolated digits and strings of four digits. Each reading
session was recorded with two microphones, yielding two channels that
differ in audio quality for each utterance. Channel 0 (LDC96S64)
contains data recorded with a standard dynamic microphone, a Sanken
MU-2C. Channel 1 (LDC96S65) contains data recorded simultaneously with
a condenser microphone that presumably varied from site to site; the
Channel 1 data is available separately.
A summary of the size and content of the corpus is given below:
number of speakers                  150 speakers
    males                           75
    females                         75
range of speaker age                10 yrs. to 70 yrs.
number of items per speaker         323 items
    isolated digits                 15
    four digit sequences            35
    city names                      100
    monosyllables                   110
    control words (set A)           13
    control words (set B)           24
    control words (set C)           26
number of repetitions per item      4 repetitions
total number of utterances          193,763 utterances (per channel)
sample frequency                    16 kHz
sample type                         16-bit linear
number of microphones               2 (dynamic and condenser)
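The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up for this example, and the values are copied from the summary. It shows that the per-category item counts add up to 323, and that the nominal utterance count (speakers x items x repetitions) is slightly larger than the reported total. This description does not explain the difference.

```python
# Figures copied from the corpus summary; names are illustrative only.
SPEAKERS = 150
REPETITIONS = 4
ITEMS_PER_CATEGORY = {
    "isolated digits": 15,
    "four digit sequences": 35,
    "city names": 100,
    "monosyllables": 110,
    "control words (set A)": 13,
    "control words (set B)": 24,
    "control words (set C)": 26,
}
REPORTED_UTTERANCES = 193_763  # per channel, as stated in the summary

SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16

# Items each speaker reads once.
items_per_speaker = sum(ITEMS_PER_CATEGORY.values())

# Utterances per channel if every speaker produced every repetition.
nominal_utterances = SPEAKERS * items_per_speaker * REPETITIONS

# Gap between the nominal count and the reported count.
shortfall = nominal_utterances - REPORTED_UTTERANCES

# Raw audio data rate for one channel of 16 kHz, 16-bit linear samples.
bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8

print(items_per_speaker)   # 323
print(nominal_utterances)  # 193800
print(shortfall)           # 37
print(bytes_per_second)    # 32000
```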
For purposes of publication by the LDC, the corpus has been organized
onto 40 CD-ROMs; the partitioning of the data files has been done
primarily by channel (20 CD-ROMs each for channel 0 and channel 1)
and secondarily by category of prompts. These prompts include:
Description                          Number of items
Control Words:
    Banking Services                 13
    Word Processors                  24
    Home Electronic Equipment        26
Digits:
    Isolated Digits                  15
    Four Digit Sequences             35
City Names:                          100
    a phonetically-rich subset of common Japanese city names
Monosyllables:                       110
    all Japanese monosyllables plus several used to pronounce
    foreign words
JEIDA/JCSD-Channel 0 and JEIDA/JCSD-Channel 1 can each be ordered as
complete sets. Components of the corpus can also be purchased as
outlined below:
Price   Set-of   Description                             Catalog ID
2000    20       JEIDA/JCSD-Channel 0 (Complete)         LDC96S64
 600     6       JEIDA/JCSD-Channel 0 City Names         LDC96S64-1
 400     4       JEIDA/JCSD-Channel 0 Control Words      LDC96S64-2
 100     1       JEIDA/JCSD-Channel 0 Isolated Digits    LDC96S64-3
 300     3       JEIDA/JCSD-Channel 0 Four Digit Seq.    LDC96S64-4
 600     6       JEIDA/JCSD-Channel 0 Monosyllables      LDC96S64-5
2000     5       JEIDA/JCSD-Channel 1 (Complete)         LDC96S65
 600     1       JEIDA/JCSD-Channel 1 City Names         LDC96S65-1
 500     1       JEIDA/JCSD-Channel 1 Control Words      LDC96S65-2
 100     1       JEIDA/JCSD-Channel 1 Isolated Digits    LDC96S65-3
 300     1       JEIDA/JCSD-Channel 1 Four Digit Seq.    LDC96S65-4
 600     1       JEIDA/JCSD-Channel 1 Monosyllables      LDC96S65-5
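When deciding between a complete set and individual components, it may help to total the component rows for each channel. The sketch below is a hypothetical helper, not part of any ordering system: the `Entry` type and `component_totals` function are invented for this example, and the prices, disc counts and catalog IDs are copied from the list as given (no currency is stated here).

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of the price list (illustrative type, not an LDC format)."""
    price: int
    discs: int
    catalog_id: str


# Complete sets, keyed by channel number.
COMPLETE = {
    0: Entry(2000, 20, "LDC96S64"),
    1: Entry(2000, 5, "LDC96S65"),
}

# Component sets, keyed by channel number, in the order listed.
COMPONENTS = {
    0: [
        Entry(600, 6, "LDC96S64-1"),  # City Names
        Entry(400, 4, "LDC96S64-2"),  # Control Words
        Entry(100, 1, "LDC96S64-3"),  # Isolated Digits
        Entry(300, 3, "LDC96S64-4"),  # Four Digit Seq.
        Entry(600, 6, "LDC96S64-5"),  # Monosyllables
    ],
    1: [
        Entry(600, 1, "LDC96S65-1"),  # City Names
        Entry(500, 1, "LDC96S65-2"),  # Control Words
        Entry(100, 1, "LDC96S65-3"),  # Isolated Digits
        Entry(300, 1, "LDC96S65-4"),  # Four Digit Seq.
        Entry(600, 1, "LDC96S65-5"),  # Monosyllables
    ],
}


def component_totals(channel: int) -> tuple[int, int]:
    """Return (total price, total discs) over all components of a channel."""
    parts = COMPONENTS[channel]
    return sum(p.price for p in parts), sum(p.discs for p in parts)


for channel, complete in COMPLETE.items():
    price, discs = component_totals(channel)
    print(f"channel {channel}: complete {complete.price}/{complete.discs} discs, "
          f"components {price}/{discs} discs")
```

With the listed values, the Channel 0 components total the same price and disc count as the complete set, while the Channel 1 components total 100 more than the complete set.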
Updates
There are no updates at this time.