Speech Communication Group
Artificial Intelligence Laboratory
NYNEX Science and Technology
500 Westchester Avenue
White Plains, NY 10604
Copyright 1990 NYNEX Corporation All Rights Reserved
The NTIMIT, or Network TIMIT, speech database is a telephone-bandwidth version of the widely used TIMIT database. Some useful features of NTIMIT (apart from the characteristics that make TIMIT itself useful):
Issues that are common to TIMIT and NTIMIT, such as organization of speech utterances, are discussed in the documentation files that are released with TIMIT, which are included with NTIMIT as well.
1. Distribution of TIMIT Utterances to Telephone Channels
To sample a variety of line conditions, TIMIT utterances were transmitted over many different telephone paths. This was done using Line Test Units, or LTUs, which are installed in telephone central offices throughout the NYNEX service region (New York and New England, excluding Connecticut). An LTU is connected to the telephone network and can be "called" just like a regular telephone. It answers an incoming call, can then be programmed to call another specified telephone number, and finally connects the audio from the incoming call to the outgoing call.
During the NTIMIT transfer process, one telephone in the NYNEX Science and Technology Laboratories in White Plains called the input line of a given LTU, while a second laboratory telephone was ready to receive a call. The LTU was then programmed to call the receiving telephone. Once that telephone answered, an audio path was open from one NYNEX telephone to the specified central office and back to the other NYNEX telephone. The TIMIT utterances were then transmitted over this path.
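The call sequence just described can be pictured as a small state machine. This is only a toy sketch of the order of steps given in the text; the class name, the state names, and the telephone number in the usage lines are hypothetical and do not describe the LTU's actual control interface.

```python
from enum import Enum, auto
from typing import Optional


class LtuState(Enum):
    IDLE = auto()               # waiting for an incoming call
    INCOMING_ANSWERED = auto()  # answered the call from the lab's sending telephone
    OUTGOING_DIALED = auto()    # dialed the lab's receiving telephone
    CONNECTED = auto()          # incoming audio is now connected to the outgoing call


class LtuModel:
    """Toy model of the LTU loop-back call sequence (hypothetical, not a real API)."""

    def __init__(self) -> None:
        self.state = LtuState.IDLE
        self.outgoing_number: Optional[str] = None

    def _require(self, expected: LtuState) -> None:
        # Enforce the step order described in the text.
        if self.state is not expected:
            raise RuntimeError(f"expected {expected.name}, LTU is {self.state.name}")

    def answer_incoming(self) -> None:
        self._require(LtuState.IDLE)
        self.state = LtuState.INCOMING_ANSWERED

    def dial(self, number: str) -> None:
        self._require(LtuState.INCOMING_ANSWERED)
        self.outgoing_number = number
        self.state = LtuState.OUTGOING_DIALED

    def far_end_answered(self) -> None:
        # Once the receiving telephone answers, the path lab -> central office -> lab is open.
        self._require(LtuState.OUTGOING_DIALED)
        self.state = LtuState.CONNECTED


if __name__ == "__main__":
    ltu = LtuModel()
    ltu.answer_incoming()
    ltu.dial("555-0100")  # hypothetical number for the lab's receiving telephone
    ltu.far_end_answered()
    print(ltu.state.name)  # CONNECTED
```

Calling the steps out of order (for example, dialing before answering) raises an error, which mirrors the fixed sequence in the transfer procedure.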
The telephone company has organized the United States into geographic entities called Local Access and Transport Areas, or LATAs. A telephone call that originates and terminates in the same LATA is handled solely by the local telephone company (NYNEX, Bell Atlantic, etc.). A call that originates and terminates in different LATAs is handled first by the originating LATA's local telephone company; a long-distance carrier (AT&T, MCI, SPRINT, etc.) then performs the inter-LATA transmission, and the terminating LATA's local company brings the call to the called telephone.
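The routing rule above reduces to a single comparison of the originating and terminating LATAs. The following is a minimal sketch; the function name and the LATA numbers in the usage lines are placeholders, not real assignments.

```python
def carriers_for_call(origin_lata: int, dest_lata: int) -> list[str]:
    """Return, in order, the parties that carry a call between two LATAs."""
    if origin_lata == dest_lata:
        # Intra-LATA call: the local telephone company handles the whole call.
        return [f"local company (LATA {origin_lata})"]
    # Inter-LATA call: originating local company, then a long-distance
    # carrier, then the terminating local company.
    return [
        f"local company (LATA {origin_lata})",
        "long-distance carrier",
        f"local company (LATA {dest_lata})",
    ]


if __name__ == "__main__":
    print(carriers_for_call(1, 1))  # placeholder LATA numbers
    print(carriers_for_call(1, 2))
```

This is why every NTIMIT call to an LTU outside the laboratories' own LATA passed through a long-distance carrier as well as the local network.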
A listing of all the LTUs used in NTIMIT, along with their geographic location, LATA number (assigned by the telephone company), and "bin" number (our own numbering scheme of the different LATAs) can be found in the on-line file ltu-list, described below in "Documentation."
Ten LATAs in the NYNEX region were used for the NTIMIT transfer process. Half of the utterances were sent to LTUs in the New York Metropolitan LATA, which is "local" for the NYNEX laboratories. The remaining half were transmitted to LTUs in the other nine NYNEX LATAs; because these were not local calls, a long-distance carrier was involved in the transmission. The following table gives an example of how the TIMIT utterances were distributed to the LATAs:
          |                  "BIN" (LATA)
  SPEAKER | 1(local)    2   3   4   5   6   7   8   9  10
----------+----------------------------------------------
     1    | 01234       5   6   7   8   9
     2    | 90123           4   5   6   7   8
     3    | 89012               3   4   5   6   7
     4    | 78901                   2   3   4   5   6
     5    | 67890                       1   2   3   4   5
     6    | 56789       4                   0   1   2   3
     7    | 45678       2   3                   9   0   1
     8    | 34567       0   1   2                   8   9
     9    | 23456       8   9   0   1                   7
    10    | 12345       6   7   8   9   0
    11    | 01234           5   6   7   8   9
   ...    | ...
Each entry is the number (0 through 9) of one of a speaker's ten utterances, and the column in which it appears is the bin to which that utterance was sent. For example, utterance 2 of speaker 8 was transmitted to an LTU in bin 4.
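The rotation in the example table above can be reproduced in a few lines. This is a sketch inferred from the table alone: the function name is hypothetical, the numbers 0 through 9 stand for a speaker's ten utterances in an unspecified order, and the scheme actually used for all 630 speakers may differ in detail. In the table, each speaker's starting utterance steps back by one per row, the first five utterances go to the local bin, and the other five go to five consecutive non-local bins whose starting bin advances by one per speaker, wrapping from bin 10 back to bin 2.

```python
LOCAL_BIN = 1
NONLOCAL_BINS = list(range(2, 11))  # bins 2..10, the nine non-local LATAs


def assign_bins(speaker: int) -> dict[int, int]:
    """Map utterance number (0-9) to bin for a 1-based speaker number (inferred rotation)."""
    first = (11 - speaker) % 10                  # speaker 1 starts at 0, speaker 2 at 9, ...
    order = [(first + i) % 10 for i in range(10)]
    start = (speaker - 1) % len(NONLOCAL_BINS)   # non-local start bin advances by one per speaker
    bins = {utt: LOCAL_BIN for utt in order[:5]}  # first five utterances stay local
    for i, utt in enumerate(order[5:]):           # last five go to consecutive non-local bins
        bins[utt] = NONLOCAL_BINS[(start + i) % len(NONLOCAL_BINS)]
    return bins


if __name__ == "__main__":
    print(assign_bins(8)[2])  # 4: speaker 8, utterance 2 goes to bin 4
    counts: dict[int, int] = {}
    for s in range(1, 631):  # all 630 TIMIT speakers
        for b in assign_bins(s).values():
            counts[b] = counts.get(b, 0) + 1
    print(counts[LOCAL_BIN], counts[2])  # 3150 350
```

Because 630 is a multiple of 9, under this inferred scheme every non-local bin receives exactly 350 utterances and the local bin 3150, matching the half-local, half-non-local split described above.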
After the utterances were distributed to LATAs, the utterances assigned to each LATA were distributed randomly among the central offices (LTUs) within it.
Because the rotation shown in the table was applied across all 630 TIMIT speakers, the algorithm ensured that the gender and dialect distribution of the utterances sent to each LATA was approximately equal to that of the entire database.
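The random assignment of utterances to central offices within a LATA might look like the following sketch. The text says only that this assignment was random; the function name, the made-up utterance and LTU identifiers, and the use of a seeded uniform choice are illustrative assumptions.

```python
import random


def assign_ltus(utterance_bins: dict[str, int],
                ltus_by_bin: dict[int, list[str]],
                seed: int = 0) -> dict[str, str]:
    """Choose an LTU uniformly at random from the bin each utterance was assigned to."""
    rng = random.Random(seed)  # private, seeded generator so the assignment is reproducible
    return {utt: rng.choice(ltus_by_bin[b]) for utt, b in utterance_bins.items()}


if __name__ == "__main__":
    # Made-up utterance names and LTU numbers, for illustration only.
    ltus = {1: ["0001", "0002", "0003"], 4: ["0401", "0402"]}
    bins = {"spk8_utt2": 4, "spk8_utt3": 1}
    print(assign_ltus(bins, ltus, seed=1990))
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the assignment repeatable without disturbing any other random state in a program.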
2. Calibration Tones
The utterance distribution described above was an attempt to exert some "control" over line conditions by varying the geographic distance of the transmission path. For a number of reasons, however (see the printed documentation), geographic distance is not necessarily well correlated with line quality.
Because of this characteristic of the telephone network, "calibration" signals were transmitted to every LTU along with the TIMIT utterances. These signals allow the line conditions encountered on each path to be characterized, at least in part.
A mapping from utterance to LTU and BIN can be found in the on-line file "utt_ndx.txt", described in "Documentation" below.
The two calibration tones are as follows:
The pathname of a calibration tone is as follows:
XX is the BIN number
YYYY is the LTU number
FILE is one of:
long.wav for the long tone
sweep.wav for the sweep tone
3. High-Pass Filtering
In analyzing the initial NTIMIT data, it was found that many utterances had prominent signal energy between 30 and 50 Hz, visible even in a waveform display. Extensive analysis of the transmission configuration failed to pinpoint the source of this noise, although it is not believed to be an inherent property of the telephone network. It was therefore decided to high-pass filter all of the NTIMIT data. The specifications of the filter are as follows:
Filter type:         1501-tap finite impulse response (FIR)
Stopband:            0 - 70 Hz
Transition band:     70 - 100 Hz
Stopband rejection:  > 58 dB
Passband ripple:     < 0.05 dB
This is similar to a filter that was applied to the original TIMIT data after it was recorded.
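A filter of this kind can be sketched with the windowed-sinc method. The original design method is not stated, so this sketch swaps in a Kaiser-window design and assumes a 16 kHz sample rate and a cutoff of 85 Hz (the middle of the 70 - 100 Hz transition band); the function names are hypothetical. By Kaiser's estimate, 1501 taps at 58 dB of attenuation give a transition band roughly 37 Hz wide, slightly wider than the 30 Hz specified, so this approximates the specification rather than reproducing the original filter. An equiripple (Parks-McClellan) design usually meets a given specification with fewer taps.

```python
import math

FS_HZ = 16000.0  # assumed sample rate (TIMIT's 16 kHz); not stated in this section


def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function of the first kind, via its power series."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kaiser_beta(atten_db: float) -> float:
    """Kaiser's empirical window parameter for a desired stopband attenuation."""
    if atten_db > 50.0:
        return 0.1102 * (atten_db - 8.7)
    if atten_db > 21.0:
        return 0.5842 * (atten_db - 21.0) ** 0.4 + 0.07886 * (atten_db - 21.0)
    return 0.0


def highpass_taps(num_taps: int, cutoff_hz: float, fs_hz: float, atten_db: float) -> list[float]:
    """Kaiser-windowed sinc high-pass filter (linear phase, odd length)."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be an odd number >= 3")
    beta = kaiser_beta(atten_db)
    i0_beta = bessel_i0(beta)
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response at offset k from the center tap.
        lowpass = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        ratio = k / mid
        window = bessel_i0(beta * math.sqrt(1.0 - ratio * ratio)) / i0_beta
        taps.append(-lowpass * window)
    taps[mid] += 1.0  # spectral inversion: unit impulse minus windowed low-pass
    return taps


def gain_db(taps: list[float], freq_hz: float, fs_hz: float) -> float:
    """Magnitude response in dB at one frequency, by direct evaluation of the DTFT."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return 20.0 * math.log10(max(math.hypot(re, im), 1e-12))


if __name__ == "__main__":
    taps = highpass_taps(1501, 85.0, FS_HZ, 58.0)
    for f in (0.0, 40.0, 70.0, 100.0, 1000.0):
        print(f"{f:6.0f} Hz: {gain_db(taps, f, FS_HZ):8.2f} dB")
```

To filter a signal, convolve it with these taps. Because the taps are symmetric (linear phase), the output is delayed by (1501 - 1) / 2 = 750 samples, which a careful implementation would remove so the filtered utterance stays time-aligned with its transcription.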
4. Sample Utterances
This is a list of some NTIMIT utterances with particularly bad signal characteristics, identified while listening to the database. Their inclusion does not imply that they are the "worst case" of a particular signal degradation; they are included merely for illustrative purposes.
ntimit/train/dr4/mmdm0/si681.wav: Considerable broadband noise
ntimit/test/dr5/mrjm3/sx150.wav: Considerable bandlimiting
ntimit/test/dr5/mrjm3/si1448.wav: Shot noise (with low frequency hum)
ntimit/train/dr5/fsdc0/si2234.wav: Low frequency hum
ntimit/train/dr5/mhit0/sx173.wav: Crosstalk at end of utterance
ntimit/train/dr4/fjwb1/si795.wav: Sharp pulses at beginning of utterance
ntimit/train/dr7/mrlj1/sx51.wav: Dial pulses
ntimit/test/dr4/mteb0/sa2.wav: DTMF ("touch tone") tones at end of utterance
5. Documentation
This is a description of other documentation relevant to the NTIMIT database. Some of this documentation was initially compiled for the TIMIT database, but is included with NTIMIT.
"The NTIMIT Speech Database"  -- [included in booklet with CD-ROM]
Includes the information in this file, as well as more detail on the creation of NTIMIT. Additional issues discussed include:
ltu-list -- on-line
A list of all the LTUs used in the creation of NTIMIT.
Column 1: LTU number, assigned by the telephone company.
Column 2: Geographic location of central office in which LTU is located
Column 3: LATA number of central office, assigned by telephone company
Column 4: "Bin" number of central office, assigned by speech group
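A reader for ltu-list could look like the following. Beyond the four columns, the file layout is not described here, so this sketch assumes one LTU per line with whitespace-separated fields; the class and function names and the sample record are made up.

```python
from dataclasses import dataclass


@dataclass
class LtuEntry:
    ltu: str         # column 1, kept as text so any leading zeros survive
    location: str    # column 2, may contain spaces
    lata: int        # column 3, assigned by the telephone company
    bin_number: int  # column 4, assigned by the speech group


def parse_ltu_line(line: str) -> LtuEntry:
    """Parse one ltu-list record, assuming whitespace-separated columns."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}: {line!r}")
    # The location may span several tokens, so take the LTU number from the
    # front and the LATA and bin numbers from the back.
    return LtuEntry(
        ltu=fields[0],
        location=" ".join(fields[1:-2]),
        lata=int(fields[-2]),
        bin_number=int(fields[-1]),
    )


if __name__ == "__main__":
    print(parse_ltu_line("0123 Sampletown NY 999 5"))  # made-up record
```

Reading the numeric columns from the end of the line is what lets a multi-word location such as a town plus state pass through intact.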
utt_ndx.txt -- on-line
An index of all 6300 TIMIT utterances, giving the BIN and LTU to which each was transmitted.
Column 1: TIMIT utterance
Column 2: BIN number of central office
Column 3: LTU number of central office
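The index can be read and summarized per bin with a short script. This sketch assumes three whitespace-separated columns per line, as listed above; the exact format of the utterance field is not specified here, and the function names and sample records are made up.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_utt_index(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (utterance, bin, ltu) triples, assuming three whitespace-separated columns."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        utterance, bin_field, ltu = fields
        yield utterance, int(bin_field), ltu


def utterances_per_bin(lines: Iterable[str]) -> Counter:
    """Count how many utterances were sent to each bin."""
    return Counter(bin_number for _, bin_number, _ in parse_utt_index(lines))


if __name__ == "__main__":
    sample = [  # made-up records, not taken from utt_ndx.txt
        "train/dr1/fxxx0/sa1 1 0001",
        "train/dr1/fxxx0/sa2 4 0401",
        "",
        "test/dr2/mxxx0/sx10 1 0002",
    ]
    print(utterances_per_bin(sample))  # Counter({1: 2, 4: 1})
```

Run over the full file, a per-bin count like this is a quick check of the distribution described in section 1: bin 1 should account for roughly half of the 6300 entries.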
[See the file "/ntimit/readme.doc" for a listing of TIMIT-related documentation. This file also contains a full listing of all the documentation included with these CD-ROMs.]
1. William M. Fisher, George R. Doddington, and Kathleen M. Goudie-Marshall, "The DARPA Speech Recognition Research Database: Specifications and Status," Proceedings of DARPA Workshop on Speech Recognition, pp. 93-99, Feb. 1986.
2. Charles Jankowski, Ashok Kalyanswamy, Sara Basson, and Judith Spitz, "NTIMIT: A Phonetically Balanced, Continuous Speech, Telephone Bandwidth Speech Database," Proceedings of ICASSP-90, April 1990.
3. Charles Jankowski, "The NTIMIT Speech Database," printed documentation accompanying the NTIMIT CD-ROM, January 1991.