WAV
.wav files are speech sample files. They have an uncompressed ascii header followed by linear, 2 byte, signed integer samples that have been compressed using the "shorten" speech compression method developed by Tony Robinson at Cambridge University. The headers follow the NIST SPHERE header format (the document header.doc contains a more detailed explanation of this format). The first line specifies the header type and the second line specifies the header length. All of the files in this distribution have a header length of 1024 bytes. Below is a sample header:
NIST_1A 1024 sample_byte_format -s2 10 channel_count -i 1 sample_count -i 60111 sample_rate -i 8000 sample_n_bytes -i 2 sample_sig_bits -i 16 sample_max -i 2688 sample_min -i -3248 sample_coding -s26 pcm,embedded-shorten-v1.09 sample_checksum -i 65264 end_headAs per the NIST format definition, the bytes between the "end_head" line and the 1024th byte are undefined.
The "sphere" directory contains the latest release (v2.0Beta2.1) of the NIST SPHERE software package, which includes programs to read, modify and remove the SPHERE headers, and to uncompress the waveform data. These programs are supplied in the form of C source code, together with a UNIX shell script and "makefiles" to compile and install the programs. Please note that a more recent version of this package may be available via anonymous ftp directly from NIST. To obtain the package from NIST:
ftp jaguar.ncsl.nist.gov Name: anonymous Password: <your.email@address> cd pub dir sphere* binary get <latest_sphere_version.tar.Z> byeThe ftp command "dir sphere*" will list the lastest release available; supply this file name for the following "get" command. (MS-DOS users should note that the Free Software Foundation (FSF/GNU) provides DOS versions for the necessary UNIX environment utilities -- make, cc, tar, uncompress, etc -- to extract and compile the NIST sources on 80386 and 80486 machines.)
SEG
The .seg files are ascii "location and label" files. They are similar to the ".phn" files of the TIMIT database except
MillisecondsPerFrame: 3.0 END OF HEADERAfter that are a series of lines, one per segment, of the form
<begin frame> <end frame + 1> labelFor example
200 237 ah 237 289 mThe [ah] segment extends from frame 200 to frame 236 inclusive. The end label is 237 for "historical" reasons.
All the .seg files are in the "seglola" directory within the respective language directories. The total across all languages is 504. The breakup by language is:
English 55 Farsi 48 French 52 German 53 Japanese 47 Korean 47 Mandarin 51 Spanish 53 Tamil 46 Vietnamese 52 TOTAL 504