This set of data on Taiwanese-accented Putonghua (PTH) was gathered by
San Duanmu at the University of Michigan. The data was recorded in
Taiwan from December 1994 to January 1995. Taiwanese-accented PTH
refers to PTH spoken by people who were born in Taiwan and whose first
language is Taiwanese (Southern Min).
A total of 40 speakers, ranging in age, education, birthplace,
and family dialect, were recorded. There were five two-speaker dialogues
and 30 single-speaker monologues. The dialogues were about 20
minutes each and the monologues were about 10 minutes
each. Dialogues were recorded on two tracks, one for each
speaker. Monologues were recorded on one track.
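The counts above imply a rough overall corpus size. The following is a minimal sketch of that arithmetic, assuming the approximate durations stated above; the function and constant names are illustrative only and are not part of the corpus distribution.

```python
# Rough corpus size, derived only from the counts and approximate
# durations given in this description. Names are hypothetical.
DIALOGUES = 5        # two-speaker dialogues
MONOLOGUES = 30      # single-speaker monologues
DIALOGUE_MIN = 20    # approximate minutes per dialogue
MONOLOGUE_MIN = 10   # approximate minutes per monologue


def corpus_summary():
    """Return speaker count, track count, and approximate session minutes."""
    speakers = DIALOGUES * 2 + MONOLOGUES
    # Each dialogue has one track per speaker; each monologue has one track.
    tracks = DIALOGUES * 2 + MONOLOGUES
    # Session time: a dialogue counts once, even though it has two tracks.
    minutes = DIALOGUES * DIALOGUE_MIN + MONOLOGUES * MONOLOGUE_MIN
    return {"speakers": speakers, "tracks": tracks, "minutes": minutes}


print(corpus_summary())
# {'speakers': 40, 'tracks': 40, 'minutes': 400}
```

Because the durations are only approximate, the 400-minute total should be read as an estimate of session time rather than an exact figure.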
The recordings were made in ordinary but quiet rooms. The speakers
were asked in advance to speak in a conversational style, without notes,
on any topic they chose, or no topic at all. Most speakers spoke
spontaneously and the topic drifted freely. Some speakers talked about
their professional work in a rather formal way. One speaker (#20, a
public health official) used notes. Overall, the corpus provides an
informative sampling of variation in speech style.
The recording equipment consisted of a portable DAT recorder (Teac), which
recorded at a 44.1 kHz sampling rate with 16-bit linear quantization. The
microphones were AudioTechnica lapel microphones with a preamp and an XLR
connection to the DAT recorder. The XLR connection helped keep recording noise
low, and the AudioTechnica microphones provided wide bandwidth and a flat
response over the speech range of interest; they were also unidirectional, to
minimize cross-talk, and very light in comparison with standard microphones.
Both single-speaker monologues and two-speaker
dialogues were recorded using this system on standard DAT tape. For
publication on CD-ROM, the original DAT recordings were downsampled to
a 16 kHz sample rate.
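The effect of the downsampling can be illustrated with a little arithmetic. The sketch below computes the rational resampling factor between the two rates, the resulting usable bandwidth (the Nyquist frequency), and the approximate size of one 10-minute track. It assumes the published audio remains 16-bit linear, which this description does not state explicitly; all names are illustrative.

```python
from fractions import Fraction

SOURCE_RATE = 44_100   # Hz, original DAT sampling rate
TARGET_RATE = 16_000   # Hz, rate used for CD-ROM publication
BYTES_PER_SAMPLE = 2   # 16-bit samples (assumed unchanged after downsampling)


def resample_ratio(source=SOURCE_RATE, target=TARGET_RATE):
    """Exact output/input sample ratio, reduced to lowest terms."""
    return Fraction(target, source)


def nyquist_hz(rate):
    """Highest frequency representable at the given sampling rate."""
    return rate / 2


def track_bytes(minutes, rate, bytes_per_sample=BYTES_PER_SAMPLE):
    """Uncompressed size of a single-channel track of the given length."""
    return int(minutes * 60 * rate * bytes_per_sample)


print(resample_ratio())              # 160/441
print(nyquist_hz(TARGET_RATE))       # 8000.0
print(track_bytes(10, SOURCE_RATE))  # 52920000
print(track_bytes(10, TARGET_RATE))  # 19200000
```

The 160/441 ratio means 160 output samples are produced for every 441 input samples, and the 8 kHz bandwidth that remains still covers most of the energy in speech.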
Before recording, all speakers read and signed the "Informed Consent
Form," which was written in Chinese and which largely followed the standard
format approved by the Human Subject Committee of the University of
Michigan. The form stated that participation in the recording was
entirely voluntary and that the speech might be used for linguistic teaching
and research purposes.
The speech data are accompanied by transcripts. The monologues have
start and end time stamps. The five dialogues are time stamped by
After the publication of this corpus some demographic data
was made available to the LDC. To access this data, please go to the demographic table.