Introduction
The TORGO Database of Dysarthric Articulation was developed by the University
of Toronto's departments of Computer Science and Speech-Language Pathology in
collaboration with the Holland-Bloorview Kids Rehabilitation Hospital in
Toronto, Canada. It contains approximately 23 hours of English speech data,
accompanying transcripts and documentation from 8 speakers (5 males, 3 females)
with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7
speakers (4 males, 3 females) in a non-dysarthric control group.
CP and ALS are conditions that commonly cause dysarthria, a motor speech
disorder resulting from disruptions in the neuro-motor interface that distort
motor commands to the vocal articulators. In most cases the result is atypical
and relatively unintelligible speech. The TORGO database is primarily a resource
for developing advanced automatic speech recognition (ASR) models suited to the
needs of people with dysarthria, but it is also applicable to non-dysarthric
speech. The poor performance of modern ASR on dysarthric speech is a serious
problem, because the more general physical disabilities often associated with
the condition can make other forms of computer input, such as keyboards or touch
screens, difficult to use.
Data
The data consists of aligned acoustics and 3D articulatory features measured
with the AG500 electromagnetic articulograph (EMA) system (Carstens
Medizinelektronik GmbH, Lenglern, Germany), which has fully automated
calibration. This system records articulatory movements in three dimensions
both inside and outside the vocal tract, providing a detailed view of the nature
and direction of speech-related activity.
The data was collected between 2008 and 2010 in Toronto, Canada. All subjects
read text consisting of non-words, short words and restricted sentences from
a 19-inch LCD screen. The restricted sentences included 162 sentences from the
sentence intelligibility section of the Assessment of Intelligibility of
Dysarthric Speech (Yorkston & Beukelman, 1981) and 460 sentences derived from
the TIMIT database. Subjects also produced unrestricted sentences, which were
elicited by asking them to spontaneously describe 30 images of interesting
situations taken randomly from Webber Photo Cards - Story Starters (Webber,
2005), a set designed to prompt students to tell or write a story.
Data is organized by speaker and by the session in which each speaker recorded
data. Each speaker was assigned a code and given their own file directory. The
code for female speakers begins with 'F', and the code for male speakers begins
with 'M'. If the speaker was a member of the control group, the letter 'C' follows
the gender code. The last two digits of the code indicate the order in which
that subject was recruited. For example, speaker 'FC02' was the second female
speaker without dysarthria to be recruited. Some speakers were intentionally
excluded from the data, so there are gaps in the numbering.
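The naming convention above is regular enough to decode programmatically. The following is a minimal Python sketch, assuming every speaker code is exactly a gender letter ('F' or 'M'), an optional 'C' for control speakers, and two digits; the names `parse_speaker_code` and `Speaker` are hypothetical and not part of the corpus.

```python
import re
from dataclasses import dataclass

# Assumed shape of a speaker code: 'F' or 'M', an optional 'C' marking the
# control group, then a two-digit recruitment number.
SPEAKER_CODE = re.compile(r"^(?P<gender>[FM])(?P<control>C?)(?P<order>\d{2})$")


@dataclass(frozen=True)
class Speaker:
    code: str
    gender: str    # "female" or "male"
    control: bool  # True for the non-dysarthric control group
    order: int     # recruitment order (per the FC02 example, within gender and group)


def parse_speaker_code(code: str) -> Speaker:
    """Decode a TORGO-style speaker code such as 'FC02' or 'M01'."""
    match = SPEAKER_CODE.match(code)
    if match is None:
        raise ValueError(f"not a speaker code: {code!r}")
    return Speaker(
        code=code,
        gender="female" if match["gender"] == "F" else "male",
        control=bool(match["control"]),
        order=int(match["order"]),
    )
```

For example, `parse_speaker_code("FC02")` returns a `Speaker` with `gender="female"`, `control=True` and `order=2`, while a malformed code raises `ValueError` instead of being silently misread.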
Each speaker's directory contains 'Session' directories, which hold the data
recorded during the corresponding visit, and occasionally a 'Notes' directory,
which can include Frenchay assessments (a test for the measurement, description
and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and
other relevant notes.
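A sketch for indexing this layout, assuming only what is stated above: speaker directories are named with speaker codes, session directory names begin with 'Session', and a notes directory is named exactly 'Notes'. Because the nesting of speaker directories below the distribution root is not specified here, the hypothetical `index_corpus` helper searches the whole tree for them.

```python
import re
from pathlib import Path

# Assumed speaker-directory names: 'F' or 'M', optional 'C', two digits.
SPEAKER_DIR = re.compile(r"^[FM]C?\d{2}$")


def index_corpus(root: Path) -> dict[str, dict]:
    """Map each speaker code found under `root` to its sessions and Notes dir.

    Hypothetical helper: it walks the whole tree because the depth of the
    speaker directories below the distribution root is not specified.
    """
    index = {}
    speaker_dirs = sorted(
        p for p in root.rglob("*") if p.is_dir() and SPEAKER_DIR.match(p.name)
    )
    for speaker_dir in speaker_dirs:
        subdirs = [c for c in speaker_dir.iterdir() if c.is_dir()]
        notes = speaker_dir / "Notes"
        index[speaker_dir.name] = {
            # One 'Session' directory per recording visit.
            "sessions": sorted(c.name for c in subdirs if c.name.startswith("Session")),
            # The 'Notes' directory is optional, so it may be missing.
            "notes": notes if notes.is_dir() else None,
        }
    return index
```

Walking the full tree is simple but touches every file; on a large local copy it may be worth limiting the search depth once the actual layout is known.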
Each 'Session' directory may contain some or all of the following:
- alignment.txt: This is a text file containing the sample offsets between audio files
recorded simultaneously by the array microphone and the head-worn
microphone.
- amps: These directories contain raw *.amp and *.ini files produced by the AG500
articulograph.
- phn_*: These directories contain phonemic transcriptions of audio data. Each file
is plain text with a *.PHN extension and a filename referring to the
utterance number. These files were generated using the free Wavesurfer tool.
- pos: These directories contain the head-corrected positions, velocities,
and orientations of sensor coils for each utterance, as generated by the AG500
articulograph.
- prompts: These directories contain orthographic transcriptions.
- rawpos: These directories are equivalent to the pos directories except that their
articulographic content is not head-normalized to a constant upright
position.
- wav_*: These directories contain the acoustics. Each file is a RIFF (little-endian)
WAVE audio file (Microsoft PCM, 16-bit, mono, 16000 Hz).
- wavall: These directories contain a stereo recording in which one
channel contains the recorded acoustics and the other channel contains the
analog peaks associated with the 'sweep' signal, which is used by the AG500
hardware for synchronization.
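The stated audio format can be checked with Python's standard `wave` module. The sketch below (hypothetical helpers `is_torgo_mono_wav` and `split_stereo`) verifies a wav_* file against the format listed above and de-interleaves a wavall recording. It assumes the wavall files are 16-bit PCM, which is not stated above, and it does not guess which channel carries the sweep signal.

```python
import array
import wave

# Format stated for the wav_* directories: Microsoft PCM, 16-bit, mono, 16000 Hz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_RATE = 16_000


def is_torgo_mono_wav(source) -> bool:
    """Return True if `source` (a path string or binary file object) has the wav_* format."""
    with wave.open(source, "rb") as w:
        return (
            w.getnchannels() == EXPECTED_CHANNELS
            and w.getsampwidth() == EXPECTED_SAMPWIDTH
            and w.getframerate() == EXPECTED_RATE
            and w.getcomptype() == "NONE"  # uncompressed PCM
        )


def split_stereo(source) -> tuple[array.array, array.array]:
    """De-interleave a stereo wavall recording into its two channels.

    Assumes 16-bit samples (not stated for wavall). Which channel holds the
    acoustics and which holds the AG500 sweep peaks is left to the caller.
    """
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        # 'h' is a signed 16-bit integer on common platforms; stereo frames
        # are interleaved as channel 0, channel 1, channel 0, channel 1, ...
        samples = array.array("h", w.readframes(w.getnframes()))
    return samples[0::2], samples[1::2]
```

One way to tell the channels apart is to inspect both: the sweep channel should show the sparse analog peaks described above rather than speech-like energy.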
Additionally, sessions recorded with the AG500 articulograph are marked with
the file 'EMA', and those recorded with the video-based system are marked with
the file 'VIDEO'. Files named with a date and a .txt extension (e.g.,
april232008cal2.txt, jan28cal3.txt) contain the measured calibration responses.
The *.log and *.calset files describe the calibration process but do not contain
its final result.
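The marker files and calibration files described above can be recognized from a session's file names. The regular expression below is a heuristic inferred only from the two example names, not a documented naming scheme, and the helper names `session_markers`, `calibration_files` and `describe_session` are hypothetical.

```python
import re
from pathlib import Path

# Heuristic inferred only from the examples 'april232008cal2.txt' and
# 'jan28cal3.txt': letters (a month name), digits (the date), 'cal', a number.
CALIBRATION_NAME = re.compile(r"^[a-z]+\d+cal\d+\.txt$", re.IGNORECASE)
MARKER_FILES = {"EMA", "VIDEO"}


def session_markers(names) -> list[str]:
    """Return the recording-system marker files ('EMA', 'VIDEO') among `names`."""
    return sorted(MARKER_FILES & set(names))


def calibration_files(names) -> list[str]:
    """Return the names that look like dated calibration-response files."""
    return sorted(n for n in names if CALIBRATION_NAME.match(n))


def describe_session(session_dir: Path) -> dict:
    """Apply both checks to the files directly inside one session directory."""
    names = [p.name for p in session_dir.iterdir() if p.is_file()]
    return {"markers": session_markers(names), "calibration": calibration_files(names)}
```

Returning the markers as a list rather than a single label keeps an unmarked session (empty list) distinguishable from one that unexpectedly carries both markers. The *.log and *.calset files are excluded because the pattern requires a .txt extension.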
See the readme file and the AG500 Wiki for more complete descriptions of the
possible subfolders and of the AG500-specific files. Also, see
session_contents.tsv for a tab-separated table of each session's subfolders and
metadata files.
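session_contents.tsv can be loaded with Python's standard `csv` module. Its column names are not listed here, so this sketch keys each row by whatever header row the file has; the presence of a first-line header and UTF-8 encoding are assumptions, and the helper names are hypothetical.

```python
import csv
from pathlib import Path


def load_session_table(path: Path) -> list[dict[str, str]]:
    """Read a tab-separated file into one dict per row, keyed by its header row.

    Assumptions: the first line is a header and the file is UTF-8 encoded.
    No column names are hard-coded because they are not documented here.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def column_values(rows: list[dict[str, str]], column: str) -> list[str]:
    """Collect one column's values, e.g. after inspecting rows[0].keys()."""
    return [row[column] for row in rows]
```

Inspecting `rows[0].keys()` after loading shows which columns the table actually provides before any of them are relied on.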
Samples
For an example of the data contained in this corpus, review these two audio samples: Dysarthric & Control.
Updates
None at this time.
Content Copyright
Portions © 2008-2011 Frank Rudzicz, © 2012 Trustees of the University
of Pennsylvania