This file contains documentation on CSLU: Foreign Accented English Release 1.2, Linguistic
Data Consortium (LDC) catalog number LDC2006S38 and ISBN 1-58563-392-5.
CSLU: Foreign Accented English Release 1.2 consists of continuous speech in English
by native speakers of 22 different languages: Arabic, Cantonese, Czech, Farsi,
French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Mandarin
Chinese, Malay, Polish, Portuguese (Brazilian and Iberian), Russian, Swedish,
Spanish, Swahili, Tamil and Vietnamese. The corpus contains 4925 telephone-quality
utterances, information about the speakers' linguistic backgrounds and perceptual
judgments about the accents in the utterances. The speakers were asked to speak
about themselves in English for 20 seconds. Three native speakers of American
English independently listened to each utterance and judged the speakers' accents
on a 4-point scale: negligible/no accent, mild accent, strong accent and very
strong accent. This corpus is intended to support the study of the underlying
characteristics of foreign accent and to enable research, development and evaluation
of algorithms for the identification and understanding of accented speech. Some
of the files in this corpus are also contained in CSLU: 22 Languages Corpus
(LDC2005S26).
Portions © 2000-2002 Center for Spoken Language Understanding, Oregon
Health & Science University, © 2007 Trustees of the University of Pennsylvania