Language Recognition Evaluation (LRE)

The purpose of the NIST Language Recognition Evaluation (LRE) series is to develop advanced technology in the field of language recognition and to measure the performance of language recognition systems. LDC supports LRE by collecting, annotating and distributing multilingual speech data.

The 2011 evaluation included a particular focus on linguistic varieties that may be especially difficult to distinguish from one another, including multiple dialects of a single language. In support of that effort, LDC collected approximately 400 telephone or narrowband broadcast segments in each of 24 languages. Confusable language clusters included:

  • Arabic: Iraqi, Levantine, Maghrebi, Modern Standard
  • English: American, Indian
  • Russian, Ukrainian, Czech, Slovak
  • Thai, Lao
  • Hindi, Urdu
  • Dari, Farsi

Other languages collected for LRE11 included:

  • Polish
  • Turkish
  • Tamil
  • Spanish
  • Mandarin Chinese
  • Bengali
  • Punjabi
  • Pashto
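
For readers who want to work with these groupings programmatically, the two lists above can be written down as a small lookup table. This is only an illustrative sketch: the cluster keys (for example "Slavic") and the helper names `cluster_of` and `all_languages` are assumptions made for this example, not identifiers used by LDC or NIST.

```python
# Confusable clusters as listed above. The dictionary keys are
# hypothetical labels chosen for this sketch, not official names.
CONFUSABLE_CLUSTERS = {
    "Arabic": ["Iraqi Arabic", "Levantine Arabic",
               "Maghrebi Arabic", "Modern Standard Arabic"],
    "English": ["American English", "Indian English"],
    "Slavic": ["Russian", "Ukrainian", "Czech", "Slovak"],
    "Thai-Lao": ["Thai", "Lao"],
    "Hindi-Urdu": ["Hindi", "Urdu"],
    "Dari-Farsi": ["Dari", "Farsi"],
}

# Languages collected for LRE11 outside the confusable clusters.
OTHER_LANGUAGES = ["Polish", "Turkish", "Tamil", "Spanish",
                   "Mandarin Chinese", "Bengali", "Punjabi", "Pashto"]


def all_languages():
    """Return every language or variety named in the two lists."""
    clustered = [lang for members in CONFUSABLE_CLUSTERS.values()
                 for lang in members]
    return clustered + OTHER_LANGUAGES


def cluster_of(language):
    """Return the cluster label for a language, or None if unclustered."""
    for label, members in CONFUSABLE_CLUSTERS.items():
        if language in members:
            return label
    return None


if __name__ == "__main__":
    print(len(all_languages()))   # total number of languages and varieties
    print(cluster_of("Lao"))
```

The 16 clustered varieties plus the 8 other languages account for the 24 languages mentioned above.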

Collection Methods

LDC collected recordings in two primary genres: conversational telephone speech (CTS) and broadcast narrowband speech (BNBS).

  • CTS - A modest number of callers were recruited for each language, and each was given an incentive to make a single call to a friend, family member or acquaintance who spoke the same language, either in the United States or abroad. The CTS collection infrastructure used a dedicated T-1 line, which provided 24 audio channels with toll-free service enabled.
  • BNBS - LDC collected narrowband segments embedded in broadcast recordings, primarily from listener call-ins and phone interviews. The programming was captured by LDC's automated broadcast system, which has access to satellite, cable and terrestrial TV broadcasts.

Auditing

All collected speech segments were audited to verify that they were in the target language. Dual annotation, in which a segment is judged by more than one auditor, was particularly important in this collection given the focus on confusable linguistic varieties.
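
As a rough illustration of how dual annotation might be tallied, the sketch below combines independent auditor judgments for one segment. The source does not describe how disagreements were resolved, so the outcome labels and the rule that any disagreement marks a segment as "disputed" are assumptions made for this example; the function name `audit_segment` is likewise hypothetical.

```python
def audit_segment(target_language, judgments):
    """Combine independent auditor judgments for a single segment.

    target_language: the language the segment was collected for.
    judgments: list of language labels, one per auditor.

    The outcome labels and the decision rule are assumptions for this
    sketch, not LDC's documented procedure.
    """
    if len(judgments) < 2:
        # Dual annotation requires at least two independent judgments.
        return "needs_second_audit"
    matches = [label == target_language for label in judgments]
    if all(matches):
        return "accepted"
    if not any(matches):
        return "rejected"
    # Auditors disagree, which is most likely within a confusable cluster.
    return "disputed"


if __name__ == "__main__":
    print(audit_segment("Lao", ["Lao", "Lao"]))
    print(audit_segment("Lao", ["Lao", "Thai"]))
    print(audit_segment("Lao", ["Thai", "Thai"]))
    print(audit_segment("Lao", ["Lao"]))
```

Flagging disagreements separately, rather than discarding them, keeps the hard cases visible for further review, which fits a collection built around varieties that are easy to confuse.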

Additional Information

The 2011 NIST Language Recognition Evaluation Website (LRE11)

Stephanie Strassel, Kevin Walker, Karen Jones, Dave Graff, Christopher Cieri
New Resources for Recognition of Confusable Linguistic Varieties: The LRE11 Corpus
Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore, June 25-28
Available: Paper in PDF, Slides in PDF