This file contains documentation on the NIST 2003 Language Recognition Evaluation, Linguistic Data Consortium (LDC) catalog number LDC2006S31 and ISBN 1-58563-364-X.
The goal of the NIST Language Recognition Evaluation (LRE) series is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. The series had its first evaluation in 1996. The 2003 NIST Language Recognition Evaluation (LRE-03) is part of this ongoing series of evaluations of language recognition technology.
The task to be evaluated is the detection of a given target language. Given a test segment of speech, a target language will be assigned as a test hypothesis, and the task is to determine whether this test hypothesis is true or false.
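The detection task described above can be illustrated with a minimal sketch. It assumes a hypothetical system that emits one numeric score per trial and accepts the hypothesis when the score reaches a threshold. The `Trial` fields, the threshold value, and the function names are illustrative assumptions and are not taken from the evaluation plan, which defines the official trial format and performance measure.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One detection trial (hypothetical layout, for illustration only)."""
    segment_id: str        # name of the test segment
    target_language: str   # hypothesized language for this trial
    true_language: str     # language actually spoken in the segment
    score: float           # system score; higher means "more likely the target"


def decide(trial, threshold=0.0):
    """Return True if the hypothesis 'segment is in target_language' is accepted."""
    return trial.score >= threshold


def error_rates(trials, threshold=0.0):
    """Return (miss rate, false alarm rate) for a list of trials."""
    target = [t for t in trials if t.true_language == t.target_language]
    nontarget = [t for t in trials if t.true_language != t.target_language]
    misses = sum(1 for t in target if not decide(t, threshold))
    false_alarms = sum(1 for t in nontarget if decide(t, threshold))
    p_miss = misses / len(target) if target else 0.0
    p_fa = false_alarms / len(nontarget) if nontarget else 0.0
    return p_miss, p_fa
```

A miss is a target trial that the system rejects, and a false alarm is a non-target trial that it accepts. Raising the threshold trades false alarms for misses.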
Further information regarding this evaluation may be found on the 2003 NIST Language Recognition Evaluation website and the NIST 2003 evaluation plan.
Each speech file is one side of a "4 wire" telephone conversation represented as 8-bit, 8 kHz mu-law data. There are a total of 11,830 speech files in SPHERE (.sph) format, for a total of around forty-six hours of speech. The speech data was compiled from the CALLFRIEND, CALLHOME, and SWITCHBOARD-2 corpora. Each file contains one test segment. The test segments are divided into three-second, ten-second, and thirty-second tests, each in its own directory.
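As a rough illustration of what 8-bit, 8 kHz mu-law data implies, the sketch below expands one mu-law byte to a linear sample using the standard G.711 expansion and computes a duration from a count of sample bytes. The function names are illustrative assumptions, and the sketch does not parse the SPHERE header, so the byte count passed in is assumed to exclude it.

```python
SAMPLE_RATE_HZ = 8000   # 8 kHz sampling, as stated for this corpus
BYTES_PER_SAMPLE = 1    # 8-bit mu-law, one channel per file


def mulaw_to_linear(byte):
    """Expand one G.711 mu-law byte (0..255) to a signed linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80                  # top bit carries the sign
    exponent = (u >> 4) & 0x07       # next three bits select the segment
    mantissa = u & 0x0F              # low four bits are the step in the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def duration_seconds(num_sample_bytes):
    """Duration of a mu-law stream, given its size in bytes without any header."""
    return num_sample_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def decode(data):
    """Decode a bytes object of mu-law samples into a list of linear samples."""
    return [mulaw_to_linear(b) for b in data]
```

At one byte per sample and 8,000 samples per second, each second of speech occupies 8,000 bytes, so a segment of exactly thirty seconds would hold 240,000 sample bytes.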
Please go to data for a listing of data files.
Other documentation files are:
Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus at LDC2006S31.
Portions © 1996-2002, 2006 Trustees of the University of Pennsylvania