2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data
| |
| Item Name: | 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data |
| Authors: | Jonathan Fiscus, John Garofolo, Audrey Le, Alvin Martin, Greg Sanders, Mark Przybocki, David Pallett |
| LDC Catalog No.: | LDC2007S12 |
| ISBN: | 1-58563-448-4 |
| Release Date: | Oct 17, 2007 |
| Data Type: | transcripts |
| Data Source(s): | meeting speech |
| Application(s): | automatic content extraction, discourse analysis, information retrieval, language modeling, speaker identification, speaker verification, speech recognition |
| Language(s): | English |
| Language ID(s): | eng |
| Distribution: | 1 DVD |
| Member fee: | $0 for 2007 members |
| Non-member Fee: | US$2000.00 |
| Reduced-License Fee: | US$1000.00 |
| Extra-Copy Fee: | US$200.00 |
| Non-member License: | yes |
| Online documentation: | yes |
| Licensing Instructions: | Subscription Members, Standard Members, Non-Members |
| Citation: | Jonathan Fiscus, et al. 2007. 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data. Linguistic Data Consortium, Philadelphia. |
|
Introduction
2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data contains the test
material (meeting speech and reference transcripts) used in the RT-04S evaluation
administered by the Speech Group at the National Institute of Standards and
Technology (NIST). Rich Transcription (RT) is broadly
defined as a fusion of speech-to-text technology and metadata extraction technologies
designed to provide the basis for a generation of more usable transcriptions
of human-human meeting speech.
The data in this release consists of portions of meeting speech collected
and/or transcribed by the International Computer Science Institute (ICSI) at
Berkeley, the Interactive Systems Laboratories (ISL) at Carnegie Mellon University,
NIST and LDC. The complete meeting speech and corresponding transcript data
sets are available from LDC's catalog as follows:
- ICSI Meeting Speech (LDC2004S02)
- ICSI Meeting Transcripts (LDC2004T04)
- ISL Meeting Speech Part 1 (LDC2004S05)
- ISL Meeting Transcripts Part 1 (LDC2004T10)
- NIST Meeting Pilot Corpus Speech (LDC2004S09)
- NIST Meeting Pilot Corpus Transcripts and Metadata (LDC2004T13)
RT-04S included the following tasks in the meeting domain:
- Speech-to-Text Transcription (STT) tasks
  - Microphone conditions:
    - Multiple distant microphones
    - Single distant microphone
    - Individual head microphone
  - Processing time conditions:
    - Unlimited time STT
    - Less than or equal to twenty times realtime
    - Less than or equal to ten times realtime
    - Less than or equal to one times realtime
- Diarization (SPKR) task ("who spoke when")
  - Microphone conditions:
    - Multiple distant microphones
    - Single distant microphone
  - Input conditions:
    - Speech input only
    - Speech plus reference transcript input
  - Processing time conditions:
    - Unlimited time
    - Less than or equal to twenty times realtime
    - Less than or equal to ten times realtime
    - Less than or equal to one times realtime
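The processing time conditions above can be illustrated with a small sketch. It assumes the common definition of a realtime factor as processing time divided by the duration of the audio processed; the RT-04S evaluation plan remains the authoritative source for how processing time was actually measured. The function names and labels below are hypothetical and are not part of this corpus or of any NIST tool.

```python
# Hypothetical helper: maps a system's processing time to the tightest
# processing time condition it satisfies. Assumes
# realtime factor = processing time / audio duration.

# Thresholds taken from the task list, tightest first.
THRESHOLDS = [
    (1.0, "<= 1x realtime"),
    (10.0, "<= 10x realtime"),
    (20.0, "<= 20x realtime"),
]


def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time as a multiple of the audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def time_condition(processing_seconds: float, audio_seconds: float) -> str:
    """Return the tightest condition label the run qualifies for."""
    rtf = realtime_factor(processing_seconds, audio_seconds)
    for limit, label in THRESHOLDS:
        if rtf <= limit:
            return label
    return "unlimited time"
```

For example, under these assumptions a system that spends 5400 seconds on a 600-second meeting excerpt has a realtime factor of 9 and would fall under the ten times realtime condition.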
Further information about the evaluation is available on the RT-04
Spring Evaluation Website.
Samples
For an example of the data in this corpus, please review this audio sample.
Content Copyright
Portions © 2003 Interactive Systems Laboratories, Carnegie Mellon University,
© 2000-2001 International Computer Science Institute, © 2001, 2004,
2007 Trustees of the University of Pennsylvania