NIST Evaluations

Since its founding in 1992, LDC has worked with the National Institute of Standards and Technology (NIST) [1] on a series of ongoing human language technology evaluations. LDC partners with NIST’s Multimodal Information Group [2] and Retrieval Group [3] to provide training, development and test data for research areas that include speech recognition, language recognition, machine translation, cross-language retrieval and multimedia retrieval.

This collaboration includes evaluations in the following programs and areas:

AQUAINT (Advanced Question Answering for Intelligence) [4]

The AQUAINT program sought to solve the problem of finding topically relevant, semantically related, timely information in massive amounts of data in diverse languages, formats, and genres. AQUAINT technology allows users to pose a series of intertwined, complex questions and obtain comprehensive answers in the context of broad information-gathering tasks. LDC supported this task through the development and distribution of English newswire data sets. The AQUAINT text collections are available through the LDC Catalog [5].

ACE (Automatic Content Extraction) [6]

The objective of the ACE program was to develop automatic content extraction technology to support automatic processing of human language in text form from a variety of sources (such as newswire, broadcast conversation and weblogs). LDC developed and distributed annotated Arabic, Chinese, English and Spanish broadcast transcripts, newswire and web text through the 2008 evaluation.

The ACE program became a track in the Text Analysis Conference (TAC) program in 2009. Various corpora developed by LDC for ACE evaluations are available through the LDC Catalog [5].

Language Recognition [7]

The goal of the NIST Language Recognition Evaluation (LRE) series is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. LDC supports the ongoing LRE series through the collection and distribution of multilingual, multi-channel conversational telephone speech, other microphone speech and broadcast audio, including material from the CALLHOME, CALLFRIEND, Fisher and Mixer collections and Voice of America broadcasts.

Training, test and supplemental data from those evaluations are available through the LDC Catalog [5].

LoReHLT (Low Resource Human Language Technologies) [8]

NIST's Low Resource Human Language Technologies (LoReHLT) evaluation series aims to advance HLT that can provide rapid and effective response to emerging incidents where the language resources are very limited. LDC has supported the 2016 and 2017 evaluations by providing multilingual language resources that it developed for the DARPA LORELEI [9] program.

Machine Translation [10]

The machine translation (MT) program includes several activities contributing to machine translation technology and metrology advancements, primarily through systematic and targeted annual evaluations. Since 2002, LDC has supported the coordinated evaluations of text-to-text MT technology through the OpenMT series by providing multilingual texts and reference translations in various domains including newswire, broadcast and web text. Linguistic resources used by OpenMT were partially developed by LDC within the DARPA GALE and TIDES projects.

LDC also provides texts, translations and distribution support for the Metrics for Machine Translation Challenge (MetricsMATR), designed to research and promote innovative techniques that advance the measurement sciences used in MT evaluations. Corpora containing the source data, reference translations and scoring software from those evaluations are available through the LDC Catalog [5].

OpenHaRT (Open Handwriting Recognition and Translation Evaluation) [11]

OpenHaRT was an evaluation of transcription and translation technologies for document images containing primarily Arabic script. The evaluation sought to break new ground in the areas of document image recognition and translation toward the goal of document understanding capabilities. LDC developed linguistic resources to support OpenHaRT by collecting and annotating naturally occurring examples of handwriting in multiple languages, genres and domains, and by sharing linguistic resources developed in the DARPA MADCAT [12] program.

Open Keyword Search (OpenKWS) [13]

The Open Keyword Search Evaluation (OpenKWS) series is an extension of the 2006 Spoken Term Detection evaluation. Its objective is to support research in, and help advance the state of the art of, keyword search technologies, that is, technologies that locate a specified, potentially multi-word keyword in the specified language. The goal of the program is to reduce the difficulty of quickly building high-performing KWS systems for a new language with diminishing data resources and time. LDC has supported this task by providing English conversational telephone speech and transcripts to participants.

OpenSAT

The NIST Open Speech Analytics Technologies (OpenSAT) evaluation series was designed for researchers developing technologies to address speech analytic challenges in difficult acoustic conditions. It included the following tasks: keyword search, speech activity detection, automatic speech recognition and speaker diarization. LDC supported the OpenSAT 2017 pilot evaluation and the 2019 and 2020 evaluations by developing and distributing resources in various domains representing a range of acoustic conditions.

Rich Transcription Evaluation [14]

The Rich Transcription (RT) evaluation series promoted and gauged advances in the state of the art in several automatic speech recognition technologies. The goal of the series was to create recognition technologies that would produce transcriptions that were more readable by humans and more useful for machines. To that end, a set of research tasks was defined, broadly categorized as either Speech-to-Text Transcription tasks or Metadata Extraction tasks.

LDC provided conversational telephone speech, broadcast audio and reference transcripts in support of the evaluation tasks. Corpora from the 2003, 2004 and 2005 RT evaluations are available through the LDC Catalog [5].

SCIL (Socio-cultural Content in Languages)

The SCIL Program was designed to explore and develop novel designs, algorithms, methods, techniques and technologies to extend the discovery of the social goals of members of a group by correlating these goals with the language they use. LDC provided Arabic, Chinese and English audio, text and annotations (transcriptions, translations, word alignment) to this program.

Speaker Recognition [15]

The goal of the NIST Speaker Recognition Evaluation (SRE) series is to contribute to the direction of research efforts and the calibration of technical capabilities of text-independent speaker recognition. The overarching objective of the evaluations has always been to drive the technology forward, to measure the state of the art and to find the most promising algorithmic approaches.

LDC has supported those efforts since the first SRE in 1996 through the collection and distribution of multilingual, multi-channel conversational telephone speech and other microphone speech, including material from the CALLHOME, CALLFRIEND, Fisher and Mixer collections. Training, test and supplemental data from those evaluations are available through the LDC Catalog [5].

STD (Spoken Term Detection)

The goal of the STD evaluation is to facilitate research and development of technology for finding short word sequences rapidly and accurately in large heterogeneous audio archives. In support of the 2006 evaluation, LDC provided Arabic, Chinese and English broadcast news recordings from its broadcast programming collection as well as English conversational telephone speech from the Mixer and Fisher collections. That evaluation also included English meeting speech. The 2006 STD development and evaluation data is available through the LDC Catalog [5].

TAC (Text Analysis Conference) [16]

TAC is a series of evaluations and workshops organized to encourage research in natural language processing and related applications. Tracks within the program include Knowledge Base Population (to develop automated systems that discover information about named entities in a large corpus and incorporate that information into a knowledge base), Recognizing Textual Entailment (to develop systems that recognize when one piece of text entails another) and Summarization (to develop systems that produce coherent summaries of text). LDC supports those tasks by developing and distributing data sets of processed and annotated texts, for instance, in the TAC KBP [17] series.

TRECVid (TREC Video Retrieval Evaluation) [18]

The TRECVid information retrieval research evaluation focuses on automatic segmentation, indexing and content-based retrieval of digital video. LDC supported the 2004-2006 TRECVid evaluations by collecting and distributing Arabic, Chinese and English broadcast programming from various sources. Keyframe corpora from those evaluations are available through the LDC Catalog [5].

The goal of the TRECVid MED track was to assemble core detection technologies into a system that can search multimedia recordings for user-defined events based on pre-computed metadata. The goal of the TRECVid MER track was to help MED systems not only detect events in video clips but also recount the evidence used to identify each event. Data from the HAVIC Corpus developed by LDC was used to support MED and MER evaluations through 2016.

Video Analysis and Content Extraction (VACE)

The VACE program was established to develop novel algorithms for automatic video content extraction, multi-modal fusion, and event understanding. The evaluation focused on the automated detection and tracking of moving objects including faces, hands, people, vehicles and text in four primary video domains: broadcast news, meetings, street surveillance and unmanned aerial vehicle motion imagery.

LDC supported the 2005 and 2006 technology evaluations by collecting and distributing English broadcast news programming. The meeting speech and broadcast data used in the 2005 and 2006 evaluations are available through the LDC Catalog [5].