NIST Evaluations

Since its founding in 1992, LDC has worked with the National Institute of Standards and Technology (NIST) on a series of ongoing human language technology evaluations.

NIST’s Multimodal Information Group researches and develops measurement and evaluation methods to advance and promote technologies that recognize and/or transform information in speech, text, video and other modalities through speech recognition, speaker recognition, language recognition, machine translation and visual recognition.

The Information Access Division of NIST’s Information Technology Laboratory supports technologies for accessing unstructured, digital, multimedia and other complex information including text, web pages, images, video, voice, audio and graphics. LDC has worked with both of these groups to develop and administer evaluations in the following areas:

AQUAINT (Advanced Question Answering for Intelligence)

The AQUAINT program seeks to solve the problem of finding topically relevant, semantically related, timely information in massive amounts of data in diverse languages, formats, and genres. AQUAINT technology allows users to pose a series of intertwined, complex questions and obtain comprehensive answers in the context of broad information-gathering tasks. LDC has supported this task through the development and distribution of English newswire data sets. The AQUAINT text collections are available through the LDC Catalog.

ACE (Automatic Content Extraction)

The objective of the ACE program is to develop automatic content extraction technology to support automatic processing of human language in text form from a variety of sources (such as newswire, broadcast conversation and weblogs). LDC developed and distributed annotated Arabic, Chinese, English and Spanish broadcast transcripts, newswire and web text through the 2008 evaluation.

The ACE program became a track in the Text Analysis Conference (TAC) program in 2009. Various corpora developed by LDC for ACE evaluations are available through the LDC Catalog.

Language Recognition

The goal of the NIST Language Recognition Evaluation (LRE) series is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. LDC supports the ongoing LRE series through the collection and distribution of multilingual, multi-channel conversational telephone speech, other microphone speech and broadcast audio, including material from the CALLHOME, CALLFRIEND, Fisher and Mixer collections and Voice of America broadcasts. The 2011 LRE included a particular focus on linguistic varieties that may be especially difficult to distinguish from one another, including multiple dialects of a single language.

Training, test and supplemental data from those evaluations are available through the LDC Catalog.

Machine Translation

The machine translation (MT) program includes several activities contributing to machine translation technology and metrology advancements, primarily through systematic and targeted annual evaluations. Since 2002, LDC has supported the coordinated evaluations of text-to-text MT technology through the OpenMT series by providing multilingual texts and reference translations in various domains including newswire, broadcast and web text. Linguistic resources used by OpenMT were partially developed by LDC within the DARPA GALE and TIDES projects.

LDC also provides texts, translations and distribution support for the Metrics for Machine Translation Challenge (MetricsMATR), designed to research and promote innovative techniques that advance the measurement sciences used in MT evaluations. Corpora containing the source data, reference translations and scoring software from those evaluations are available through the LDC Catalog.

OpenHaRT (Open Handwriting Recognition and Translation Evaluation)

OpenHaRT is an evaluation of transcription and translation technologies for document images containing primarily Arabic script. The evaluation seeks to break new ground in the areas of document image recognition and translation toward the goal of document understanding capabilities. LDC develops linguistic resources to support OpenHaRT by collecting and annotating naturally occurring examples of handwriting in multiple languages, genres and domains, and by sharing linguistic resources developed in the DARPA MADCAT program.

Rich Transcription Evaluation

The Rich Transcription (RT) evaluation series promotes and gauges advances in the state of the art in several automatic speech recognition technologies. The goal of the series is to create recognition technologies that produce transcriptions that are more readable by humans and more useful for machines. To that end, a set of research tasks has been defined, broadly categorized as either Speech-to-Text Transcription tasks or Metadata Extraction tasks.

LDC provided conversational telephone speech, broadcast audio and reference transcripts in support of the evaluation tasks. Corpora from the 2003, 2004 and 2005 RT evaluations are available through the LDC Catalog.

SCIL (Socio-cultural Content in Languages)

The SCIL Program was designed to explore and develop novel designs, algorithms, methods, techniques and technologies to extend the discovery of the social goals of members of a group by correlating these goals with the language they use. LDC provided Arabic, Chinese and English audio, text and annotations (transcriptions, translations, word alignment) to this program.

Speaker Recognition

The goal of the NIST Speaker Recognition Evaluation (SRE) series is to contribute to the direction of research efforts and the calibration of technical capabilities of text-independent speaker recognition. The overarching objective of the evaluations has always been to drive the technology forward, to measure the state of the art and to find the most promising algorithmic approaches.

LDC has supported those efforts since the first SRE evaluation in 1996 through the collection and distribution of multilingual, multi-channel conversational telephone speech and other microphone speech, including material from the CALLHOME, CALLFRIEND, Fisher and Mixer collections. Training, test and supplemental data from those evaluations are available through the LDC Catalog.

STD (Spoken Term Detection)

The goal of the STD evaluation is to facilitate research and development of technology for finding short word sequences rapidly and accurately in large heterogeneous audio archives. In support of the 2006 evaluation, LDC provided Arabic, Chinese and English broadcast news recordings from its broadcast programming collection as well as English conversational telephone speech from the Mixer and Fisher collections. That evaluation also included English meeting speech. The 2006 STD development and evaluation data is available through the LDC Catalog.

TAC (Text Analysis Conference)

TAC is a series of evaluations and workshops organized to encourage research in natural language processing and related applications. Tracks within the program include Knowledge Base Population (to develop automated systems that discover information about named entities in a large corpus and incorporate that information into a knowledge base), Recognizing Textual Entailment (to develop systems that recognize when one piece of text entails another) and Summarization (to develop systems that produce coherent summaries of text). LDC supports those tasks by developing and distributing data sets of processed and annotated texts, for instance, in the TAC KBP series.

TRECVid (TREC Video Retrieval Evaluation)

The TRECVid information retrieval research evaluation focuses on automatic segmentation, indexing and content-based retrieval of digital video. LDC supported the 2004-2006 TRECVid evaluations by collecting and distributing Arabic, Chinese and English broadcast programming from various sources. Keyframe corpora from those evaluations are available through the LDC Catalog.

The goal of the TRECVid Multimedia Event Detection (MED) track is to assemble core detection technologies into a system that can search multimedia recordings for user-defined events based on pre-computed metadata. The goal of the TRECVid Multimedia Event Recounting (MER) track is to help MED systems not only detect events in video clips but also recount the evidence used to identify each event. Data from the HAVIC Corpus developed by LDC was used to support MED evaluations in 2010 through 2013 and MER evaluations in 2012 and 2013.

VACE (Video Analysis and Content Extraction)

The VACE program was established to develop novel algorithms for automatic video content extraction, multi-modal fusion, and event understanding. The evaluation has focused on the automated detection and tracking of moving objects including faces, hands, people, vehicles and text in four primary video domains: broadcast news, meetings, street surveillance and unmanned aerial vehicle motion imagery.

LDC supported the 2005 and 2006 technology evaluations by collecting and distributing English broadcast news programming. The meeting speech and broadcast data used in the 2005 and 2006 evaluations are available through the LDC Catalog.