MADCAT | Linguistic Data Consortium

MADCAT

Multilingual Automatic Document Classification, Analysis and Translation (MADCAT)

The goal of the DARPA MADCAT Program was to automatically convert images of foreign-language text into English transcripts. LDC supported MADCAT by collecting handwritten documents in Arabic and Chinese, scanning the texts at high resolution, annotating the physical coordinates of each line and token, and transcribing the content and translating it into English. LDC also supported the evaluation of MADCAT technologies by post-editing machine translation system output during annual evaluations conducted by NIST.

Tasks

There were two evaluation tracks in MADCAT:

The Official Track evaluated system performance on GALE-style data, where the images were created in a controlled environment (Phase 1 - Phase 5).
The Challenge Track used real-world operational data to test the incremental challenges that such data pose (Phase 2 - Phase 5).