The DARPA BOLT (Broad Operational Language Translation) Program developed genre-independent machine translation and information retrieval systems. While earlier DARPA programs made significant strides in improving natural language processing capabilities in structured genres like newswire and broadcasts, BOLT was particularly concerned with improving translation and information retrieval performance for less formal genres, with a special focus on user-contributed content.
LDC supported the BOLT Program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and richly annotated for a variety of tasks including word alignment, Treebanking, PropBanking, and co-reference. LDC also supported the evaluation of BOLT technologies by post-editing machine translation system output and assessing information retrieval system responses during annual evaluations conducted by NIST.