Task Specifications

BOLT developed technology that enables English speakers to retrieve and understand information from informal foreign language sources, including chat, text messaging, and spoken conversations. The genres of interest to BOLT were characterized by inherent variation and inconsistency, which motivated the development of new collection and annotation methods.

Heterogeneous Audio Visual Internet Collection (HAVIC)

LDC built a large corpus of multi-modal data to support research in a variety of areas, including spoken term detection and video event detection. The HAVIC Corpus consists of thousands of hours of “real world” video data collected from the internet. The corpus targeted user-generated video content in particular, as opposed to professionally produced or commercial video content.
