BOLT Data Collection

BOLT developed technology that enables English speakers to retrieve and understand information from informal foreign-language sources, including chat, text messaging and spoken conversations. The genres of interest to BOLT were characterized by inherent variation and inconsistency, motivating the development of new collection and annotation methods.

In BOLT’s first phase, LDC collected large volumes of online discussion forum data in multiple languages, using techniques adapted from prior collection efforts.

In BOLT’s second phase, LDC collected naturally occurring informal text (SMS) and chat messages from individual users in multiple languages.

BOLT’s third phase focused on telephone speech, which was supplied from LDC’s multilingual CALLHOME and CALLFRIEND collections.