BOLT Data Collection
BOLT developed technology that enables English speakers to retrieve and understand information from informal foreign language sources including chat, text messaging and spoken conversations. The genres of interest to BOLT were characterized by inherent variation and inconsistency, motivating the development of new collection and annotation methods.
In BOLT’s first phase, LDC collected large volumes of online discussion forum data in multiple languages, using techniques adapted from prior collection efforts.
In BOLT’s second phase, LDC collected naturally occurring informal text (SMS) and chat messages from individual users in multiple languages.
BOLT’s third phase focused on telephone speech, which was drawn from LDC’s multilingual CALLHOME and CALLFRIEND collections.