Introduction
The CALLHOME Spanish Dialogue Act Annotation Corpus, Linguistic Data Consortium
(LDC) catalog number LDC2001T61 and ISBN 1-58563-197-3, was developed under
Project CLARITY. The goal of CLARITY was to glean discourse information from
unrestricted conversational speech using shallow, corpus-based analysis. The
annotation was carried out at Interactive Systems Labs at Carnegie Mellon
University.
Data
This publication used a three-level coding scheme to manually tag the LDC CALLHOME Spanish Transcripts. The three
levels of the coding scheme are:
- a dialogue act level consisting of a tag set extended from the
Switchboard version of DAMSL (SWBD-DAMSL);
- a dialogue game level featuring short sequences of dialogue
acts; and
- a genre level similar to topical segments.
All 120 available dialogues have been annotated.
Dialogue games are short sequences of dialogue acts, such as question/answer
pairs. Genres include storytelling, discussion, planning, and similar
activities, and genre segmentation also takes topic into account. Genres,
games, and dialogue acts are each annotated for type. Genre segments are
additionally annotated for activities and topics (on a 0-5 scale) and for the
central object or person being discussed (a who or what category), and each
includes a short synopsis of the segment.
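The nesting of the three levels (genre segments containing dialogue games, which in turn contain dialogue acts) can be pictured as a small data model. The sketch below is a hypothetical illustration only: the class names, field names, and tag strings are assumptions and do not reflect the corpus's actual file format or tag set. It encodes the attributes described above, including the 0-5 range for activity and topic ratings.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical in-memory model of the three annotation levels.
# All names and labels here are illustrative assumptions, not the
# corpus's real file format or tag inventory.


@dataclass
class DialogueAct:
    text: str  # transcribed segment
    tag: str   # dialogue act type (hypothetical label)


@dataclass
class DialogueGame:
    game_type: str  # e.g. a question/answer pair (hypothetical label)
    acts: List[DialogueAct] = field(default_factory=list)


@dataclass
class GenreSegment:
    genre: str        # e.g. storytelling, discussion, planning
    who_or_what: str  # category of the central object or person
    synopsis: str     # short summary of the segment
    activities: Dict[str, int] = field(default_factory=dict)  # name -> 0..5
    topics: Dict[str, int] = field(default_factory=dict)      # name -> 0..5
    games: List[DialogueGame] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Activities and topics are rated on a 0-5 scale.
        for ratings in (self.activities, self.topics):
            for name, value in ratings.items():
                if not 0 <= value <= 5:
                    raise ValueError(f"rating for {name!r} must be 0-5, got {value}")

    def all_acts(self) -> List[DialogueAct]:
        """Flatten the games into the ordered list of dialogue acts."""
        return [act for game in self.games for act in game.acts]


# Usage with hypothetical labels (not taken from the corpus annotation):
segment = GenreSegment(
    genre="discussion",
    who_or_what="person",
    synopsis="Family members' studies",
    topics={"education": 4},
    games=[DialogueGame("question/answer", [
        DialogueAct("qué estudia, mamá,", "question"),
        DialogueAct("Sociales. Ciencias Sociales.", "answer"),
    ])],
)
print([a.tag for a in segment.all_acts()])  # ['question', 'answer']
```

Keeping the ratings in plain dictionaries avoids committing to a fixed list of activity or topic names, which the description above does not enumerate.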
An excerpt from one of the tagged conversations is shown below.
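Sí, (Yes,)
eso es para (that is for)
eso, de seguro. (that, for sure.)
No importa. (It doesn't matter.)
No importa. (It doesn't matter.)
Bueno (Well)
aquí, la Zaida está estudiando también en la universidad con la Liana. (here, Zaida is also studying at the university with Liana.)
Y (And)
qué estudia, mamá, (what is she studying, Mom,)
qué (what)
están estudiando. (are they studying.)
[background speech]
Están estudiando (They are studying)
Sociales. Ciencias Sociales. (Social Studies. Social Sciences.)
Ah,
para (to be a)
maes- para maestra de Sociales. (teach- to be a Social Studies teacher.)
Sí (Yes)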
Updates
There are no updates at this time.