CALLHOME Spanish Dialogue Act Annotation

Item Name: CALLHOME Spanish Dialogue Act Annotation
Authors: Alex Waibel, Alon Lavie, Lori Levin, Klaus Ries, and Liza Valle-Argueta
LDC Catalog No.: LDC2001T61
ISBN: 1-58563-197-3
Data Type: text
Data Source(s): telephone conversations
Application(s): speech recognition
Language(s): Spanish
Distribution: Web Download
Member fee: $0 for 2001 members
Non-member Fee: US$900.00
Reduced-License Fee: US$450.00
Extra-Copy Fee: N/A
Non-member License: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Alex Waibel, et al. 2001. CALLHOME Spanish Dialogue Act Annotation. Linguistic Data Consortium, Philadelphia.

Introduction

The CALLHOME Spanish Dialogue Act Annotation Corpus, Linguistic Data Consortium (LDC) catalog number LDC2001T61 and ISBN 1-58563-197-3, was developed under Project CLARITY. The goal of CLARITY was to glean discourse information from unrestricted conversational speech using shallow, corpus-based analysis. The annotation was carried out at Interactive Systems Labs at Carnegie Mellon University.

Data

This publication used a three-level coding scheme to manually tag the LDC CALLHOME Spanish Transcripts. The three levels of the coding scheme are:

  1. a dialogue act level consisting of a tag set extended from DAMSL and Switchboard;
  2. a dialogue game level featuring short sequences of dialogue acts;
  3. a genre level similar to topical segments.

All 120 available dialogues have been annotated.

Dialogue games are short sequences of dialogue acts, such as question/answer pairs. Genres can be storytelling, discussion, planning, etc., and their segmentation takes topic into account as well. The dialogue act scheme is a further development of the Switchboard DAMSL tag set. Genres, games, and dialogue acts are each annotated by type. Genres are additionally annotated for activities and topics (on a 0-5 scale) and for the central object or person being discussed (who or what category), and each genre segment carries a short synopsis (gist) of its content.
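The nesting of the three levels described above can be pictured as a small data model. The following is only a sketch under stated assumptions: the class names, field names, speaker labels, tag strings, topic label, score, and gist text are all hypothetical and are not taken from the corpus files or its documentation. It also assumes that each activity or topic label carries one score on the 0-5 scale.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SCALE_MIN, SCALE_MAX = 0, 5  # the 0-5 scale mentioned for activities and topics


@dataclass
class DialogueAct:
    """Lowest level: one utterance with a dialogue act type."""
    speaker: str    # hypothetical speaker label
    text: str
    act_type: str   # hypothetical label, not an actual tag from the corpus


@dataclass
class DialogueGame:
    """Middle level: a short sequence of dialogue acts."""
    game_type: str
    acts: List[DialogueAct] = field(default_factory=list)


@dataclass
class GenreSegment:
    """Top level: a genre segment with its extra annotations."""
    genre_type: str
    gist: str
    central_entity: Optional[str] = None
    activities: Dict[str, int] = field(default_factory=dict)
    topics: Dict[str, int] = field(default_factory=dict)
    games: List[DialogueGame] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any activity or topic score outside the 0-5 scale.
        scores = list(self.activities.items()) + list(self.topics.items())
        for name, score in scores:
            if not SCALE_MIN <= score <= SCALE_MAX:
                raise ValueError(f"{name}: score {score} is outside 0-5")

    def dialogue_acts(self) -> List[DialogueAct]:
        # Flatten the games into one list of acts, in order.
        return [act for game in self.games for act in game.acts]


# Illustrative values only; the labels, score, and gist are made up.
segment = GenreSegment(
    genre_type="discussion",
    gist="What Zaida is studying at the university",
    central_entity="Zaida",
    topics={"education": 4},
    games=[
        DialogueGame(
            game_type="question_answer",
            acts=[
                DialogueAct("B", "Y qué estudia, mamá,", "question"),
                DialogueAct("A", "Están estudiando Sociales.", "answer"),
            ],
        )
    ],
)
```

Calling `segment.dialogue_acts()` flattens the games back into a plain sequence of acts, which reflects that the game and genre levels group the dialogue act level and do not replace it.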

An example of the tagging from one conversation is presented below.

    Sí,
    eso es para eso, de seguro.
    No importa.
    No importa.
    Bueno aquí, la Zaida está estudiando también en la universidad con la Liana.
    Y qué estudia, mamá,
    qué están estudiando.
    [background speech]
    Están estudiando Sociales. Ciencias Sociales.
    Ah,
    para maes- para maestra de Sociales.
    Sí

Updates

There are no updates at this time.



Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.