Gesture Annotation: Tools and Data


The empirical study of the role of gesture in natural language requires large sets of linguistic data annotated with gestural descriptions. Our goal is for this page to become a centralized listing of tools and data that can aid in this empirical study. This work is supported, in part, by the Linguistic Data Consortium, the TalkBank project, and ISLE: International Standards in Language Engineering.

Please send updates to Craig Martell.


Index
Anvil | CHILDES/CLAN | FORM | MacSHAPA | Media Streams | MediaTagger | SignStream | syncWRITER | Transana | VISLab

Key
C: an online corpus of textual data
V: an online corpus including video recordings
T: an available tool for creation, display or search
P: there is a citeable paper which documents the system
R: any other kind of resource, such as a web page, a professional association, a new project, or a classification system


Note: the title of each section gives the primary hyperlink.

Gesture Resources
TP Anvil: Annotation of Video and Language Data (Michael Kipp)
Anvil is a Java-based tool that permits multilayered annotation of video with gesture, posture, and discourse information. Tags can be freely specified and easily arranged in hierarchies. Anvil requires the Java Media Framework and should run on Solaris, Windows, and (possibly) Linux. Supported video formats are QuickTime and AVI, and data are stored and exchanged as XML. It is freely available to the research community on request by email to the author.
C FORM: A Kinematic Gesture-Annotation Scheme (Craig Martell)
FORM is a gesture annotation scheme designed to capture the kinematic information in gesture from videos of speakers. The FORM project is currently building a detailed database of gesture-annotated videos stored in Annotation Graph format. This will allow the gestural information to be augmented with other linguistic information, such as parse trees of the sentences accompanying the gestures, discourse structure, and intonation.

FORM encodes the "phonetics" of gesture by giving geometric descriptions of the location and movement of the right and left arms and hands, the torso, and the head. Other kinematic information, such as effort and shape, is also recorded.

FORM currently uses Anvil, but the project is developing FORMTool, its own open-source gesture annotation tool, which should be useful to anyone who wants to encode linguistic information from video in Annotation Graph format.
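In an annotation graph (in the sense of Bird and Liberman), each annotation is a typed, labeled arc between two nodes, and nodes may be anchored to time offsets in the signal. Because arcs from different layers can share nodes, a gesture layer and, say, a word layer line up on one timeline, and new layers can be added without touching existing ones. The following is a minimal in-memory sketch under that assumption. The class and method names are hypothetical; this is not FORM's schema, FORMTool's interface, or any AG toolkit's API, and the labels and times are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: str
    offset: Optional[float] = None  # time in seconds; None if not anchored to the signal


@dataclass
class Arc:
    source: str
    target: str
    kind: str   # annotation layer, e.g. "gesture" or "word"
    label: str


@dataclass
class AnnotationGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.nodes[node_id] = Node(node_id, offset)

    def add_arc(self, source: str, target: str, kind: str, label: str) -> None:
        # Arcs may only connect nodes that already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.arcs.append(Arc(source, target, kind, label))

    def overlapping(self, start: float, end: float) -> List[Arc]:
        """Return arcs whose time-anchored span overlaps the open interval (start, end)."""
        hits = []
        for arc in self.arcs:
            s = self.nodes[arc.source].offset
            t = self.nodes[arc.target].offset
            if s is not None and t is not None and s < end and t > start:
                hits.append(arc)
        return hits


# Invented example: one gesture arc spanning two word arcs that share its nodes.
ag = AnnotationGraph()
ag.add_node("n0", 1.20)
ag.add_node("n1", 1.85)
ag.add_node("n2", 2.40)
ag.add_arc("n0", "n2", "gesture", "right hand: upward arc")
ag.add_arc("n0", "n1", "word", "this")
ag.add_arc("n1", "n2", "word", "big")
print([arc.label for arc in ag.overlapping(1.9, 2.0)])  # ['right hand: upward arc', 'big']
```

In this model, adding an intonation layer later would only mean adding arcs of a new kind over the same nodes.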

CTP CHILDES/CLAN: Child Language Data Exchange System/Child Language Analysis (Brian MacWhinney)
The CHILDES/CLAN system is a suite of tools for studying conversational interactions, including tools for coding and analyzing transcripts and a system for linking these transcripts to digitized audio and video. CLAN supports both CHAT and CA (Conversation Analysis) notation, with alignment of text to video at the phrase level. There is an extensive online tutorial, as well as downloadable documentation. CLAN is available for both Windows and Mac. [NSF Award, MacArthur Foundation Award, National Institutes of Health Award]
TP MacSHAPA (Penelope Sanderson)
[This project does not seem to be active; however, Dr. Sanderson is happy to help users or inquirers "as resources allow."] According to Justine Cassell of the MIT Media Lab, MacSHAPA is a flexible video-annotation tool that supports user-defined coding schemes, has a shared timeline for all events, and comes with good search and statistical routines. It also allows the user to export data directly to a spreadsheet or to SPSS, so that quantitative analyses can be done immediately.

MacSHAPA can control high-end Panasonic or JVC VCRs; alternatively, version 1.1 can work with QuickTime files.
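MacSHAPA's own export routines are not documented here, but the idea of a shared timeline feeding a spreadsheet can be sketched generically: coded events from several user-defined variables are sorted by onset and written as CSV, a plain-text format that spreadsheets and statistics packages such as SPSS can generally import. The event tuples, column names, and `export_csv` function are hypothetical illustrations, not MacSHAPA's file format.

```python
import csv
import io

# Hypothetical coded events on one shared timeline: (onset s, offset s, variable, code).
events = [
    (3.2, 4.0, "gaze", "listener"),
    (0.5, 2.1, "gesture", "iconic"),
    (1.0, 1.8, "speech", "the big one"),
]


def export_csv(events) -> str:
    """Return CSV text with one row per event, sorted by onset, plus a computed duration column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["onset", "offset", "duration", "variable", "code"])
    for onset, offset, variable, code in sorted(events):
        writer.writerow([onset, offset, round(offset - onset, 3), variable, code])
    return buf.getvalue()


print(export_csv(events))
```

Sorting by onset puts events from every variable on one timeline, so each row can be analyzed directly in a spreadsheet.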

P Media Streams (Marc Davis)
[This project does not seem to be active. The following is derived from the project's web pages.] Media Streams allows users to create multilayered, iconic annotations of video content. Using more than 2,500 iconic primitives, the user combines icons into compounds to build a cascading hierarchical structure. Screenshots are available.
P MediaTagger (Peter Wittenburg)
"MediaTagger is a tool for transcription and analysis of digitized video "Movies" on Apple Macintosh. The basic idea of MediaTagger is to select a time slice of a video (+audio) movie and tag it with a transcription text or code. Tags can contain free text or a fixed code from a user designed classification system. This labeling can be done on any number of levels, and levels can be totally independent."
VTP SignStream (Carol Neidle)
SignStream allows users to annotate video and audio language data in multiple parallel fields that display the temporal alignment and relations among events. It has been used most extensively for the analysis of signed languages, including manual and non-manual (head, face, body) information; type of message (e.g., wh-question); parts of speech; and spoken-language translations of sentences. A screenshot is available.

SignStream v.2, which can display up to four synchronized video files along with a visual representation of the waveform of a synchronized sound file, runs on Mac OS computers with System 8.0 or later and requires QuickTime 3.0.2 or later. (It has not yet been tested with Mac OS X.) SignStream is being designed specifically for the study of signed languages and the gestural component of spoken languages; however, the tool may be applied more generally to video and/or audio language data. SignStream signed-language data sets are distributed on CD-ROM and through the project's web data repository. Although the binary database files can only be read with SignStream, data may be exported to a text file, and for some data sets these text files are being made available via the Web.

A pure Java port of SignStream is currently under development at Boston University. The project has benefited from collaboration with several universities, including Dartmouth College and Gallaudet University. [NSF award]

T syncWRITER (med-i-bit)
This is commercial software for the Mac that lets you synchronize different "tracks," such as multiple transcription tracks, a video track, and a comment track. The authors call a collection of these tracks a "score," and the format resembles a musical score. Video is stored in a track as a series of time-stamped frames. A user interested in a specific part of the video can place the desired frames on the score and use the other tracks to describe them. Although the video can be played, playback is not synchronized with the other tracks. A demo version is available. A single license for an educational institution costs DM 850.00.
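The score layout can be illustrated with a small model in which each track maps time stamps to entries, and a "column" gathers whatever every track holds at one time stamp, much as notes for different instruments line up vertically in a musical score. This is a hypothetical sketch, not syncWRITER's file format; the `Score` class, track names, time stamps, and frame labels are invented.

```python
from typing import Dict, List, Tuple


class Score:
    """Hypothetical score: named tracks whose entries are keyed by time stamp (seconds)."""

    def __init__(self) -> None:
        self.tracks: Dict[str, Dict[float, str]] = {}

    def add(self, track: str, time: float, content: str) -> None:
        # Creates the track on first use; a later entry at the same time replaces the earlier one.
        self.tracks.setdefault(track, {})[time] = content

    def columns(self) -> List[Tuple[float, Dict[str, str]]]:
        """Group entries from all tracks by time stamp, in time order."""
        times = sorted({t for entries in self.tracks.values() for t in entries})
        return [
            (t, {name: entries[t] for name, entries in self.tracks.items() if t in entries})
            for t in times
        ]


score = Score()
score.add("video", 2.0, "frame 0050")  # a selected, time-stamped frame (placeholder label)
score.add("video", 2.4, "frame 0060")
score.add("transcript", 2.0, "so you take")
score.add("comment", 2.4, "both hands rise")
for time, column in score.columns():
    print(time, column)
```

Each printed column pairs a selected video frame with the transcription and comment entries that describe it.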
T Transana (Chris Fassnacht)
This software facilitates the transcription and analysis of video data. It allows users to view video, create a transcript, and link places in the transcript to the corresponding frames in the video. Analytically interesting videos and portions of video can be identified, organized, and easily accessed through the tools provided.

The organizational structure is central to Transana. In this system, video files are referred to as episodes, and related episodes can be grouped into a series. Each episode can also be broken down into segments called clips, and related clips can be grouped into what is called a collection. Keywords can be assigned to both clips and episodes to describe their content; related keywords can then be grouped together and searched. A database window, arranged as a tree, helps users manipulate and organize these groupings. This organizational system makes it easy to manage large collections of video files.
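Transana's organizational hierarchy (series, episodes, clips, collections, and keywords) can be pictured as a small data model. The sketch below assumes an in-memory representation with hypothetical class and function names; it is not Transana's actual database schema or API, and the episode, clip, and keyword values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Clip:
    name: str
    start: float  # seconds into the episode's video
    end: float
    keywords: Set[str] = field(default_factory=set)


@dataclass
class Episode:
    name: str
    video_file: str  # one episode corresponds to one video file
    clips: List[Clip] = field(default_factory=list)
    keywords: Set[str] = field(default_factory=set)


@dataclass
class Series:
    name: str
    episodes: List[Episode] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    clips: List[Clip] = field(default_factory=list)  # references to clips from any episode


def search(series_list: List[Series], keyword: str) -> List[str]:
    """Names of episodes and clips carrying the given keyword."""
    hits = []
    for series in series_list:
        for episode in series.episodes:
            if keyword in episode.keywords:
                hits.append(episode.name)
            for clip in episode.clips:
                if keyword in clip.keywords:
                    hits.append(clip.name)
    return hits


def search_group(series_list: List[Series], groups: Dict[str, Set[str]], group: str) -> List[str]:
    """Search every keyword in a keyword group, dropping duplicate hits."""
    hits: List[str] = []
    for keyword in sorted(groups[group]):
        for name in search(series_list, keyword):
            if name not in hits:
                hits.append(name)
    return hits


retelling = Clip("Story retelling", 12.0, 48.5, {"iconic"})
directions = Clip("Directions", 60.0, 75.0, {"beat"})
session = Episode("Session 1", "session1.mov", [retelling, directions], {"dinner"})
family = Series("Family conversations", [session])
iconic_examples = Collection("Iconic examples", [retelling])
keyword_groups: Dict[str, Set[str]] = {"gesture": {"beat", "iconic"}}

print(search([family], "iconic"))                         # ['Story retelling']
print(search_group([family], keyword_groups, "gesture"))  # ['Directions', 'Story retelling']
```

A collection here holds references to existing clips rather than copies, so one clip can appear in several collections without duplicating its video segment.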
VTP VISLab's Cross-Modal Analysis of Signal and Sense (Francis Quek)
This project is based at the Vision Interfaces and Systems Laboratory (VISLab) at Wright State University. Its aim is to create a large database of video annotated with information about gesture, speech, and gaze, which will then be used to test theories of multimodal communication empirically. Process descriptions and downloadable tools and data are available. It is a collaborative project with participants from Wright State, the University of Chicago, Purdue, the University of Illinois at Chicago, the University of Wisconsin-Milwaukee, Reed College, and National Yang-Ming University, Taiwan [NSF award].
 
Last update: 26 March 2001
Craig Martell


To be added:
http://hulk.bu.edu/pubs/papers/1996/carrer-vane96/TR-08-15-96.html