Please send updates to Craig Martell.
Anvil: Annotation of Video and Language Data
Anvil is a Java-based tool that permits multilayered annotation of video with gesture, posture, and discourse information. The tags used can be freely specified and can easily be arranged hierarchically. Anvil requires the Java Media Framework and should run on Solaris, Windows, and (possibly) Linux. Supported video formats are QuickTime and AVI; data storage and exchange are XML-based. The tool is freely available to the research community if you email the author.
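Since annotations are stored and exchanged as XML, a multi-layer annotation can be pictured roughly as nested, time-stamped elements. The sketch below writes such a file with Python's standard library; the element and attribute names are invented for illustration and are not Anvil's actual schema.

```python
# Sketch of a multi-layer, time-stamped annotation file written as XML.
# Element and attribute names here are hypothetical, not Anvil's schema.
import xml.etree.ElementTree as ET

annotation = ET.Element("annotation", video="speaker01.mov")

gesture = ET.SubElement(annotation, "track", name="gesture")
ET.SubElement(gesture, "element", start="2.10", end="3.45").text = "right-hand beat"

discourse = ET.SubElement(annotation, "track", name="discourse")
ET.SubElement(discourse, "element", start="1.80", end="4.00").text = "elaboration"

ET.ElementTree(annotation).write("speaker01.xml", encoding="utf-8",
                                 xml_declaration=True)
```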
FORM: A Kinematic Gesture-Annotation Scheme
FORM is a gesture annotation scheme designed to capture the kinematic information in gesture from videos of speakers. The FORM project is currently building a detailed database of gesture-annotated videos stored in Annotation Graph format. This will allow the gestural information to be augmented with other linguistic information, such as parse trees of the sentences accompanying the gestures, discourse structure, and intonation.
FORM encodes the "phonetics" of gesture by giving geometric descriptions of the location and movement of the right and left arms and hands, the torso, and the head. Other kinematic information, such as effort and shape, is also recorded.
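To make the Annotation Graph idea concrete: an annotation graph is essentially a set of time-stamped nodes connected by labeled arcs, so gesture codes and other linguistic layers can share a single timeline. The following minimal sketch works under that assumption; the layer names and labels are invented and are not FORM's actual coding scheme.

```python
# Sketch of an annotation graph: time-stamped nodes joined by labeled
# arcs. Layer names and labels are invented, not FORM's coding scheme.
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    id: int
    time: float                    # seconds into the video

@dataclass(frozen=True)
class Arc:
    src: Node
    dst: Node
    layer: str                     # e.g. "gesture/right-arm", "word"
    label: str

n1, n2 = Node(1, 2.10), Node(2, 3.45)
arcs = [
    Arc(n1, n2, "gesture/right-arm", "upper-arm lift"),
    Arc(n1, n2, "word", "hello"),
]

# All gesture-layer annotations overlapping the instant t = 3.0:
t = 3.0
print([a.label for a in arcs
       if a.layer.startswith("gesture") and a.src.time <= t <= a.dst.time])
```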
Currently, the FORM project uses Anvil, but the team is developing FORMTool, its own open-source gesture annotation tool. This tool will be of use to anyone wanting to encode linguistic information from videos in Annotation Graph format.
CHILDES/CLAN: Child Language Data Exchange System/Child Language Analysis
The CHILDES/CLAN system is a suite of tools for studying conversational interactions--including tools for coding and analyzing transcripts and a system for linking these transcripts to digitized audio and video. CLAN supports both CHAT and CA (Conversation Analysis) notation, with alignment of text with video at the phrase level. There is an extensive online tutorial, as well as downloadable documentation. CLAN is available for both Windows and Mac. [NSF Award, MacArthur Foundation Award, National Institutes of Health Award]
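For a flavor of CHAT notation: speaker utterances sit on main tiers beginning with "*" plus a speaker code, while coding information sits on dependent tiers beginning with "%". The toy reader below is only a sketch of that layout, not a substitute for CLAN's own tools.

```python
# Toy reader for CHAT-style tiers: "*XXX:" marks a speaker utterance,
# "%xxx:" marks a dependent coding tier. A sketch only, not CLAN.
sample = """\
*MOT:\tdo you want more juice ?
%mor:\tv|do pro|you v|want qn|more n|juice ?
*CHI:\tmore juice .
"""

for line in sample.splitlines():
    marker, _, content = line.partition(":")
    if marker.startswith("*"):
        print(f"utterance by {marker[1:]}: {content.strip()}")
    elif marker.startswith("%"):
        print(f"  dependent tier {marker[1:]}: {content.strip()}")
```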
MacSHAPA
[This project does not seem to be active; however, Dr. Sanderson is happy to help users or inquirers "as resources allow."] According to Justine Cassell of the MIT Media Lab, MacSHAPA is a flexible video-annotation tool that supports user-defined coding schemes, has a shared timeline for all events, and comes with good search and statistical routines. It also allows the user to output data directly to a spreadsheet or SPSS, so that quantitative analyses can be done immediately.
MacSHAPA controls either high-end Panasonic or JVC VCRs, or (as of v1.1) it can work with QuickTime files.
Media Streams
[This project does not seem to be active. The following is derived from the project's web pages.] Media Streams allows users to create multi-layered, iconic annotations of video content. Using over 2500 iconic primitives, the user compounds icons to create cascading hierarchical structures.
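As a rough illustration of compounding, a compound icon can be thought of as a small tree whose leaves are primitives. The sketch below uses invented primitive names; Media Streams' actual icon language is far richer.

```python
# Sketch of compounding iconic primitives into a hierarchy. The
# primitive names are invented; Media Streams' icon language is far
# richer (over 2500 primitives).
compound = ("scene",
            ("character", "adult female"),
            ("action", "walking"),
            ("location", "outdoors", "street"))

def primitives(icon):
    """Flatten a compound icon back into its primitive parts."""
    if isinstance(icon, str):
        return [icon]
    return [p for part in icon for p in primitives(part)]

print(primitives(compound))
```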
"MediaTagger is a tool for transcription and analysis of digitized video "Movies" on Apple Macintosh. The basic idea of MediaTagger is to select a time slice of a video (+audio) movie and tag it with a transcription text or code. Tags can contain free text or a fixed code from a user designed classification system. This labeling can be done on any number of levels, and levels can be totally independent."
SignStream
SignStream allows users to annotate video and audio language data in multiple parallel fields that display the temporal alignment of, and relations among, events. It has been used most extensively for the analysis of signed languages, covering manual and non-manual (head, face, body) information; type of message (e.g., wh-question); parts of speech; and spoken-language translations of sentences.
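The parallel-field design can be sketched as time-aligned events whose temporal relations fall out of comparing start and end frames. The field names and values below are invented for illustration; they are not SignStream's actual data model.

```python
# Sketch of parallel, time-aligned fields: temporal relations among
# events fall out of comparing start/end frames. Field names and
# values are invented, not SignStream's data model.
from dataclasses import dataclass

@dataclass
class Event:
    field: str                 # e.g. "right hand", "eyebrows"
    start: int                 # start frame
    end: int                   # end frame
    value: str

events = [
    Event("right hand", 100, 140, "WHAT"),
    Event("eyebrows",    95, 145, "furrowed"),   # non-manual marking
    Event("translation", 95, 145, "What?"),
]

def co_occurring(target, pool):
    """Events in other fields overlapping the target in time."""
    return [e for e in pool
            if e is not target and e.start <= target.end and target.start <= e.end]

for e in co_occurring(events[0], events):
    print(f"{e.field}: {e.value}")
```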
SignStream v.2, which can display up to four synchronized video files along with a visual representation of the waveform of a synchronized sound file, runs under Mac OS 8.0 or later and requires QuickTime 3.0.2 or later. (It has not yet been tested with OS X.) SignStream is being designed specifically for the study of signed languages and the gestural component of spoken languages; however, the tool may be applied more generally to video and/or audio language data. SignStream signed-language data sets are distributed on CD-ROM and through the project's WWW data repository. Although the binary database files can only be read with SignStream, data may be exported to a text file. For some data sets, these text files are being made available via the Web.
A pure Java port of SignStream is currently under development at Boston University. The project has benefited from collaboration with several universities, including Dartmouth College and Gallaudet University. [NSF award]
This is commercial software for the Mac that allows you to synchronize different "tracks" -- e.g., multiple transcription tracks, a video track, and a comment track. The authors call a collection of these tracks a "score," and the format resembles a musical score. Video is stored in a track as a series of time-stamped frames. A user interested in a specific part of the video can put the desired frames on the score and use the other tracks to describe them. Although video playback is supported, playback is not synchronized with the other tracks. A demo version is available. A single license for an educational institution is DM 850.00.
Transana
This software facilitates the transcription and analysis of video data. It allows users to view video, create a transcript, and link places in the transcript to the corresponding frames in the video. Analytically interesting videos and portions of video can be identified, organized, and easily accessed through the tools provided.
The organizational structure is central to Transana. In this system, video files are referred to as episodes, and related episodes can be grouped into a series. Each episode can also be broken down into segments called clips, and related clips can be grouped into what is called a collection. Keywords can be assigned to both clips and episodes to describe their content; related keywords can then be grouped together and searched. A database window, arranged as a tree structure, helps manipulate and organize these groupings. This organizational system allows for easy storage of large collections of video files.
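That organization can be sketched as a small object model: episodes grouped into series, clips cut from episodes and grouped into collections, and keywords attached to either. The class and field names below are invented for illustration, not Transana's internals.

```python
# Sketch of Transana-style organization: episodes grouped into series,
# clips cut from episodes and grouped into collections, keywords on
# both. Class and field names are invented, not Transana's internals.
from dataclasses import dataclass, field

@dataclass
class Episode:
    name: str
    video_file: str
    keywords: set = field(default_factory=set)

@dataclass
class Clip:
    episode: str
    start: float               # seconds
    end: float
    keywords: set = field(default_factory=set)

series = {"classroom study": [Episode("day 1", "day1.mpg", {"small group"})]}
collections = {"teacher questions": [Clip("day 1", 62.0, 95.5, {"open-ended"})]}

# Keyword search across all collections, as in the database window:
hits = [c for clips in collections.values() for c in clips
        if "open-ended" in c.keywords]
print(hits)
```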
VISLab's Cross-Modal Analysis of Signal and Sense
This project is based at the Vision Interfaces and Systems Laboratory (VISLab) at Wright State University. The intent is to create a large database of video annotated with information about gesture, speech, and gaze, which will then be used to test theories of multimodal communication empirically. Process descriptions and downloadable tools and data are available. It is a collaborative project with participants from Wright State University, the University of Chicago, Purdue University, the University of Illinois at Chicago, the University of Wisconsin-Milwaukee, Reed College, and National Yang-Ming University, Taiwan. [NSF award]