The development of software tools for linguistic analysis, like any software application, profits from first-hand experience using technology incrementally to solve specific problems. The tools described herein have been developed, revised and used over the past two years in the analysis of vowel systems represented in field data. Where development often begins with the goals of creating portable, standard tools for general use, the present effort focussed initially on developing tools that were optimized for the analysis of vowels for a specific study and on a specific platform. It can be argued that such an approach is a necessary complement to more general discussions of needs, standards, formats, tools and best practices. Vowel analysis was conducted within the analytical model of contemporary variationist sociolinguistics. Although linguists typically think of sociolinguistics and field linguistics as separate rubrics, the two overlap in many areas relevant to tool development, specifically the use of empirical data, collected from native speakers in an interview situation, that is analyzed bottom-up for its bearing on theory. My goal over the past two years has been to take a hard look at how I do sociolinguistic fieldwork and analysis in comparison with how language data are currently used in corpus linguistics and language engineering. The approach, data structures and tools that result draw from both traditions.
The interviews conducted resemble the classic Labovian sociolinguistic interviews in which informants are asked to discuss their lives and interests and are encouraged to tell stories and become engaged in the subject matter of the interview. Toward the end of the conversation, more formal elicitation devices such as word lists, minimal pair lists and reading selections were added as appropriate. To alleviate the foreigner effect, a native speaker was engaged to conduct some interviews while others combined a non-native interviewer with two or more native subjects. On the tapes and in the audio files, each subject occupies a separate audio channel.
Audio files were created by digitizing the original analog tape using a Sony DAT recorder connected to a Sun Sparcstation via a Townshend DAT link. The interviews were stored on digital audio tape at 16-bit, 44 kHz for archival purposes but were down-sampled to 16-bit, 16 kHz files for storage on computer disk and for analysis.
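The down-sampling step can be sketched as follows. This is a minimal pure-Python illustration, not the tool actually used: a production resampler would first apply an anti-aliasing low-pass filter before reducing the rate.

```python
def resample(samples, in_rate, out_rate):
    """Naive linear-interpolation resampler for a sequence of PCM samples.

    Illustrative only: real down-sampling (e.g. 44 kHz -> 16 kHz) must
    low-pass filter the signal first to avoid aliasing.
    """
    n_out = int(len(samples) * out_rate / in_rate)
    out = []
    for i in range(n_out):
        # Map each output index to a fractional position in the input.
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, one second of audio at 44,000 samples/s resamples to 16,000 output samples.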
To facilitate the location and analysis of lexical items, each interview was transcribed and the transcripts were time-aligned to the audio signal so that speaker-turns can be isolated and overlapped speech identified.
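A time-aligned transcript of this kind can be modeled as a list of utterances, each carrying its speaker and its start and end offsets into the audio file; overlapped speech then falls out as intersecting time spans from different speakers. The structure and names below are a sketch, not the format the original tools used:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str
    start: float  # seconds into the audio file
    end: float
    text: str

def overlaps(a: Utterance, b: Utterance) -> bool:
    """Two utterances overlap if they come from different speakers
    and their time spans intersect."""
    return a.speaker != b.speaker and a.start < b.end and b.start < a.end

def overlapped_speech(utterances):
    """Return all pairs of utterances containing overlapped speech."""
    pairs = []
    for i, a in enumerate(utterances):
        for b in utterances[i + 1:]:
            if overlaps(a, b):
                pairs.append((a, b))
    return pairs
```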
Initially, files were transcribed directly from audio tape into computer files that were later time aligned to the digital audio files. Later conversations were digitized and segmented before transcription. As an experiment, a small number of interviews were transcribed by using a native speaker as an intermediary between the original conversation and a speech recognition package.
Not all of the utterances in the interview are equally interesting. Excluded are the non-native utterances of the interviewer and utterances from unknown third parties who may have briefly interrupted the interview. To control for the influence of conversational situation on the vowel system, each utterance is categorized according to whether it contains careful or relaxed speech to the interviewer or to a third party or was collected during a more formal elicitation exercise. Although this categorization was applied at the utterance level for practical reasons, the conversational situation changes infrequently and can be viewed as spanning large stretches of the interview.
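Because the situation changes infrequently, the annotation can be stored sparsely (only where it changes) and propagated forward across the intervening utterances. A minimal sketch, with an illustrative label set that approximates the categories above:

```python
from enum import Enum

class Situation(Enum):
    # Illustrative labels; the study's actual category names may differ.
    CAREFUL_TO_INTERVIEWER = "careful"
    RELAXED_TO_INTERVIEWER = "relaxed"
    TO_THIRD_PARTY = "third_party"
    ELICITATION = "elicitation"

def fill_situations(labels):
    """Propagate the most recent situation label forward over utterances
    annotated as None, exploiting the fact that the conversational
    situation spans large stretches of the interview."""
    current = None
    out = []
    for lab in labels:
        if lab is not None:
            current = lab
        out.append(current)
    return out
```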
Transcription, segmentation and utterance-level annotation employed the STRANS utility written by Zhibiao Wu at the LDC, plus the SegmentHelper add-in. The Transcriber tool offers additional features in a platform-independent implementation.
Once the interview has been transcribed and the utterances from the speaker under study have been selected, the FindWords utility selects specific lexical items within an utterance-length context.
Even though Italian orthography is close to phonemic, this study benefits from access to an online dictionary that includes pronunciations. FindWords uses regular expression searching over the orthographic representation with occasional reference to the pronouncing dictionary. Each of these hits is numbered and stored in a database for later use.
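The core of this search step, regular-expression matching over the orthographic transcript with each hit numbered for later database storage, can be sketched as follows (the record fields here are an assumption, not FindWords' actual schema):

```python
import re

def find_hits(utterances, pattern, start_id=1):
    """Search each transcribed utterance for a lexical pattern and
    assign each match a sequential hit number for database storage."""
    regex = re.compile(pattern)
    hits = []
    hit_id = start_id
    for utt_index, text in enumerate(utterances):
        for m in regex.finditer(text):
            hits.append({"hit": hit_id,          # key used downstream
                         "utterance": utt_index,  # which utterance
                         "word": m.group(0),
                         "span": m.span()})       # character offsets
            hit_id += 1
    return hits
```

For example, searching two utterances for the pattern `\bc[ao]sa\b` numbers the matches in order of occurrence.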
Working from the database of selected utterances that contain lexical items currently of interest, the next step is to identify the position of the word in the waveform file. This will facilitate acoustic analysis of the word and subsequent display or dissemination. FindWords employs a graphical user interface to guide the linguist through this process. Once the lexical items have been identified for a given speaker, "Word Spectrograms" runs in batch mode to create individual audio files, wide and narrow band spectrograms and pitch, power and voicing tracks for each word. These are stored in a database keyed by "hit" number.
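The batch step produces several artifacts per word, all keyed by the same hit number. A sketch of such a naming scheme (the directory layout and extensions are assumptions, not the original tool's conventions):

```python
import os

def artifact_paths(hit, out_dir="words"):
    """Illustrative per-word artifact names for one hit: audio clip,
    wide- and narrow-band spectrograms, and F0 track, all sharing the
    hit number as their key."""
    base = os.path.join(out_dir, f"hit{hit:05d}")
    return {"audio": base + ".wav",
            "wideband": base + ".wb.png",
            "narrowband": base + ".nb.png",
            "f0": base + ".f0"}
```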
The goal of this analysis is to characterize the length and formant values of a large corpus of Italian vowels for a stratified sample of speakers. The goals of measuring length and extracting formant values require separate consideration. The Segment Vowels tool (shown below) presents each lexical item as a time-aligned assembly of its waveform, wide and narrow band spectrogram and F0 characteristics.
Using this information, the researcher can set cursors to indicate the beginning and end of the vowel under study. Determining the appropriate place to measure formant values is still subject to argument. The Summarization and Display tool allows the linguist to select from one of several methods for measuring vowels and plots them in vowel space.
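Given the onset and offset cursors, each measurement method reduces to choosing a time point within the vowel. The method names and the 20 ms offset below are illustrative assumptions, not the options the tool actually offers:

```python
def measurement_point(start, end, method="midpoint"):
    """Return the time (in seconds) at which formants are measured,
    given vowel onset/offset cursors. Method names are illustrative."""
    duration = end - start
    if method == "midpoint":
        # Temporal midpoint of the vowel.
        return start + duration / 2
    if method == "one_third":
        # One third of the way into the vowel.
        return start + duration / 3
    if method == "fixed_offset":
        # A fixed interval after onset (20 ms here), clipped to the offset.
        return min(start + 0.020, end)
    raise ValueError(f"unknown method: {method}")
```

Plotting the resulting F1/F2 values for each method side by side makes the consequences of the choice visible in vowel space.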
The tools combine utilities from Entropic's xwaves and ESPS products with the Emacs editor and utilities written by programmers at the LDC. The components are accessed via graphical user interfaces written in Perl/Tk.
The data files created under this approach include the audio file and transcript, an index to all utterances, words and segments selected plus databases of the audio files, spectrograms and F0 files for each word. Demographic information for the subjects forms a separate file related to the database by subject ID.
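The relational keying described here, per-word artifacts keyed by hit number, demographics in a separate table joined on subject ID, can be sketched with an in-memory database. The table and column names are assumptions for illustration:

```python
import sqlite3

# Hypothetical schema sketch: hits keyed by hit number, demographics
# in a separate table related by subject ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subjects (subject_id TEXT PRIMARY KEY, age INTEGER, sex TEXT);
CREATE TABLE hits (hit INTEGER PRIMARY KEY,
                   subject_id TEXT,
                   word TEXT,
                   audio_file TEXT,
                   FOREIGN KEY (subject_id) REFERENCES subjects(subject_id));
""")
conn.execute("INSERT INTO subjects VALUES ('S01', 34, 'F')")
conn.execute("INSERT INTO hits VALUES (1, 'S01', 'casa', 'S01_0001.wav')")

# Joining on subject ID relates each measured word to its speaker's
# demographic information.
rows = conn.execute("""
    SELECT h.word, s.age FROM hits h JOIN subjects s USING (subject_id)
""").fetchall()
```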
The annotation proceeds by iteratively adding information about progressively focussed objects: the utterance, the word and the vowel segment. At several points in the process, the time-based annotations of the audio signal must be aligned among themselves and reconciled with paradigmatic information in, for example, the dictionary and the speaker demographic file.