Fisher English Training Speech Part 1 Transcripts represents the first half of a collection of conversational telephone speech (CTS) that was created at the LDC during 2003. It contains transcript data for 5,850 complete conversations, each lasting up to 10 minutes. In addition to the transcriptions, which are found under the trans directory, there is a complete set of tables describing the speakers, the properties of the telephone calls, and the set of topics that were used to initiate the conversations. The corresponding speech files are contained in Fisher English Training Speech Part 1 Speech (LDC2004S13).
The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II, and the resulting corpora have been adapted for ASR research but were in fact developed for language and speaker identification respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a very large number of participants each make a few calls of short duration, speaking to other participants, whom they typically do not know, about assigned topics. This maximizes inter-speaker variation and vocabulary breadth, although it also increases formality.
Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however, the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.
To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic, which is selected at random from a list that changes every 24 hours and is assigned to all subjects paired on that day. Some topics are inherited or refined from previous Switchboard studies, while others were developed specifically for the Fisher protocol.
Overall, about 12% of the conversations were transcribed at the LDC, and the rest were done by BBN and WordWave using a significantly different approach to the task. A central goal in both sets was to maximize the speed and economy of the transcription process. This in turn involved forgoing some of the mark-up detail and quality control that may have been common in previous, smaller corpora.
The LDC transcripts were based on automatic segmentation of the audio data to identify the utterance end-points on both channels of each conversation. Given these time stamps, manual transcription was simply a matter of typing in the words for each segment and doing a rudimentary spell-check. No attempt was made to modify the segmentation boundaries manually, or to locate utterances that the segmenter might have missed. Portions of speech where the transcriber could not be sure exactly what was said were marked with double parentheses -- (( ... )) -- and the transcriber could hazard a guess as to what was said, or leave the region between parentheses blank. The LDC transcription process yields one plain-text transcript file per conversation, in which the first two lines show the call-ID and the fact that the transcript was done at the LDC; the remainder of the file contains one utterance per line (with blank lines separating the utterances), with the start-time, end-time, speaker/channel-ID and utterance text.
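A file in that layout can be read with a few lines of code. The sketch below is illustrative only, not part of the corpus distribution: it assumes two header lines followed by blank-line-separated utterances of the form "start end speaker: text" with channel labels A and B, and the function and class names are invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float    # utterance start time in seconds
    end: float      # utterance end time in seconds
    speaker: str    # channel label, e.g. "A" or "B"
    text: str       # transcript text; uncertain regions keep their (( ... )) marks


def parse_fisher_transcript(path):
    """Parse one plain-text transcript file into a list of Utterances.

    Assumes the layout described above: two header lines (call-ID and
    transcription-site note), then one utterance per line with blank
    lines separating utterances.
    """
    utterances = []
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for line in lines[2:]:              # skip the two header lines
        line = line.strip()
        if not line:
            continue                    # blank separator between utterances
        m = re.match(r"^(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+([AB]):\s*(.*)$", line)
        if m is None:
            continue                    # tolerate lines that do not match
        start, end, speaker, text = m.groups()
        utterances.append(Utterance(float(start), float(end), speaker, text))
    return utterances
```

Note that double-parenthesized regions are left in the text field, so a consumer can decide whether to keep the transcriber's guess or treat the span as unintelligible.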
Data collection and transcription were sponsored by DARPA and the U.S. Department of Defense, as part of the EARS project for research and development in automatic speech recognition.
Please examine this sample to see what the data in this corpus looks like.
Copyright © 2003-2004 Trustees of the University of Pennsylvania