The XTrans Tool
XTrans is a next-generation multi-platform, multilingual, multi-channel transcription tool developed by the Linguistic Data Consortium (LDC) to support manual transcription and annotation of audio recordings. As human language technology expands its horizons, data providers must respond with infrastructure that supports rapid development of high-quality, high-volume linguistic resources in a nearly unlimited number of languages and genres. The XTrans toolkit provides new and efficient solutions to common transcription challenges and addresses critical gaps in existing tools.
Designed with input from experienced human transcribers working with real-world data, XTrans provides a flexible and intuitive graphical user interface for a multitude of speech annotation tasks, including (virtual) segmentation of audio into smaller units such as turns and sentences; speaker identification; orthographic transcription in any language; and labeling of structural elements of the transcript, such as topics.
Since XTrans was created, LDC and its government, academic and industrial partners have used it to generate over 3500 hours of time-aligned verbatim transcripts in a variety of genres, including broadcast news, talk shows, sociolinguistic interviews, telephone speech and massively multi-channel meetings, and in languages including Chinese, Arabic, Spanish, Russian and English.
With an intuitive interface, user configurability and embedded QC functions, XTrans is optimized for high-quality, high-volume transcription tasks involving real-world data.
Real data means messy data. Transcribing multiple overlapping speakers on a single channel can be a challenge for even the most experienced transcribers. XTrans eliminates cumbersome workarounds for this common phenomenon with Virtual Speaker Channels, which allow a virtually unlimited number of distinct speakers or other sound sources to be associated with the same audio channel.
Real data means complex data. XTrans allows transcribers to open an effectively unlimited number of audio files for simultaneous transcription. Transcribers can switch focus between one, two or multiple speakers as needed -- a great feature for transcription of meeting room recordings!
Real data means global data. XTrans provides strong multilingual support, with bidirectional text input for languages like Arabic, Farsi, Hindi, Urdu and Hebrew. XTrans is Unicode-compliant and supports Windows and Unix multilingual input methods.
Efficiency and Quality
Real-time transcription rates have improved dramatically in LDC projects using XTrans, with rates for some tasks cut by as much as half. XTrans' combination of intuitive tool design, user-configurable conventions for common features, and a novel use of keybindings to support all tasks, including audio segmentation, provides the optimal balance of efficiency and quality. Experienced transcribers rarely even have to touch the mouse, which not only speeds up transcription but also reduces hand and wrist fatigue.
XTrans also brings key quality control functions directly into the interface, giving transcribers the power to improve the quality of their own work.
Flexible Format, Developer-Friendly
XTrans components are written in Python and C++, utilizing LDC's QWave waveform display module. Even with very large files or multiple recordings, XTrans provides users with fast display and playback capabilities. A range of audio formats is supported, including .sph, .wav, .aiff, .flac, and .ogg. Transcripts are output in a Tab Delimited Format (TDF), which is easily converted to other common formats and is readily usable by downstream manual and automatic annotation tasks. XTrans can also directly import files in some XML formats, such as the Transcriber (TRS) format.
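To illustrate why a tab-delimited transcript is easy to convert, here is a minimal Python sketch that reads tab-separated segments and renders them as plain time-stamped lines. The six-column layout (file, channel, start, end, speaker, transcript) and the names `Segment`, `parse_tdf` and `to_lines` are assumptions made for this example only; they are not the documented TDF specification or part of XTrans, so check the actual column order of your files before adapting it.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical column order, assumed for this sketch only.
# Real TDF files may carry more columns or a different order.
NUM_COLUMNS = 6  # file, channel, start, end, speaker, transcript


@dataclass
class Segment:
    file: str
    channel: int
    start: float  # seconds
    end: float    # seconds
    speaker: str
    transcript: str


def parse_tdf(text):
    """Parse tab-delimited transcript text into a list of Segment objects."""
    segments = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) < NUM_COLUMNS:
            raise ValueError(f"expected {NUM_COLUMNS} fields, got {len(row)}: {row!r}")
        file, channel, start, end, speaker, transcript = row[:NUM_COLUMNS]
        segments.append(
            Segment(file, int(channel), float(start), float(end), speaker, transcript)
        )
    return segments


def to_lines(segments):
    """Render segments as '[start-end] speaker: text', ordered by start time."""
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    return [f"[{s.start:.2f}-{s.end:.2f}] {s.speaker}: {s.transcript}" for s in ordered]


if __name__ == "__main__":
    sample = (
        "audio1.wav\t0\t0.00\t2.50\tspeaker1\thello there\n"
        "audio1.wav\t0\t1.75\t3.10\tspeaker2\thi\n"
    )
    for line in to_lines(parse_tdf(sample)):
        print(line)
```

In the sample, both segments sit on channel 0 and overlap in time, which is the kind of situation Virtual Speaker Channels are described as handling: each row carries its own speaker label and time span, so overlapping speech needs no special treatment in a converter like this.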
XTrans was designed to be easily extensible to new tasks; LDC recently extended XTrans to build QCTrans, an interface for creating and validating sentence-aligned parallel text starting from transcripts of spoken language.