Tools

LDC develops tools to support evolving annotation tasks. All tools distributed by LDC are available at no cost under an open-source license.

  • LDC Broad Phonetic Class Speech Activity Detector: Based on the broad phonetic class recognizer implemented in the HTK Speech Recognition Toolkit, LDC’s speech activity detector runs the speech signal through a GMM-HMM recognizer to identify five broad phonetic classes: vowels, stops/affricates, fricatives, nasals and glides/liquids. The LDC Broad Phonetic Class Speech Activity Detector is available on GitHub under the GNU General Public License, version 3.0.
  • AGTK, Annotation Graph Toolkit: Annotation Graphs are a formal framework for representing linguistic annotations of time series data. They abstract away from file formats, coding schemes and user interfaces, providing a logical layer for annotation systems. AGTK is a toolkit for using the Annotation Graph model. AGTK is made available under the Common Public License.
  • CTK, Champollion Toolkit: CTK aims to provide ready-to-use parallel text sentence alignment tools for as many language pairs as possible. CTK is made available under the GNU General Public License, version 3.0.
  • LDC Word Aligner: A tool used to build manual word alignments. LDC Word Aligner is made available under the GNU General Public License, version 3.0.
  • SPHERE Conversion Tools: Programs for converting NIST SPHERE speech files to other formats.
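
To make the annotation graph model mentioned under AGTK more concrete, here is a minimal Python sketch of the underlying idea: anchors are nodes that may carry a time offset into the signal, and annotations are labeled arcs between two anchors. This is an illustration only and does not use or reproduce the AGTK API; the class names (Anchor, Annotation, AnnotationGraph), the method names and the sample offsets are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Anchor:
    """A node in the graph; the time offset is optional."""
    id: str
    offset: Optional[float] = None  # seconds into the signal, if known


@dataclass
class Annotation:
    """A labeled arc spanning two anchors."""
    start: Anchor
    end: Anchor
    type: str   # hypothetical tier name, e.g. "word" or "phone"
    label: str


@dataclass
class AnnotationGraph:
    anchors: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def anchor(self, id, offset=None):
        # Reuse an existing anchor so arcs of different types can share nodes.
        if id not in self.anchors:
            self.anchors[id] = Anchor(id, offset)
        return self.anchors[id]

    def annotate(self, start_id, end_id, type, label):
        # Both anchors must already exist; a KeyError signals a missing one.
        ann = Annotation(self.anchors[start_id], self.anchors[end_id], type, label)
        self.annotations.append(ann)
        return ann

    def of_type(self, type):
        # Return annotations of one type in insertion order.
        return [a for a in self.annotations if a.type == type]


# Made-up example: one word arc and two phone arcs over shared anchors.
graph = AnnotationGraph()
graph.anchor("a0", 0.00)
graph.anchor("a1", 0.21)
graph.anchor("a2", 0.48)
graph.annotate("a0", "a2", "word", "she")
graph.annotate("a0", "a1", "phone", "sh")
graph.annotate("a1", "a2", "phone", "iy")

phones = [a.label for a in graph.of_type("phone")]
```

Because the word arc and the phone arcs share anchors, the relationship between the two layers is recorded in the graph itself, independent of any particular file format or user interface.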