Linguistic Data Consortium  



Rapid Transcription Guidelines


Objectives

The goal of rapid transcription is straightforward: to transcribe an assigned audio file as quickly and accurately as possible. To facilitate this effort, a stripped-down transcription specification is used.

Process
1) Run tel-trans fsh on the command line.
2) Select available files from the available categories.
The available files have been segmented by an automatic process. The working categories are as follows:

empty    - no text in file
in-seg   - in process of being time segmented
marked   - time-segmentation completed
typing   - in process of transcription
full     - transcription completed

Did you
  a) remember to spell check the transcript? Under the drop-down menus: "Edit" -> "spell" -> "check buffer" (more details, see below)
  b) remember to run the syntax checker? (Ctrl-c, k) (more details, see below)

checking - final process QC checking
done     - all stages completed
problem  - issue with file/audio that needs to be addressed
released - completed files delivered to participants



TIMESTAMPS - An inserted "breakpoint" (timestamp) has the same appearance as a new speaker turn. Breakpoints can be inserted wherever they seem convenient to the transcriber, but they should fall at natural boundaries of speech, such as pauses or breaths, and never in the middle of a word. Each timestamp has both a start point and an end point, and neither point can overlap a previous timestamp of the same speaker. The syntax checker will flag overlapping regions.
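The overlap rule can be illustrated with a short sketch. This is not the LDC syntax checker: the `(speaker, start, end)` tuple layout and the function name `find_overlaps` are assumptions made for illustration, since this guide does not describe how timestamps are stored in the file.

```python
from collections import defaultdict


def find_overlaps(segments):
    """Return same-speaker segments that start before an earlier one ends.

    `segments` is a list of (speaker, start, end) tuples, times in seconds.
    This layout is hypothetical; the real transcript format may differ.
    """
    by_speaker = defaultdict(list)
    for speaker, start, end in segments:
        by_speaker[speaker].append((start, end))

    problems = []
    for speaker, spans in by_speaker.items():
        spans.sort()
        latest = None  # earlier span with the greatest end time so far
        for span in spans:
            if latest is not None and span[0] < latest[1]:
                problems.append((speaker, latest, span))
            if latest is None or span[1] > latest[1]:
                latest = span
    return problems


segments = [
    ("A", 0.00, 2.50),
    ("B", 2.10, 4.00),  # overlaps A in time, but is a different speaker
    ("A", 2.40, 5.00),  # starts before A's previous segment ends
]
print(find_overlaps(segments))
```

In this sketch, segments that merely touch (one ends exactly where the next begins) are not reported, and overlap between different speakers is allowed.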



ORTHOGRAPHY

PUNCTUATION - No punctuation necessary.

CAPITALIZATION - No capitalization necessary.

TRUNCATION (-) - Use a hyphen as a suffix or prefix to mark a truncated word.

NO HYPHENS - e.g. seventy two (not seventy-two)

SYMBOLS - No special symbols for mispronunciations, etc.

ACRONYMS - For acronyms pronounced as a series of letters, use a tilde, for example ~cnn
- If the acronym is pronounced as a word, it should be spelled accordingly (not distinguished), for example "aids"
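As a small illustration of the acronym convention, the sketch below applies it to a single acronym. The function name `transcribe_acronym` and its `spelled_out` flag are hypothetical; deciding how an acronym was actually pronounced is still the transcriber's job.

```python
def transcribe_acronym(acronym, spelled_out):
    """Apply the acronym convention to one acronym.

    spelled_out=True  -> pronounced letter by letter, so prefix a tilde.
    spelled_out=False -> pronounced as a word, so write it as a plain word.
    Lowercase is used here because capitalization is not required.
    """
    word = acronym.lower()
    return "~" + word if spelled_out else word


print(transcribe_acronym("CNN", spelled_out=True))    # ~cnn
print(transcribe_acronym("AIDS", spelled_out=False))  # aids
```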

INTERJECTIONS - Please use only the following English interjections.

  • mhm
  • uh-huh
  • uh-oh
  • whoa
  • whew
  • yeah
  • jeeze
  • wow

    NON-LEXEMES - (Not marked) In addition to the interjections (which are considered to be words), we also have a set of standardized spellings for hesitation sounds that speakers make while talking. The English non-lexemes are listed here to give you an idea of the criterion for distinguishing lexemes from non-lexemes.

  • ach
  • ah
  • eee
  • eh
  • ew
  • ha
  • hee
  • huh
  • hm
  • um
  • uh
  • oh

    NOISES - Sound phenomena such as distortion, coughs, breaths, unintelligible speech, and foreign words and phrases will not be transcribed. Sounds that are not made by the talker (usually background or channel noise) do not need to be transcribed.

    UNCLEAR SPEECH - (( ))
    UNCLEAR SPEECH (guess) - ((alternated))



    SPELLING - Given the nature of the task, there are going to be a number of spelling errors in every file. Please make sure that you run the spell checker over each completed file. A number of errors will be related to spacing (or rather, the lack thereof).

    Common errors
    INCORRECT    CORRECT
    ~ok          okay
    ~c~n~n       ~cnn
    ~upenn       ~u penn
    usda         ~usda
    gonna        going to
    aka          ~aka
    [[skip]]     (( ))
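Corrections like these are mechanical, so they can be applied with a small lookup table. The sketch below is only an illustration, not part of the LDC tools: the names `COMMON_FIXES` and `fix_common_errors` are hypothetical, and only a few rows of the table above are included.

```python
# A few rows from the common-errors table; extend as needed.
COMMON_FIXES = {
    "~ok": "okay",
    "~c~n~n": "~cnn",
    "~upenn": "~u penn",
    "gonna": "going to",
    "aka": "~aka",
    "[[skip]]": "(( ))",
}


def fix_common_errors(line):
    """Replace whole whitespace-delimited tokens found in COMMON_FIXES."""
    return " ".join(COMMON_FIXES.get(token, token) for token in line.split())


print(fix_common_errors("i'm gonna watch ~c~n~n ~ok"))
# i'm going to watch ~cnn okay
```

Only whole tokens are replaced, so a word that merely contains "aka" is left alone. Runs of whitespace are collapsed to single spaces as a side effect of splitting and rejoining.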


    Internet addresses - www.ldc.upenn.edu is transcribed as ~www dot ~ldc dot ~u penn dot ~edu

    Keyboard Shortcuts

    Playback
    Play window                                    C-c w
    Play line cursor is on                         <TAB>
    Play line cursor is on and move to next line   <Alt> + <TAB>
    Play a between marks                           C-c a
    Play b between marks                           C-c b
    Send timestamps to waveform                    C-c s
    Play line with cursor                          C-c c
    Toggle play speed                              C-c t
    Set fast playback                              C-c z
    Set slow playback                              C-c x
    Stop playback                                  C-c q

    Inserting timestamps
    Get timestamps from waveform b                 C-c f
    Get timestamps from waveform a                 C-c g

    Checker
    Syntax Checker                                 C-c k

    Syntax Checking

       When the syntax checker is run, it validates the structure of the document, and a number of error messages may appear. To jump to the line that an error is on, put the cursor on the error in the new window and press the middle mouse button. You will be taken to the error in the document. Note: if you remove or add a line, the lines that you are taken to will be off by one until the program is re-run.

    Common Messages include:

    time-stamp without text data?
     The timestamp does not contain corresponding transcript data. (This can be ignored in most files for rapid transcription.)

    time-stamp follows non-empty line
     An empty line should follow each transcribed timestamp. (The document is double spaced.)

    turn should be on single line
     Only one turn is permitted per line.

    closing angle (`>') should be followed by space
    self evident

    bracket error with '[]'
    may be a number of possibilities

    bracket error with '()'
    may be a number of possibilities - for example, a space is needed between the brackets: (()) should be (( ))

    bad spacing around punctuation `.'
    There should be a space after punctuation

    bad spacing around punctuation `?'
    There should be a space after punctuation


    closing paren (`))') should be followed by space
    self explanatory

    turn contains ILLEGAL CHARACTER `!'
    Some characters are not allowed within the text, for instance exclamation points

    digits found in text
    There should not be any numerals in the text outside of the timestamps
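To make a few of these messages concrete, here is a minimal sketch of the kind of test each one implies. It is not the LDC syntax checker, and its exact behavior may differ from the real tool. The function name `check_text` is hypothetical, and the sketch assumes it is given only the text portion of a line, with any timestamp already removed.

```python
import re


def check_text(text):
    """Return messages for a few of the problems listed above.

    `text` is the transcript portion of one line, with any timestamp
    already removed (how to strip it depends on the file format, which
    this guide does not describe).
    """
    messages = []
    if re.search(r"\d", text):
        messages.append("digits found in text")
    if "!" in text:
        messages.append("turn contains ILLEGAL CHARACTER `!'")
    if "(())" in text:
        # The unclear-speech marker needs a space inside: (( ))
        messages.append("bracket error with '()'")
    if re.search(r"\)\)(?=\S)", text):
        messages.append("closing paren (`))') should be followed by space")
    return messages


print(check_text("i have 2 dogs!"))
print(check_text("she said (())maybe"))
print(check_text("okay (( )) right"))
```

The first call reports the digit and the exclamation point, the second reports both bracket problems, and the third returns an empty list.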

     

     


    ------------------------
    Frequently Asked Questions
    1. Q) Should we delete or leave blank timestamps for noise that isn't transcribed?
      A) You can leave noise timestamps blank; there is no need to delete them.
    2. Q) Should we transcribe a caller's speech to someone besides the other caller?
      A) If it is just a few words, yes. If it is an extended period, use the (( )) convention.
    3. Q) The specs say that there should be no punctuation. Does that mean no apostrophes for contractions?
      A) No, please include apostrophes for contractions.

    -----------------
    Resources
    State Maps MapPoint - U.S. State Maps