| Linguistic Data Consortium | |
The goal of rapid transcription is straightforward: transcribe an assigned
audio file as quickly and accurately as possible. To facilitate this effort,
a stripped-down transcription specification is used.
Process
1) Run tel-trans fsh on the command line.
2) Select available files from available categories.
The available files have been segmented by an automatic process. The working
categories are as follows:
empty - no text in file
in-seg - in process of being time segmented
marked - time-segmentation completed
typing - in process of transcription
full - transcription completed
    Did you:
    a) remember to spell check the transcript? Under the drop-down menus:
       "Edit" -> "spell" -> "check buffer" (for more details, see below)
    b) remember to run the syntax checker? (Ctrl-c, k) (for more details, see below)
checking - final process qc checking
done - all stages completed
problem - issue with file/audio that needs to be addressed
released - completed files delivered to participants
TIMESTAMPS - Insertion of "breakpoints"
(timestamps) has the same appearance as a new speaker turn. Breakpoints can be
inserted wherever they seem convenient to the transcriber. They should occur at
the natural boundaries of speech, such as pauses, breaths, etc. Breakpoints
should not be inserted in the middle of words. The time stamp has both a start
and end point, and neither point can overlap a previous timestamp of the same
speaker. Running the syntax checker will flag any overlapping regions.
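
The overlap rule above can be illustrated with a minimal sketch. It is not the syntax checker itself: the (speaker, start, end) tuple layout, the function name find_overlaps, and the choice to allow a segment to start exactly where the previous one ends are all assumptions made for illustration.

```python
from collections import defaultdict


def find_overlaps(segments):
    """Return same-speaker segment pairs whose time spans overlap.

    `segments` is a list of (speaker, start, end) tuples in seconds.
    This layout is hypothetical, not the tool's actual file format.
    """
    by_speaker = defaultdict(list)
    for speaker, start, end in segments:
        by_speaker[speaker].append((start, end))

    overlaps = []
    for speaker, spans in by_speaker.items():
        spans.sort()
        # Track the earlier segment that reaches furthest in time, so a
        # long segment is compared against every later one it covers.
        furthest = spans[0]
        for span in spans[1:]:
            # Assumption: touching endpoints are fine; starting before
            # the earlier segment ends is an overlap.
            if span[0] < furthest[1]:
                overlaps.append((speaker, furthest, span))
            if span[1] > furthest[1]:
                furthest = span
    return overlaps
```

Segments from different speakers are never compared, since the rule only concerns timestamps of the same speaker.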
ORTHOGRAPHY
PUNCTUATION - No punctuation necessary.
CAPITALIZATION - No capitalization necessary.
TRUNCATION (-) - Use a hyphen as a suffix or prefix to mark a truncated word.
NO HYPHENS - Do not hyphenate otherwise; for example, write "seventy two".
SYMBOLS - No special symbols for mispronunciations, etc.
ACRONYMS - For acronyms pronounced as a series of letters, use a tilde, for
example ~cnn. If the acronym is pronounced as a word, it should be spelled
accordingly (not distinguished), for example "aids".
INTERJECTIONS - Please use only the following English interjections.
NON-LEXEMES - (Not marked) In addition to the interjections (which are considered to be words), we also have a set of standardized spellings for hesitation sounds that speakers make while talking. English non-lexemes (to give you an idea of the criterion for lexemes and non-lexemes.)
NOISES - Sound phenomena such as distortion, coughs, breaths, unintelligible speech, and foreign words and phrases will not be transcribed. Sounds that are not made by the talker (usually background or channel) do not need to be transcribed.
UNCLEAR SPEECH - (( ))
UNCLEAR SPEECH (guess) - ((alternated))
SPELLING
Given the nature of the task, there are going to be a number of spelling errors
in every file. Please make sure that you run the spell checker over each
completed file. A number of errors will be related to spacing (or rather, the
lack thereof).
Common errors
| INCORRECT | CORRECT |
| ~ok | okay |
| ~c~n~n | ~cnn |
| ~upenn | ~u penn |
| usda | ~usda |
| gonna | going to |
| aka | ~aka |
| [[skip]] | (( )) |
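
As a rough illustration of how such corrections could be applied automatically, the sketch below replaces whole tokens using a few rows from the table. The CORRECTIONS mapping and the apply_corrections function are hypothetical helpers, not part of the transcription tool, and the sketch assumes it is given only the text of a turn, without its timestamp.

```python
# A few INCORRECT -> CORRECT pairs taken from the table above.
CORRECTIONS = {
    "~ok": "okay",
    "~c~n~n": "~cnn",
    "gonna": "going to",
    "aka": "~aka",
    "[[skip]]": "(( ))",
}


def apply_corrections(text):
    """Replace whitespace-separated tokens that exactly match a known error.

    Runs of whitespace are collapsed to single spaces as a side effect.
    """
    return " ".join(CORRECTIONS.get(token, token) for token in text.split())
```

For example, apply_corrections("i was gonna watch ~c~n~n") gives "i was going to watch ~cnn". Matching whole tokens avoids rewriting words that merely contain an entry, but it will not catch spacing errors, which still need the spell checker.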
Keyboard Shortcuts
| Play window | C-c w |
| Play line cursor is on | <TAB> |
| Play line cursor is on and move to next line | <Alt> + <TAB> |
| Play a between marks | C-c a |
| Play b between marks | C-c b |
| Send timestamps to waveform | C-c s |
| Play line with cursor | C-c c |
| Toggle play speed | C-c t |
| Set fast playback | C-c z |
| Set slow playback | C-c x |
| Stop playback | C-c q |
| Get timestamps from waveform b | C-c f |
| Get timestamps from waveform a | C-c g |
| Syntax Checker | C-c k |
Syntax Checking
When the syntax checker is run, it validates the structure of the document, and a number of error messages may appear. To jump to the line that an error is on, place the cursor on the error in the new window and press the middle mouse button. You will be taken to the error in the document. Note: if you remove or add a line, the lines that you are taken to will be off by one until the program is re-run.
Common Messages include:
time-stamp without text data?
    The timestamp does not contain corresponding transcript data (this can be ignored in most files for rapid trans).
time-stamp follows non-empty line
    An empty line should follow each transcribed timestamp (double-spaced document).
turn should be on single line
    Only one turn is permitted on each line.
closing angle (`>') should be followed by space
    Self-evident.
bracket error with '[]'
    There may be a number of possibilities.
bracket error with '()'
    There may be a number of possibilities. One is a missing space between the brackets: (()) should be (( )).
bad spacing around punctuation `.'
    There should be a space after punctuation.
bad spacing around punctuation `?'
    There should be a space after punctuation.
closing paren (`))') should be followed by space
    Self-explanatory.
turn contains ILLEGAL CHARACTER `!'
    Some characters are not allowed within the text, for instance exclamation points.
digits found in text
    There should not be any numerals in the text outside of the timestamps.
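
To make a few of these messages concrete, here is a minimal sketch of the kind of test each one implies. It is not the actual syntax checker: the function check_text is hypothetical, it covers only four of the messages, and it assumes it receives the text of a single turn with the timestamp already removed.

```python
import re


def check_text(text):
    """Return checker-style messages for one turn's text (no timestamp)."""
    messages = []
    # Numerals are only allowed inside timestamps, which are excluded here.
    if re.search(r"\d", text):
        messages.append("digits found in text")
    # Exclamation points are one example of a disallowed character.
    if "!" in text:
        messages.append("turn contains ILLEGAL CHARACTER `!'")
    # Unclear-speech brackets need a space inside: (( )) rather than (()).
    if "(())" in text:
        messages.append("bracket error with '()'")
    # A closing "))" must not be followed directly by a non-space character.
    if re.search(r"\)\)(?=\S)", text):
        messages.append("closing paren (`))') should be followed by space")
    return messages
```

A clean turn such as "seventy two (( )) okay" yields an empty list, while "i have 2 dogs!" yields both the digits message and the illegal-character message.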