Introduction
Korean Telephone Conversations Transcripts was produced by the Linguistic Data
Consortium (LDC), catalog number LDC2003T08, ISBN 1-58563-264-3.
The telephone conversations on which these transcripts are based were originally recorded as
part of the CALLFRIEND project. The CALLFRIEND Korean telephone speech was
collected by the Linguistic Data Consortium primarily in support of the
Language Identification (LID) project, sponsored by the U.S. Department of
Defense. The calls were later transcribed for use in other projects.
This publication consists of 100 transcribed telephone conversations
in Korean. The corresponding speech is published as Korean Telephone Conversations Speech.
The Korean orthographic forms from the 100 transcription
files serve as the head-words in the associated Korean Telephone Conversations Lexicon.
The recorded conversations are between native speakers of Korean and
last up to 30 minutes, of which the transcribed speech covers between 15 and
18 minutes. All speakers were aware that they were being recorded. They
were given no guidelines concerning what they should talk about. Once a
caller was recruited to participate, he/she was given a free choice of whom
to call. Most participants called family members or close friends. All
calls originated in either the United States or Canada.
Data
There are 100 text files, totalling approximately 190K words and 25K
unique words.
All files are in Korean orthography: the Korean characters are
in Hangul, encoded in the KSC5601 (Wansung) system.
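For readers working with the files on a Unicode system, the following is a minimal sketch of decoding a transcript and counting words. It assumes the KSC5601 (Wansung) text is stored in its usual EUC-KR byte form, which Python exposes as the "euc_kr" codec. The function names and the whitespace-based word count are illustrative assumptions, not the method used to produce the figures above, and real transcripts may contain markup that this sketch does not handle.

```python
def decode_transcript(raw):
    """Decode KSC5601 (Wansung) bytes to a Unicode string.

    Assumption: the file uses the EUC-KR byte layout, so Python's
    built-in "euc_kr" codec applies. Undecodable bytes are replaced
    rather than raising, so a stray byte does not abort the run.
    """
    return raw.decode("euc_kr", errors="replace")


def count_words(text):
    """Return (total tokens, unique tokens) using whitespace splitting.

    Assumption: a "word" is any whitespace-delimited token; transcript
    markup, if present, is counted like any other token.
    """
    tokens = text.split()
    return len(tokens), len(set(tokens))


# Stand-in bytes for a transcript file (hypothetical content, not corpus data).
sample = "안녕하세요 네 안녕하세요".encode("euc_kr")
text = decode_transcript(sample)
total, unique = count_words(text)
```

For an actual transcript, the bytes would come from opening the file in binary mode and reading it; the decoded text can then be written back out as UTF-8 if needed.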
Please follow this link for a sample transcript: txt | gif.
Updates
There are no updates available at this time.
Content Copyright
Portions © 2003 Trustees of the University of Pennsylvania.