


Korean Telephone Conversations Transcripts

Item Name: Korean Telephone Conversations Transcripts
Authors: Eon-Suk Ko, Na-Rae Han, Stephanie Strassel, and Nii Martey
LDC Catalog No.: LDC2003T08
ISBN: 1-58563-264-3
Release Date: May 16, 2003
Data Type: text
Data Source(s): telephone conversations
Application(s): speech recognition
Language(s): Korean
Language ID(s): kor
Distribution: Web Download
Member fee: $0 for 2003 members
Non-member Fee: US $1500.00
Reduced-License Fee: US $750.00
Extra-Copy Fee: N/A
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Eon-Suk Ko, et al. 2003. Korean Telephone Conversations Transcripts. Linguistic Data Consortium, Philadelphia.

Introduction

Korean Telephone Conversations Transcripts was produced by the Linguistic Data Consortium (LDC). Its catalog number is LDC2003T08 and its ISBN is 1-58563-264-3.

The telephone conversations on which these transcripts are based were originally recorded as part of the CALLFRIEND project. The CALLFRIEND Korean telephone speech was collected by the Linguistic Data Consortium primarily in support of the Language Identification (LID) project, sponsored by the U.S. Department of Defense. The calls were later transcribed for use in other projects.

This publication consists of 100 transcribed telephone conversations in Korean. The corresponding speech is published as Korean Telephone Conversations Speech. The Korean orthographic forms from the 100 transcription files serve as the head-words in the associated Korean Telephone Conversations Lexicon.

The recorded conversations are between native speakers of Korean and last up to 30 minutes each, of which between 15 and 18 minutes are transcribed. All speakers were aware that they were being recorded. They were given no guidelines concerning what they should talk about. Once a caller was recruited to participate, he or she was given a free choice of whom to call. Most participants called family members or close friends. All calls originated in either the United States or Canada.

Data

There are 100 text files, totalling approximately 190K words and 25K unique words.

All files are in Korean orthography: the text is written in Hangul and encoded in the KSC5601 (Wansung) system.
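Because most current tools expect Unicode, the transcripts generally need to be decoded before use. The following is a minimal sketch in Python, assuming the files store KSC5601 (Wansung) text in the common EUC-KR byte encoding, which Python exposes as the `euc_kr` codec. The function names and file paths are hypothetical and are not part of the corpus documentation.

```python
from pathlib import Path

# Assumption: the transcripts use the EUC-KR byte encoding of KSC5601 (Wansung).
SOURCE_ENCODING = "euc_kr"


def decode_transcript(raw: bytes, encoding: str = SOURCE_ENCODING) -> str:
    """Decode raw transcript bytes into a Unicode string.

    Raises UnicodeDecodeError if the bytes are not valid in the given
    encoding, which is a useful signal that the assumption above is wrong.
    """
    return raw.decode(encoding)


def convert_file(src: Path, dst: Path, encoding: str = SOURCE_ENCODING) -> None:
    """Read one transcript file and write a UTF-8 copy (hypothetical paths)."""
    text = decode_transcript(src.read_bytes(), encoding)
    dst.write_text(text, encoding="utf-8")


# Round-trip check on a short Hangul string rather than on corpus data.
sample_bytes = "안녕하세요".encode(SOURCE_ENCODING)
print(decode_transcript(sample_bytes))
```

If decoding fails on a particular file, trying Python's `cp949` codec, a superset of EUC-KR, is one possible fallback; whether any file in this release needs it is not stated in the documentation.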

Please follow this link for a sample transcript: txt | gif.

Updates

There are no updates available at this time.

Content Copyright

Portions © 2003 Trustees of the University of Pennsylvania.



Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.