Korean Propbank
| Item Name: | Korean Propbank |
| Authors: | Martha Palmer, Shijong Ryu, Jinyoung Choi, Sinwon Yoon, and Yeongmi Jeon |
| LDC Catalog No.: | LDC2006T03 |
| ISBN: | 1-58563-374-7 |
| Release Date: | Mar 24, 2006 |
| Data Type: | text |
| Data Source(s): | newswire |
| Application(s): | discourse analysis, information extraction, language identification, language modeling, language teaching, natural language processing, parsing |
| Language(s): | Korean |
| Language ID(s): | kor |
| Distribution: | Web Download |
| Member fee: | $0 for 2006 members |
| Non-member Fee: | US$500.00 |
| Reduced-License Fee: | US$250.00 |
| Extra-Copy Fee: | N/A |
| Non-member License: | yes |
| Member License: | yes |
| Online documentation: | yes |
| Licensing Instructions: | Subscription Members, Standard Members, Non-Members |
| Citation: | Martha Palmer, et al. 2006. Korean Propbank. Linguistic Data Consortium, Philadelphia. |
Introduction
Korean Propbank Annotations is a semantic annotation of the Korean English
Treebank Annotations and Korean Treebank Version 2.0. Each verb and adjective
occurring in the Treebank has been treated as a semantic predicate and the
surrounding text has been annotated for arguments and adjuncts of the
predicate. The verbs and adjectives have also been tagged with coarse-grained
senses. This work was done in the Computer and Information Sciences Department
at the University of Pennsylvania. The frames files are in XML format and use
the KS C 5601 character set encoding.
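Because the frames files use a legacy Korean encoding rather than UTF-8, it can help to decode them explicitly before parsing. The following is a minimal sketch, assuming Python's `euc_kr` codec (which Python lists under the alias `ksc5601`) is a suitable match for the KS C 5601 encoding named above. The function name `parse_frames_bytes` and the `<sample>` element are hypothetical and used only for illustration; the actual element names in the frames files are not described here.

```python
import xml.etree.ElementTree as ET


def parse_frames_bytes(raw: bytes) -> ET.Element:
    """Decode KS C 5601 (EUC-KR) bytes, then parse the result as XML.

    Assumption: Python's "euc_kr" codec matches the encoding of the
    frames files. Decoding first hands the XML parser a Unicode string
    instead of relying on it to understand the legacy encoding.
    """
    text = raw.decode("euc_kr")
    return ET.fromstring(text)


# Hypothetical stand-in for the contents of one frames file; the real
# files would be read with open(path, "rb").read().
sample_bytes = "<sample>한국어</sample>".encode("euc_kr")
root = parse_frames_bytes(sample_bytes)
print(root.tag, root.text)
```

If a file contains byte sequences outside EUC-KR, `cp949` (a superset of EUC-KR) may be worth trying instead; that is also an assumption rather than something the documentation states.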
Data
There are two basic components to Korean Propbank:
- The Verb Lexicon. A frames file, consisting of one or more frame sets, has
been created for each predicate occurring in the Treebank. These files
serve as a reference for the annotators and for users of the data. 2,749
such files have been created, totaling about 10 MB of uncompressed data.
- The Annotation. There are two annotation files. The virginia-verbs.pb file
has 9,588 annotated predicate tokens. These predicate tokens include all
those occurring in over 54,000 words of the Korean English Treebank
Annotations, totaling ~791 KB of uncompressed data. The newswire-verbs.pb
file has 23,707 annotated predicate tokens. These predicate tokens include
all those occurring in over 131,000 words of the Korean Treebank
Version 2.0, totaling ~2,054 KB of uncompressed data.
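As a quick check on the figures above, the sketch below adds up the annotated predicate tokens across the two annotation files and estimates how densely each source is annotated. The word counts are stated only as lower bounds ("over 54,000" and "over 131,000"), so the ratios are rough upper estimates, not exact values. The dictionary layout and variable names are illustrative only.

```python
# Figures taken from the description above; word counts are lower bounds.
annotation_files = {
    "virginia-verbs.pb": {"predicates": 9_588, "words": 54_000},
    "newswire-verbs.pb": {"predicates": 23_707, "words": 131_000},
}

# Total annotated predicate tokens across both files.
total_predicates = sum(f["predicates"] for f in annotation_files.values())
print(f"Total annotated predicate tokens: {total_predicates:,}")

# Approximate share of words that are annotated predicates, per file.
for name, f in annotation_files.items():
    ratio = f["predicates"] / f["words"]
    print(f"{name}: roughly {ratio:.1%} of words are annotated predicates")
```

Under these figures, the two files together contain 33,295 annotated predicate tokens.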
Samples
This image displays an example of the data in this corpus.
Content Copyright
Portions © 2001-2002 CoGenTex, Inc., © 1994-2000 Korean Press Agency, © 1998-2006 Trustees of the University of Pennsylvania