Chinese Proposition Bank 2.0 is a continuation of the Chinese
Proposition Bank project, which aims to create a corpus of Chinese text annotated
with information about basic semantic propositions. Chinese
Proposition Bank 1.0 consists of predicate-argument annotation on 250,000
words from Chinese
Treebank 5.0. Chinese Proposition Bank 2.0 adds predicate-argument annotation
on 500,000 words from Chinese
Treebank 6.0. The data sources include newswire from Xinhua News Agency,
articles from Sinorama Magazine, news from the website of the Hong Kong Special
Administrative Region and transcripts from various Chinese broadcast news programs.
Data
This release contains the predicate-argument annotation of 81,009 verb instances
(11,171 unique verbs) and 14,525 noun instances (1,421 unique nouns). The annotation
of nouns is limited to nominalizations that have a corresponding verb. The general
annotation guidelines and the lexical guidelines (called frame files) for each
verbal and nominal predicate are included in this release.
| Total propositions for verbs: | 81,009 |
| Total propositions for nouns: | 14,525 |
| Total verbs framed: | 11,171 |
| Total verb framesets: | 11,776 |
| Verbs with multiple framesets: | 474 |
| Average framesets per verb: | 1.05 |
| Total nouns framed: | 1,421 |
| Total noun framesets: | 1,528 |
| Nouns with multiple framesets: | 48 |
| Average framesets per noun: | 1.08 |
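The two averages in the table follow directly from the counts: total framesets divided by the number of framed predicates. A minimal sketch in Python that recomputes them is given here; the constant and function names are illustrative assumptions and are not part of the release.

```python
# Counts copied from the statistics table in this description.
VERBS_FRAMED = 11_171
VERB_FRAMESETS = 11_776
NOUNS_FRAMED = 1_421
NOUN_FRAMESETS = 1_528


def average_framesets(framesets: int, predicates: int) -> float:
    """Mean number of framesets per framed predicate, rounded to two places."""
    return round(framesets / predicates, 2)


# Hypothetical names, used only for this illustration.
verb_avg = average_framesets(VERB_FRAMESETS, VERBS_FRAMED)
noun_avg = average_framesets(NOUN_FRAMESETS, NOUNS_FRAMED)
print(f"verbs: {verb_avg:.2f}, nouns: {noun_avg:.2f}")
```

Both values agree with the table after rounding to two decimal places.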
Samples
For an example of the data in this corpus, please examine this sample image (JPEG) of a parse tree.
Content Copyright
Portions © 2000-2001 China Broadcasting System, © 2000-2001 China
Central TV, © 2000-2001 China National Radio, © 2000-2001 China Television
System, © 1997 The Government of the Hong Kong Special Administrative Region,
© 1996-2001 Sinorama Magazine, © 1994-1998 Xinhua News Agency, ©
2001, 2004, 2005, 2007, 2008 Trustees of the University of Pennsylvania