CCGbank -- the machine-readable predicate-argument structure files (.parg)
-------------------------------------------------------------------------
Julia Hockenmaier and Mark Steedman
The predicate-argument structure files give for each sentence a list
of the word-word dependencies in the predicate-argument structure,
including locally mediated and long-range dependencies, which are
indicated as such. For each file in the original Treebank, there is
one corresponding predicate-argument structure file.
Each sentence is enclosed by an ... tag. The opening tag
contains an attribute that indicates the sentence ID and is followed
by the index of the last token in the sentence.
Each dependency appears on one line.
12
1 0 N/N 1 Vinken Mr.
1 2 (S[dcl]\NP)/NP 1 Vinken is
3 2 (S[dcl]\NP)/NP 2 chairman is
3 4 (NP\NP)/NP 1 chairman of
6 4 (NP\NP)/NP 2 N.V. of
6 5 N/N 1 N.V. Elsevier
11 4 (NP\NP)/NP 2 group of
11 8 NP[nb]/N 1 group the
11 9 N/N 1 group Dutch
11 10 N/N 1 group publishing
<\s>
A dependency between the i-th and j-th word (word_i and word_j) where
the j-th word has the lexical (functor) category cat_j and the i-th
word is head of the constituent which fills the k-th argument slot of
cat_j is described as follows:
i j cat_j arg_k word_i word_j
Words in each sentence are numbered from 0 to n.
In the sentence ``Mr. Vinken is chairman of Elsevier'', 'Vinken' is
the second word in the sentence and head of the constituent which
fills the first (and only) argument slot of the N/N 'Mr.'. At the same
time, the N 'Vinken' is head of the constituent which fills the first
argument slot of the (S[dcl]NP)/NP 'is', which is the third word in
the sentence. Therefore:
1 0 N/N 1 Vinken Mr.
1 2 (S[dcl]\NP)/NP 1 Vinken is
In version 1.1 of CCGbank, Penn Treebank sentences that do not receive
a CCG derivation do not appear in the PARG files.