CCGbank and TGrep2
------------------

CCGbank is searchable with TGrep2, an expression matcher for trees
developed by Douglas Rohde.

Obtaining TGrep2
----------------

TGrep2 is available free of charge from
http://tedlab.mit.edu/~dr/TGrep2/
It is governed by the GNU General Public License, version 2.

Searching CCGbank with TGrep2
-----------------------------

The file ccgbank00-24.t2c contains all sections of CCGbank in TGrep2
format. Assuming this file is in the directory $CCG_TGREP2 on your
system, either set the environment variable TGREP2_CORPUS to
$CCG_TGREP2/ccgbank00-24.t2c or invoke TGrep2 with the option
"-c $CCG_TGREP2/ccgbank00-24.t2c".

If TGrep2 version 1.15 or higher is used on ccgbank00-24.t2c, it runs
in CCG mode, which differs from its standard mode in the following ways:

- In TGrep2's CCG mode, brackets ("[" and "]"), parentheses ("(" and ")")
  and slashes ("\" and "/") can be part of a node label, but must be
  preceded by a backslash in regular expression searches.

- In TGrep2's CCG mode, curly brackets ("{" and "}") are used instead of
  parentheses ("(" and ")") to specify dominance relations and to
  bracket the output trees.

- In TGrep2's CCG mode, the plus and minus signs ("+" and "-") are used
  instead of brackets ("[" and "]") to group disjunctive terms
  (e.g. "NP [ > PP | > S ]" in standard TGrep2 becomes
  "NP + > PP | > S-" in CCG mode).

A few examples:
---------------

% tgrep2 "/.*/ /^S\[.*\]/"
finds any NP that is a subject.

% tgrep2 "NP>/^S\[.*\]\\NP/"
finds any NP that is a direct object.

% tgrep2 "NP + > PP | > S[dcl]-"
finds any NP that is dominated either by a PP or a tensed sentence.
(In CCG mode, + and - are used instead of [ and ] for node disjunction.)

Acknowledgements
----------------

We are very grateful to Doug Rohde for adapting TGrep2 to CCGbank.
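
Because CCG categories such as S[dcl]\NP are full of brackets, parentheses and slashes, it is easy to get the backslash escaping wrong when writing regular expression searches by hand. The following is a minimal sketch of a shell helper that applies the escaping rule stated above. The function name escape_ccg_label is hypothetical and not part of TGrep2, and the sketch assumes a POSIX shell with sed available.

```shell
# Hypothetical helper (not part of TGrep2): put a backslash in front of
# every bracket, parenthesis and slash in a CCG category, following the
# escaping rule for regular expression searches in CCG mode.
escape_ccg_label() {
    printf '%s\n' "$1" | sed 's|[][()\\/]|\\&|g'
}

# Build a regular expression pattern that matches one exact category.
# Single quotes keep the shell from interpreting the backslash in the label.
pattern="/^$(escape_ccg_label 'S[dcl]\NP')\$/"
printf '%s\n' "$pattern"
```

The resulting pattern could then be passed to TGrep2 in the usual way, for example as tgrep2 "$pattern", assuming TGrep2 is installed and the corpus is selected through TGREP2_CORPUS or the -c option.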