TGREP(1)                  USER COMMANDS                  TGREP(1)

NAME
     tgrep - search and analysis tool for databases of tree
     structures, i.e. grep for trees

SYNOPSIS
     tgrep [ -v vocab-file ] [ -wflia$qtTrmxnN ] [ -p | -P ]
           [ -o output-file ] [ -s string ] [ -e string ]
           [ -c X ] pattern | -u T-code-file [ file ... ]

     tgrep [ -ghHO ]

DESCRIPTION
     tgrep searches files of specially compiled tree-structured
     text for pattern. Before text can be searched with tgrep,
     it must be preprocessed with tprep(1), which creates a
     corpus file. If no corpus file is specified on the command
     line, tgrep uses the corpus file named by the environment
     variable TGREP_CORPUS. Usually, several pre-encoded corpora
     are available with tgrep. Please see the FILES section at
     the end of this document.

     For a complete description of how to construct patterns in
     tgrep, refer to the tgrepdoc(1) man page.

     The second usage form of tgrep prints help information; see
     the OPTIONS section below.

     tgrep also supports a powerful script language called T,
     which allows you to manipulate the data you search for.
     Please see the file README.T in the doc subdirectory of the
     tgrep release. To use a scriptfile, specify the
     -u T-code-file option on the command line instead of a
     pattern.

     tgrep can also read databases prepared with the outdated
     prepare_corpus(1). In that case the -v vocab-file option
     (or the TGREP_VOCAB environment variable) must be set, and
     the file must be a dot-n (.n) file created with
     prepare_corpus.

OPTIONS
     -v vocab-file
          Use vocab-file as the vocabulary file. This option is
          needed only for corpora created with the outdated
          prepare_corpus(1); tprep(1) should be used instead.
          tgrep can tell the difference between corpora prepared
          with prepare_corpus(1) and with tprep(1).
          This must be the .vocab file produced with
          prepare_corpus(1). If this option is omitted, the value
          of the environment variable TGREP_VOCAB is used
          instead.

     -w   Print the Whole sentence that matches pattern.

     -f   Print the name of the original File in which pattern
          matched. This option is needed because a large number
          of files may be preprocessed into a single corpus file,
          so the location of the original text would otherwise be
          lost. The file name is preceded by a pound sign (#),
          which makes it a comment for later processing by tprep
          or tgrep.

     -l   Print the Line number of the sentence(s) that pattern
          matched. This is the line number, in the original file,
          of the first word of the sentence. The line number is
          preceded by a pound sign (#), which makes it a comment
          for later processing by tprep or tgrep. This option is
          especially useful with the -f option.

     -i   Print a unique integer identifier for each matched
          sentence. The identifier is unique only within a single
          tgrep session, and it identifies the sentence, not the
          match: all matches in the same sentence share one
          identifier. The identifier is preceded by a pound sign
          (#), which makes it a comment for later processing by
          tprep or tgrep.

     -a   Print All matches. Normally tgrep prints only the first
          match it finds in a sentence and then moves on to the
          next sentence in the corpus. With this option, pattern
          is matched against each sentence in all possible ways.

     -$   Allow a node to be its own sister. That is, A $ A
          evaluates to TRUE.

     -q   Quiet mode. Do not print "please wait..." messages.

     -t   Print only the terminals of the context sentences on
          one line.

     -T   Print the context sentences on one line.

     -r   filteR mode. Pass unmatched sentences to the output.

     -m   shell Mode. This option is currently not supported; do
          not use it yet.

     -x   Show progress in an X window.
          On machines equipped with an X display, this option
          pops up a small window that shows the percentage of the
          corpus already processed. This option may not work on
          many other operating systems; it has only been tested
          on Sun SPARC machines running SunOS 4.x.

     -n   Print on oNe line. Implies that pretty printing is off.

     -N   Print context sentences on oNe line.

     -p   Turn Pretty printing on. Pretty printing is on by
          default.

     -P   Turn Pretty printing off.

     -o output-file
          Write all output to output-file. The default is the
          standard output.

     -c X Print X sentences of context both before and after each
          sentence that pattern matches. This option is not
          supported when a T script is used.

     -s string
          Specify the Start-of-output string, which is printed
          before each matched sentence. string defaults to a
          newline ("\n"). The two special character sequences
          "\n" and "\t" may be used to insert newlines and tabs,
          respectively, into string. Use of this option is
          discouraged; it will be discontinued.

     -e string
          Specify the End-of-output string, which is printed after
          each matched sentence. string defaults to nothing, i.e.
          the empty string (""). The two special character
          sequences "\n" and "\t" may be used to insert newlines
          and tabs, respectively, into string. Use of this option
          is discouraged; it will be discontinued.

     -g   Show the Grammar for constructing pattern.

     -h   Help. Print a summary of usage and command-line options.

     -H   Extended Help. Print a verbose description of tgrep's
          options and how to use them.

     -O   List the Options that can be set from within a T
          scriptfile.

FILES
     The data that tgrep searches must be preprocessed first; the
     preprocessing should be done using tprep(1).
     An older format for processed corpus files also exists, and
     tgrep supports both corpus file formats. The old format
     consists of two files called XXX.vocab and XXX.n, where XXX
     is a common base name. This format was created with the
     outdated prepare_corpus(1) program.

     XXX.vocab
          The vocabulary file for the corpus. This is a text file
          suitable for visual inspection.

     XXX.n
          The encoded corpus file for the corpus. This file is
          encoded for speed and ease of access and is therefore
          not directly readable by the user.

     The newer format, created with tprep(1), consists of a
     single file. These files normally have a .c or .corpus
     suffix.

SEE ALSO
     tprep(1), tgrepdoc(1)

DIAGNOSTICS
     tgrep's error messages are ``self-explanatory'' and may
     sometimes request that an email message be sent to
     tgrep-support@linc.cis.upenn.edu. In such a case, please send
     a detailed report of the error, including the command line
     used to invoke tgrep, a description of the corpus being
     searched and how it was created, and the values of the
     environment variables TGREP_CORPUS and TGREP_VOCAB.

BUGS
     If you encounter any bugs, please send electronic mail to
     tgrep-support@linc.cis.upenn.edu with one of the following as
     the subject line and a concise description as the body of
     the message:

     installation         for installation problems.
     bug                  for reporting a bug in tgrep.
     feature request      for requesting that a new feature be
                          added to tgrep.
     information request  for requesting other information about
                          tgrep.
     help                 if you are really stuck and need help.
     other                for other communications.

COPYRIGHT
     Copyright 1993, 1994 Richard Pito.

AUTHOR
     Richard Pito (pito@unagi.cis.upenn.edu), under a grant from
     the Benjamin Franklin Institute.

Sun Release 4.1          Last change: 10 November 1992
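The corpus-selection rules given in DESCRIPTION and under -v amount to a small decision procedure: files on the command line, else TGREP_CORPUS; and for an old-format corpus, a vocabulary file from -v, else TGREP_VOCAB. The Python sketch below is hypothetical and is not tgrep's source. The function name resolve_corpus_files is invented for illustration, and recognizing the old prepare_corpus(1) format by its .n suffix is an assumption; the manual says only that tgrep can tell the two formats apart.

```python
from typing import Mapping, Optional, Sequence


def resolve_corpus_files(
    cli_files: Sequence[str],
    vocab_opt: Optional[str],
    env: Mapping[str, str],
) -> list[tuple[str, Optional[str]]]:
    """Return (corpus_file, vocab_file) pairs, one per corpus to search.

    Hypothetical sketch, not tgrep's code. Treating a ".n" suffix as the
    marker of the old prepare_corpus(1) format is an assumption.
    """
    # Corpus files named on the command line win; otherwise fall back to the
    # corpus file named by the TGREP_CORPUS environment variable.
    files = list(cli_files)
    if not files:
        corpus = env.get("TGREP_CORPUS")
        if not corpus:
            raise ValueError("no corpus file given and TGREP_CORPUS is unset")
        files = [corpus]

    resolved: list[tuple[str, Optional[str]]] = []
    for path in files:
        if path.endswith(".n"):
            # Old two-file format: also needs the XXX.vocab file, taken from
            # -v if given, else from TGREP_VOCAB.
            vocab = vocab_opt or env.get("TGREP_VOCAB")
            if not vocab:
                raise ValueError(f"{path}: old-format corpus needs -v or TGREP_VOCAB")
            resolved.append((path, vocab))
        else:
            # New tprep(1) format: one self-contained file (.c or .corpus).
            resolved.append((path, None))
    return resolved
```

Passing os.environ as env mirrors a real invocation; passing a plain dict makes the precedence rules easy to check in isolation.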
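The -c X option prints X sentences of context on each side of a matched sentence. A minimal sketch of that window arithmetic follows. It assumes sentences are numbered from 0 across the whole corpus and that a window is simply clipped at the first and last sentence. Both assumptions, like the name context_window, are illustrative and not taken from tgrep itself. Overlapping windows from nearby matches are deliberately left unmerged.

```python
from typing import Sequence


def context_window(matched: Sequence[int], x: int, n_sentences: int) -> list[range]:
    """For each matched sentence index, return the indices printed with -c X.

    Hypothetical sketch: clipping at the corpus boundaries is an assumption,
    and overlapping windows are not merged.
    """
    if x < 0:
        raise ValueError("context size must be non-negative")
    windows = []
    for i in matched:
        start = max(0, i - x)                # X sentences before, clipped at 0
        stop = min(n_sentences, i + x + 1)   # X sentences after, clipped at the end
        windows.append(range(start, stop))
    return windows
```

For example, with X = 2 in a 10-sentence corpus, a match in sentence 5 yields sentences 3 through 7, while a match in sentence 0 yields only sentences 0 through 2.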
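The -s and -e options recognize only the two special sequences \n and \t, and their strings are printed before and after each matched sentence. A hedged Python sketch of that framing is shown below. The names expand_escapes and frame_match are hypothetical. Leaving every other backslash sequence untouched is also an assumption, since the manual does not say what happens to them. The defaults mirror the documented ones: a newline for -s and the empty string for -e.

```python
def expand_escapes(s: str) -> str:
    r"""Translate the two special sequences \n and \t into a newline and a tab.

    Any other backslash sequence is left unchanged (an assumption; the manual
    documents only these two).
    """
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in "nt":
            out.append("\n" if s[i + 1] == "n" else "\t")
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def frame_match(sentence: str, start: str = r"\n", end: str = "") -> str:
    """Wrap one matched sentence in the -s (start) and -e (end) strings.

    start and end are given as typed on the command line, so the default
    start is the two characters backslash and n, which expand to a newline.
    """
    return expand_escapes(start) + sentence + expand_escapes(end)
```

Taking the strings in their command-line form, rather than pre-expanded, keeps the sketch close to what a user actually passes to -s and -e.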