COMNOM is an automatically enriched version of COMLEX Syntax that was created at New York University as part of the NomBank annotation project. COMLEX resources are distributed by the Linguistic Data Consortium (LDC) and consist of the following: COMLEX English Syntax Lexicon (LDC98L21), an English dictionary consisting of approximately 38,000 lemmas with detailed information about the syntactic characteristics of each lexical item and subcategorization (complement structures); and COMLEX Syntax Text Corpus Version 2.0 (LDC96T11).
COMNOM adds classes to COMLEX Syntax lexical entries using NOMLEX-PLUS, a dictionary with approximately 8,000 entries. COMNOM collected prepositions from the NOMLEX-PLUS subcategorizations (:VERB-SUBC, :OBJECT, :SUBJECT, etc.), deduced essential complements from them, and added them to the existing COMLEX entries.
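As a rough illustration of the collect-and-merge step described above, the following Python sketch gathers prepositions from the subcategorization slots of one entry and adds them to a matching entry. The dictionary layout, the key names ("orth", "subcat", "prepositions"), the function names, and the sample values are all assumptions made for illustration; they do not reflect the actual COMLEX or NOMLEX-PLUS file formats, and the deduction of essential complements is not modeled.

```python
# Hypothetical, simplified stand-ins for dictionary entries; the real
# COMLEX and NOMLEX-PLUS files use their own formats.


def collect_prepositions(nomlex_entry):
    """Gather every preposition listed under any subcategorization slot."""
    preps = set()
    for slot_values in nomlex_entry["subcat"].values():
        preps.update(slot_values)
    return preps


def enrich(comlex_entry, nomlex_entry):
    """Return a copy of the COMLEX-style entry with the collected prepositions added."""
    enriched = dict(comlex_entry)
    existing = set(comlex_entry.get("prepositions", []))
    enriched["prepositions"] = sorted(existing | collect_prepositions(nomlex_entry))
    return enriched


# Invented sample data, for illustration only.
nomlex_entry = {"orth": "destruction", "subcat": {":OBJECT": ["of"], ":SUBJECT": ["by"]}}
comlex_entry = {"orth": "destruction", "prepositions": ["of"]}

result = enrich(comlex_entry, nomlex_entry)
print(result["prepositions"])
```

The sketch returns a new entry rather than modifying the input, which mirrors the idea that the enriched dictionary is a separate resource from the original lexicon.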
Further information about the methodology used in COMNOM can be found in Meyers, "Those Other NomBank Dictionaries -- Manual for Dictionaries that Come with NomBank". Related resources and further information about COMNOM and NomBank are available from the NomBank project website.
A license to COMLEX English Syntax Lexicon (LDC98L21) or COMLEX Syntax Text Corpus Version 2.0 (LDC96T11) is required in order to obtain COMNOM v. 1.0.
This release includes three versions of COMNOM which correspond to the three versions of NOMLEX-PLUS and are characterized by the amount of corpus training that influenced their creation. The data used for training are the Wall Street Journal materials in the Penn Treebanks (Treebank-2 and Treebank-3), with annotations from Proposition Bank I and NomBank 1.0.
The three versions are:
COMNOM-clean.1.0 -- contains no information derived from annotated data
COMNOM.1.0 -- contains information from the entire annotated corpus
COMNOM-training.1.0 -- contains information from annotated data in sections 02-21 of the corpus only
Portions © 1987-1989 Dow Jones & Company, Inc., © 1996, 1998, 2008 Trustees of the University of Pennsylvania