COMNOM v 1.0

Item Name: COMNOM v 1.0
Authors: Adam Meyers, Ruth Reeves, Catherine Macleod
LDC Catalog No.: LDC2008T24
ISBN: 1-58563-492-X
Release Date: Sep 15, 2007
Data Type: text
Data Source(s): newswire
Language(s): English
Language ID(s): ENG
Distribution: Web Download
Member fee: $0 for 2008 members
Non-member Fee: US$0.00
Reduced-License Fee: N/A
Extra-Copy Fee: N/A
Non-member License: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Adam Meyers, Ruth Reeves, Catherine Macleod. 2007. COMNOM v 1.0. Linguistic Data Consortium, Philadelphia.


Introduction

COMNOM is an automatically enriched version of COMLEX Syntax, created at New York University as part of the NomBank annotation project. The COMLEX resources are distributed by the Linguistic Data Consortium (LDC) and consist of two items: the COMLEX English Syntax Lexicon (LDC98L21), an English dictionary of approximately 38,000 lemmas with detailed information about the syntactic characteristics and subcategorization (complement structures) of each lexical item; and the COMLEX Syntax Text Corpus Version 2.0 (LDC96T11).

COMNOM adds classes to COMLEX Syntax lexical entries using NOMLEX-PLUS, a dictionary with approximately 8,000 entries. Prepositions were collected from the NOMLEX-PLUS subcategorization slots (:VERB-SUBC, :OBJECT, :SUBJECT, etc.), essential complements were deduced from those prepositions, and the resulting complements were added to the existing COMLEX entry.
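
The enrichment step described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary layout, the slot names in lowercase, the "noun-pp" frame label, and the sample entry are hypothetical stand-ins, not the actual COMLEX or NOMLEX-PLUS file format or the procedure used at NYU.

```python
from copy import deepcopy

# Hypothetical, simplified stand-ins for dictionary entries. The real
# COMLEX and NOMLEX-PLUS files use their own notation and far richer slots.
nomlex_entry = {
    "orth": "destruction",
    "verb_subc": ["of", "by"],  # prepositions seen in verb-derived frames
    "object": ["of"],           # prepositions that can mark the object role
    "subject": ["by"],          # prepositions that can mark the subject role
}
comlex_entry = {"orth": "destruction", "subc": []}


def collect_prepositions(entry, slots=("verb_subc", "object", "subject")):
    """Gather the prepositions listed under the given slots, without duplicates."""
    preps = []
    for slot in slots:
        for prep in entry.get(slot, []):
            if prep not in preps:
                preps.append(prep)
    return preps


def enrich(comlex, nomlex):
    """Return a copy of the COMLEX-style entry with one added complement."""
    enriched = deepcopy(comlex)  # leave the original entry untouched
    preps = collect_prepositions(nomlex)
    if preps:
        # "noun-pp" is an assumed label for a prepositional complement frame.
        enriched["subc"].append({"frame": "noun-pp", "pval": preps})
    return enriched


enriched = enrich(comlex_entry, nomlex_entry)
print(enriched)
```

The sketch copies the entry before modifying it, mirroring the idea that COMNOM is a separate, enriched version rather than a change to COMLEX itself.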

Further information about the methodology used in COMNOM can be found in Meyers, "Those Other NomBank Dictionaries -- Manual for Dictionaries that Come with NomBank". Related resources and further information about COMNOM and NomBank are available from the NomBank project website.

A license to either COMLEX English Syntax Lexicon (LDC98L21) or COMLEX Syntax Text Corpus Version 2.0 (LDC96T11) is required to obtain COMNOM v 1.0.

Data

This release includes three versions of COMNOM which correspond to the three versions of NOMLEX-PLUS and are characterized by the amount of corpus training that influenced their creation. The data used for training are the Wall Street Journal materials in the Penn Treebanks (Treebank-2 and Treebank-3), with annotations from Proposition Bank I and NomBank 1.0.

The three versions are:

  • COMNOM-clean.1.0 -- contains no information derived from annotated data
  • COMNOM.1.0 -- contains information from the entire annotated corpus
  • COMNOM-training.1.0 -- contains information from annotated data in sections 02-21 of the corpus only

Content Copyright

Portions © 1987-1989 Dow Jones & Company, Inc., © 1996, 1998, 2008 Trustees of the University of Pennsylvania



    Contact: ldc@ldc.upenn.edu

    (c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.