


DSO Corpus of Sense-Tagged English

Item Name: DSO Corpus of Sense-Tagged English
Authors: Hwee Tou Ng and Hian Beng Lee
LDC Catalog No.: LDC97T12
ISBN: 1-58563-119-1
Data Type: text
Data Source(s): newswire, varied
Application(s): natural language processing
Language(s): English
Language ID(s): ENG
Distribution: Web Download
Member fee: $0 for 1997 members
Non-member Fee: US $500.00
Reduced-License Fee: US $250.00
Extra-Copy Fee: N/A
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Hwee Tou Ng and Hian Beng Lee. 1997. DSO Corpus of Sense-Tagged English. Linguistic Data Consortium, Philadelphia.

Introduction

This corpus contains sense-tagged word occurrences for 121 nouns and 70 verbs that are among the most frequently occurring and most ambiguous words in English. The occurrences appear in about 192,800 sentences taken from the Brown corpus and the Wall Street Journal, and were hand-tagged by students in the Linguistics Program of the National University of Singapore. WordNet 1.5 sense definitions of these nouns and verbs were used to identify a word sense for each occurrence of each word.

Data

In addition to providing the word occurrences in their full sentential context, the corpus includes complete listings of the WordNet 1.5 sense definitions used in the tagging.

The following example illustrates the format of a sentence with a sense tag for the word "action," followed by the corresponding WordNet 1.5 sense definition:


    ca01.db #020 `` These >> actions 8 << should serve to protect in
        fact and in effect the court 's wards from undue costs and its
        appointed and elected servants from unmeritorious criticisms, '' the jury said .

    Sense 8
        legal action, action, case, lawsuit, suit -- (a judicial proceeding
        brought by one party against another; "no criminal cases were heard
        while the judge was ill")
         => proceeding, legal proceeding, judicial proceeding,
            proceedings -- (the institution of a legal action)
             => due process, due process of law -- (the administration
                of justice according to established rules and principles)
                 => group action -- (action taken by a group of people)
                     => act, human action, human activity -- (something
                        that people do or cause to happen)

    

(In the actual corpus, all tagged occurrences of a given noun or verb are stored together in one file, with each full sentence on one line; all noun and verb word sense definitions are stored together in two separate files.)
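As an illustration of how a tagged sentence like the one above might be read programmatically, here is a minimal Python sketch. It assumes a layout inferred only from the single example shown: a source identifier, a `#`-prefixed sentence number, and one target word wrapped as `>> word sense <<`. The names `TaggedOccurrence` and `parse_line` are hypothetical and not part of the corpus distribution; the actual files may differ in details, so check the corpus documentation before relying on this.

```python
import re
from typing import NamedTuple, Optional


class TaggedOccurrence(NamedTuple):
    source: str       # e.g. "ca01.db" (assumed to identify the source text)
    sentence_id: str  # e.g. "020" (the number after '#')
    word: str         # the tagged surface form, e.g. "actions"
    sense: int        # sense number, e.g. 8
    text: str         # the sentence with the tag markers removed


# Assumed layout, inferred from the example only:
#   <source> #<sentence-id> <tokens ... >> word sense << ... tokens>
LINE_RE = re.compile(r"^\s*(\S+)\s+#(\S+)\s+(.*)$", re.DOTALL)
TAG_RE = re.compile(r">>\s+(\S+)\s+(\d+)\s+<<")


def parse_line(line: str) -> Optional[TaggedOccurrence]:
    """Parse one tagged sentence; return None if it does not fit the assumed layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    source, sentence_id, body = m.groups()
    tag = TAG_RE.search(body)
    if tag is None:
        return None
    word, sense = tag.group(1), int(tag.group(2))
    # Replace the ">> word sense <<" marker with the bare word,
    # then normalize whitespace to recover the plain sentence.
    text = TAG_RE.sub(lambda t: t.group(1), body, count=1)
    text = " ".join(text.split())
    return TaggedOccurrence(source, sentence_id, word, sense, text)


example = (
    "ca01.db #020 `` These >> actions 8 << should serve to protect in "
    "fact and in effect the court 's wards from undue costs and its "
    "appointed and elected servants from unmeritorious criticisms, '' "
    "the jury said ."
)
occ = parse_line(example)
print(occ)
```

Because each full sentence is stored on one line in the corpus, a function of this shape could be applied line by line to a word's file; lines that do not match the assumed pattern come back as `None` rather than raising an error.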

This sense-tagged corpus was provided by Hwee Tou Ng of the Defence Science Organisation (DSO) of Singapore. It was first reported in the following paper at ACL-96:


    "Integrating Multiple Knowledge Sources to Disambiguate Word Sense: 
    An Exemplar-Based Approach," by Hwee Tou Ng and Hian Beng Lee, in
    Proceedings of the 34th Annual Meeting of the Association for
    Computational Linguistics, pages 40-47, Santa Cruz, California, USA,
    June 1996.  ( http://xxx.lanl.gov/abs/cmp-lg/9606032 )

    

Updates

There are no updates at this time.


Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.