Introduction
Hindi WordNet, Linguistic Data Consortium
(LDC) catalog number LDC2008L02 and ISBN 1-58563-470-0, was developed by researchers
at the Center for Indian Language Technology, Computer Science and Engineering
Department, IIT Bombay.
Hindi, a member of the Indo-Iranian branch of the Indo-European language family, is the primary national language of India and is spoken by approximately
500 million people, making it the fifth most widely spoken language in the world. Inspired
by the well-known English-language WordNet,
Hindi Wordnet is the first wordnet for an Indian language. Wordnets are systems
for analyzing the different lexical and semantic relations between words. Specifically,
a wordnet is a word sense network in which words are grouped into semantically
equivalent units called synsets. Each synset represents a lexical concept, and
synsets are linked to each other by semantic relations (between synsets) and
lexical relations (between words). Similar in design to the Princeton Wordnet
for English, Hindi Wordnet incorporates additional features to capture the complexities
of Hindi. This release
of Hindi Wordnet consists of 56,928 unique words and 26,208 synsets.
Additional information about the development of Hindi Wordnet is available
at the Hindi
WordNet web site.
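The word sense network described above can be pictured with a small sketch. This is not the Hindi WordNet API or its storage format; the class `WordSenseNetwork`, its method names, the synset identifiers, the English placeholder words and the relation labels `hypernym` and `antonym` are all assumptions made for illustration. It only shows the distinction between semantic relations (synset to synset) and lexical relations (word to word).

```python
from collections import defaultdict


class WordSenseNetwork:
    """Toy word sense network (hypothetical, not the Hindi WordNet API)."""

    def __init__(self):
        self.synsets = {}                 # synset id -> tuple of member words
        self.semantic = defaultdict(set)  # (synset id, relation) -> synset ids
        self.lexical = defaultdict(set)   # (word, relation) -> words

    def add_synset(self, synset_id, words):
        self.synsets[synset_id] = tuple(words)

    def link_synsets(self, source_id, relation, target_id):
        # Semantic relation: holds between two synsets (concepts).
        self.semantic[(source_id, relation)].add(target_id)

    def link_words(self, source_word, relation, target_word):
        # Lexical relation: holds between two individual words.
        self.lexical[(source_word, relation)].add(target_word)

    def related_synsets(self, synset_id, relation):
        return sorted(self.semantic.get((synset_id, relation), ()))

    def related_words(self, word, relation):
        return sorted(self.lexical.get((word, relation), ()))

    def synsets_of(self, word):
        return sorted(sid for sid, ws in self.synsets.items() if word in ws)


# Illustrative data only; identifiers and words are made up for the sketch.
net = WordSenseNetwork()
net.add_synset("school.n", ["school"])
net.add_synset("institution.n", ["institution", "establishment"])
net.add_synset("hot.a", ["hot"])
net.add_synset("cold.a", ["cold"])
net.link_synsets("school.n", "hypernym", "institution.n")
net.link_words("hot", "antonym", "cold")

print(net.related_synsets("school.n", "hypernym"))
print(net.related_words("hot", "antonym"))
```

Keeping the two relation tables separate mirrors the description: a semantic link applies to every word in a synset, while a lexical link applies only to the specific word pair.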
Data
Hindi WordNet contains nouns, verbs, adjectives and adverbs. Each entry consists
of the following elements:
-
Synset:
a set of synonymous words. For example, "विद्यालय, पाठशाला, स्कूल"
(vidyaalay, paaThshaalaa, skuul) represents the concept of school as an educational
institution. The words in the synset are arranged according to the frequency
of usage.
-
Gloss:
a description of the concept. It consists of two parts:
Text definition: It explains the concept denoted by the synset.
For example, "वह स्थान जहाँ प्राथमिक या माध्यमिक स्तर की औपचारिक शिक्षा दी जाती है"
(vah sthaan jahaaM praathamik yaa maadhyamik star kii aupacaarik shikshaa dii jaatii hai)
explains the concept of school as an educational institution.
Example sentence: It illustrates the usage of the words in a sentence.
Generally, the words in a synset are interchangeable in the sentence. For example,
"इस विद्यालय में पहली से पाँचवीं तक की शिक्षा दी जाती है"
(is vidyaalay me pahalii se pancvii tak kii shikshaa dii jaatii hai)
illustrates the usage of the words in the synset representing school
as an educational institution.
-
Position in
Ontology: An ontology is a hierarchical organization of concepts, or
more specifically, a categorization of entities and actions. A separate
ontological hierarchy exists for each syntactic category (noun, verb, adjective,
adverb). Each synset is mapped to a position in the ontology.
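The entry elements listed above can be modeled as a small data structure. The following is a minimal sketch, not the Java API shipped with the release; the names `Gloss`, `Entry`, `ontology_node` and `example_variants` are hypothetical. The words, definition and example sentence are the school example from the text; the ontology position is left unset because the text does not give one. The helper swaps each synset member into the example sentence, reflecting the statement that words in a synset are generally replaceable there.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Gloss:
    definition: str  # text definition of the concept
    example: str     # example sentence using a word from the synset


@dataclass(frozen=True)
class Entry:
    category: str           # noun, verb, adjective or adverb
    words: Tuple[str, ...]  # synset members, ordered by frequency of usage
    gloss: Gloss
    ontology_node: Optional[str] = None  # hypothetical field; value not given in the text

    def example_variants(self):
        """Return the example sentence with each synset member swapped in."""
        used = next((w for w in self.words if w in self.gloss.example), None)
        if used is None:
            # No synset member appears verbatim (e.g. an inflected form was used).
            return [self.gloss.example]
        return [self.gloss.example.replace(used, w) for w in self.words]


school = Entry(
    category="noun",
    words=("विद्यालय", "पाठशाला", "स्कूल"),
    gloss=Gloss(
        definition="वह स्थान जहाँ प्राथमिक या माध्यमिक स्तर की औपचारिक शिक्षा दी जाती है",
        example="इस विद्यालय में पहली से पाँचवीं तक की शिक्षा दी जाती है",
    ),
)

for sentence in school.example_variants():
    print(sentence)
```

Plain string substitution is only an approximation: it ignores inflection and agreement, which is one reason the text says the words are "generally" rather than always replaceable.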
This release of Hindi WordNet is made available as a complete Java application along
with an API to facilitate further development.
Content Copyright
Portions © 2007 IIT Bombay, © 2008 Trustees of the University of
Pennsylvania