Introduction
The Global Yoruba Lexical Database v. 1.0 is a set of related dictionaries providing definitions
and translations for over 450,000 words from the Yoruba language and its variants:
Standard Yoruba (over 368,000 words), Gullah (over 3,600 words), Lucumí
(over 8,000 words) and Trinidadian (over 1,000 words).
Yoruba is a Niger-Congo language (sub classification: Kwa > Yoruboid) spoken
natively by nearly 20 million people, the vast majority of them in southwestern
Nigeria. There are also approximately a half million Yoruba speakers in Benin,
as well as speakers in Togo and Ghana and among the emigrant populations in
the United States and the United Kingdom. In addition, roughly two million people
in Nigeria speak Yoruba as a second language.
The Yoruba language diaspora is wide, stretching from southwestern Nigeria
and Benin westward to the Caribbean and islands along the southeastern United
States coast. Yoruba and other African dialects arrived in the Americas and
the Caribbean as a consequence of the Atlantic slave trade. Throughout the region,
Yoruba dialects blended with each other and with languages like Spanish and
French to form a variety of creoles such as Gullah in the United States and Nagô
in Brazil. Many of those creoles have become the language of liturgy and music
in Cuba, Brazil, Argentina, Trinidad, Jamaica and parts of the United States
and Canada. The ultimate goal of this dictionary is to provide coverage for
all Yoruba dialects across the globe. For that reason, it will continue to be
a work in progress.
The current standard orthography is tone-driven. Yoruba has three tones: a
high tone, a middle tone and a low tone. Each syllable in a Yoruban word must
have at least one tone and long vowels may have two tones. While there are no
explicit rising or falling tones, combinations of the language's three basic
tones may produce the same effect. Grammatically, Yoruba is a Subject-Verb-Object
(SVO) language. Verbs have no infinitive forms and no past or present tense, and they typically
have only a single syllable. Discrete auxiliary words provide information on
the verb tense. Nor do Yoruba nouns have plural or singular form; their number
derives from the context in which the word occurs.
The Yoruba dialect continuum consists of over fifteen varieties, with considerable
phonological and lexical differences among them and some grammatical ones as
well. Peripheral areas of dialectal regions often have some similarities to
adjoining dialects. Standard Yoruba is a koine used for education, writing,
broadcasting, and contact between speakers of different dialects. It is also
called "Literary Yoruba", "common Yoruba", or simply "Yoruba" without qualification.
Though in large part based on the Ò̩yò̩ and Ibadan
dialects, it incorporates several features from other dialects and has a simplified
vowel harmony system and some other features not found in other Yoruba dialects.
Data
This release encompasses the following languages and dialects:
| Languages | Description | Number of words |
| --- | --- | --- |
| Yoruba->English | This dictionary of Standard Yoruba contains detailed lexicographic entries which include the part of speech, the English definition of the Yoruba headword, cross references, examples in English and the morphemic decomposition of the Yoruba headword. | 142,389 |
| English->Yoruba | This dictionary maps the English headword back to Standard Yoruba and includes the part of speech, Yoruba definition, and morphemic decomposition of the Yoruba word. | 226,585 |
| Gullah->English and Yoruba | Gullah is a creole spoken in the coastal Low Country of South Carolina and Georgia in the United States. Although the language is no longer spoken to a great extent, its words are still commonly used for personal names and nicknames. The dictionary translates from Gullah headwords to English and to Standard Yoruba. | 3,636 |
| Lucumí->Spanish, English and Yoruba | Lucumí is the ritual language of the Santeria religion practiced in Cuba. The Lucumí dictionary translates from a Lucumí headword to Cuban Spanish to English to Standard Yoruba. At the time of this publication in 2008, some entries do not have complete translations and only map from Lucumí to Cuban Spanish. | 8,075 |
| Trinidadian->English and Yoruba | Trinidadian is a creole which blends English, French, Spanish and African languages. The Trinidadian dictionary presents those words that have Yoruban roots and maps from the Trinidadian headword to English and Standard Yoruba. | 1,187 |
The dictionaries in this publication are presented in two formats, Toolbox
databases and XML. Short for The Field
Linguist's Toolbox, Toolbox is a lexicographical database system published
by SIL. SIL makes Toolbox freely available
for download.
To use the Global Yoruba Lexical Database v. 1.0 in its Toolbox format, Toolbox must first be installed
on the user's local computer.
The orthography of the text in the databases conforms to that presented to
students in the Nigerian school system. The basic Yoruba alphabet is:
a b d e e̩ f g gb h i j k l m n o o̩
p r s s̩ t u w y
The letter gb is a digraph, two letters that combine
to form a single phoneme. In written Yoruba, gb functions as a single
letter. In the Toolbox presentation, this has been taken into account and the
software sorts the words accordingly in all functions. The XML presentation
has been sorted according to the above alphabet but is a static, flat file.
For that reason, developers creating applications from the XML files will need
to take the digraph into account when writing search and reporting functions.
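The digraph handling described above can be sketched in Python as a sort key that splits each word into Yoruba letters before comparing. This is only an illustration, not code shipped with the database. The names `YORUBA_ALPHABET`, `letters` and `sort_key` are hypothetical, and the sketch assumes the under-mark is stored as U+0329 (it also accepts U+0323 as an assumed variant) and that tone marks should be ignored for ordering.

```python
import unicodedata

# Letters in the order of the alphabet listed above; "gb" counts as one letter.
YORUBA_ALPHABET = [
    "a", "b", "d", "e", "e\u0329", "f", "g", "gb", "h", "i", "j", "k", "l",
    "m", "n", "o", "o\u0329", "p", "r", "s", "s\u0329", "t", "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(YORUBA_ALPHABET)}

TONE_MARKS = {"\u0300", "\u0301"}   # combining grave (low), combining acute (high)
UNDER_MARKS = {"\u0329", "\u0323"}  # vertical line below; dot below (assumed variant)


def letters(word):
    """Split a word into Yoruba letters, ignoring case and tone marks."""
    # NFD separates base letters from their combining marks.
    text = unicodedata.normalize("NFD", word.lower())
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in TONE_MARKS:
            i += 1                      # tone does not affect letter order here
        elif ch in UNDER_MARKS:
            # Attach the under-mark to the preceding base letter (e, o or s).
            if out and out[-1] + "\u0329" in RANK:
                out[-1] += "\u0329"
            i += 1
        elif ch == "g" and text[i + 1:i + 2] == "b":
            out.append("gb")            # the digraph is a single letter
            i += 2
        else:
            out.append(ch)
            i += 1
    return out


def sort_key(word):
    """Map a word to a list of alphabet positions for use with sorted()."""
    # Characters outside the alphabet (spaces, hyphens) sort after all letters.
    return [RANK.get(letter, len(YORUBA_ALPHABET)) for letter in letters(word)]


words = ["gbà", "gún", "hun", "ga"]
print(sorted(words, key=sort_key))  # "gbà" is placed after every plain g- word
```

Words that differ only in tone receive the same key in this sketch, so they keep their input order; an application that needs a fixed tone order would have to add a secondary key.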
As Yoruba is a tonal language, the written language uses additional diacritic
marks to denote tones. The orthography uses three tones:
- Low: denoted with a grave accent (`) as in à
- Mid: plain letter without diacritics
- High: denoted with an acute (´) symbol as in á
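As a rough illustration of the three tone marks listed above, the following Python sketch reads the tone of each written vowel from its diacritic. The function name `tone_pattern` is hypothetical, the input strings are illustrative only, and the sketch makes a simplifying assumption: it reports tones on the vowels a, e, i, o, u and ignores tone marks on syllabic nasals.

```python
import unicodedata

TONES = {"\u0300": "L", "\u0301": "H"}  # combining grave = low, combining acute = high
VOWELS = set("aeiou")


def tone_pattern(word):
    """Return one of 'L', 'M', 'H' for each written vowel in the word."""
    # NFD puts each tone mark after the base letter it belongs to.
    text = unicodedata.normalize("NFD", word.lower())
    pattern = []
    on_vowel = False
    for ch in text:
        if unicodedata.combining(ch):
            # Only a grave or acute sitting on a vowel changes the tone;
            # other marks (such as the under-mark) are skipped.
            if on_vowel and ch in TONES:
                pattern[-1] = TONES[ch]
        else:
            on_vowel = ch in VOWELS
            if on_vowel:
                pattern.append("M")  # unmarked vowels carry the mid tone
    return "".join(pattern)


print(tone_pattern("bàtà"))  # LL
print(tone_pattern("ilé"))   # MH
```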
Both the Toolbox and XML presentations encode the text in Unicode UTF-8 using
Normalization Form C (NFC). Unicode normalization forms govern how letters and
combining characters are composed and ordered, so that software systems treat
equivalent text consistently. NFC is the form used by most web systems and is
the form the W3C recommends for the web. The Toolbox
presentation uses the Arial Unicode MS font for display. The Tahoma and Lucida
Grande fonts will also display the Yoruba alphabet under UTF-8 encoding. Since
XML only provides information about document structure, fonts are not specified
in the XML versions of the dictionaries.
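Because the text is stored in normalized form C, an application that compares user input against headwords may want to normalize the input the same way first. The Python sketch below uses the standard `unicodedata` module; the helper names `to_nfc`, `is_nfc` and `same_headword` are hypothetical and are not part of the database.

```python
import unicodedata


def to_nfc(text):
    """Return the text in Unicode normalized form C."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text):
    """Check whether the text is already in normalized form C."""
    return to_nfc(text) == text


def same_headword(query, headword):
    """Compare two strings after bringing both to normalized form C."""
    return to_nfc(query) == to_nfc(headword)


# "a" followed by a combining grave composes to the single character U+00E0.
decomposed = "a\u0300"
print(is_nfc(decomposed))                   # False
print(to_nfc(decomposed) == "\u00e0")       # True
print(same_headword(decomposed, "\u00e0"))  # True
```

Letters with the under-mark have no single precomposed character for U+0329, so in form C a low-tone e̩ is stored as the composed è followed by the combining mark.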
Displaying non-Western letters: Windows users will need to install and configure their computers for Extended Language Support. To do this, open the Windows Control Panel and click the Regional and Language Options icon. In the Regional and Language Options window that opens, select the Languages pane. Under the Supplemental Language Support section, check both check boxes and click OK. Windows will ask for your install disc and will install the modules needed to properly display complex and non-Western letters. Users who do not have their Windows install disc should contact their local system administrator to install Extended Language Support.
Samples
For an example of the data in this database, please review this sample entry (jpg) from the Yoruba-English Lexicon.
Contact Information
All questions and inquiries should be directed to the author, Yiwola Awoyale at awoyale@ldc.upenn.edu
Content Copyright
Portions © 1990-2008 Trustees of the University of Pennsylvania