Malto Speech and Transcripts was developed by Masato Kobayashi, Associate
Professor in Linguistics at the University of Tokyo (Japan), and Bablu Tirkey,
research scholar at the Tribal and Regional Languages Department, Ranchi University
(India). It contains approximately 8 hours of Malto speech data collected
between 2005 and 2009 from 27 speakers (22 males, 5 females). Also included are accompanying
transcripts, English translations and glosses for 6 hours of the collection. Speakers were
asked to talk about themselves, their lives, rituals and folklore; elicitation
interviews were then conducted. The goal of the work was to present the current
state and dialectal variation of Malto.
Malto is a Dravidian language spoken in northeastern India (principally the
states of Bihar, Jharkhand and West Bengal) and Bangladesh by people called
the Pahariyas. Indian census data places the number of Malto speakers between
100,000 and 200,000. Most Malto speakers live in the three
northeastern districts of Jharkhand, i.e., Sahebganj, Godda and Pakur; the fieldwork
that resulted in this corpus was conducted in those districts. Of the Pahariyas
in that area, three subtribes, the Sawriya Pahariyas, the Mal Pahariyas and
the Kumarbhag Pahariyas, primarily speak Malto. (Kobayashi 3)
Pahariya villages or hamlets are located on hilly tracts and, in the lowlands,
are often separated by non-Pahariya villages. As a result, Malto varies from
village to village. It may be more accurate to consider Malto a continuum of
dialects rather than a unitary language. The three major dialects -- Sawriya
Pahariya, Mal Pahariya, and Kumarbhag Pahariya -- correspond to the principal
sub-tribal communities. (Kobayashi 14)
For further reading on Malto, consult Texts and Grammar of Malto (2012) by Masato Kobayashi, published by Kotoba Books, Vizianagaram, India, and sold by the book distributor Mary Martin Booksellers, 123 Third Street, Tatabad, Coimbatore 641012, India. They can be contacted at
email@example.com or at firstname.lastname@example.org.
The transcribed data accounts for 6 hours of the collection and contains 21 speakers
(17 male, 4 female). The untranscribed data accounts for 2 hours of the collection and contains
10 speakers (9 male, 1 female). Four of the male speakers are present in both groups.
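The speaker counts above can be reconciled with the corpus-wide totals (27 speakers: 22 male, 5 female) by counting the speakers who appear in both groups only once. The following is a minimal sketch of that arithmetic; the function name is illustrative, and the assumption that no female speaker appears in both groups is inferred from the stated totals rather than stated directly.

```python
def combined_count(transcribed, untranscribed, in_both):
    """Count distinct speakers across two groups that share `in_both` members."""
    return transcribed + untranscribed - in_both

# Figures taken from the description above.
total_male = combined_count(17, 9, 4)    # four male speakers are in both groups
total_female = combined_count(4, 1, 0)   # assumption: no female speaker is in both
total_speakers = total_male + total_female

print(total_male, total_female, total_speakers)
```

The same check applied to the group sizes, 21 + 10 - 4, also gives 27.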
All audio is presented in .wav format. Each audio file name includes a subject
number, village name, speaker name and the topic discussed. The transcripts
and glossary are UTF-8 text files. Because of ambiguities that occur when writing
Malto in Devanagari script, the transcripts were developed using Roman script
with symbols adapted from the International Phonetic Alphabet (IPA) but are
not considered to be phonetic transcripts.
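As an illustration of working with these formats, the sketch below sums the duration of .wav files in a directory and reads a transcript as UTF-8 text, using only the Python standard library. It assumes the audio is uncompressed PCM, which is what the standard `wave` module reads; the directory and file paths passed in are hypothetical and not taken from the corpus documentation.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_hours(audio_dir):
    """Sum the durations of all .wav files under audio_dir, in hours."""
    seconds = sum(wav_duration_seconds(p) for p in Path(audio_dir).rglob("*.wav"))
    return seconds / 3600


def read_transcript(path):
    """Read a transcript file as UTF-8 so IPA-derived symbols are preserved."""
    return Path(path).read_text(encoding="utf-8")
```

Opening the transcripts with an explicit UTF-8 encoding matters here because the Roman-script transcription uses symbols adapted from the IPA, which a platform default encoding may not decode correctly.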
Consult readme.txt for further information
about the corpus, its collection and the speakers. The transcription and glosses
are split into three text files; consult the readme
to determine which audio files are covered by each transcript.
An audio sample from this corpus is available.
Some minor updates were made to the index file. An updated version
is available in the online documentation folder, as well as an updated file table.
Kobayashi, Masato. Texts and Grammar of Malto.
Vizianagaram: Kotoba Books, 2012. Print.
Portions © 2005-2012 Masato Kobayashi, © 2012 Trustees of the University