Language Archive Survey Results


1. Name and Location

Archive Name: Digital Dictionaries of South Asia
Archive URL: http://dsal.uchicago.edu/dictionaries
Host Institution: University of Chicago, Columbia University, Triangle South Asia Consortium in North Carolina
Country: USA
Contact Person: Rebecca Moore
Email Address: rebecca@dsal.uchicago.edu


2. Catalog

2.1 If the archive has a catalog in a standardized format, what fields does it contain? If not, what contextual information about the resources are collected? What other information would you like to collect if you could?
We collect basic bibliographic information: title, uniform title, author, publisher, date, pagination, etc.

2.2 If the electronic catalog conforms to some standard, please tell us the name of the standard.
We use metadata (mostly Dublin Core); the resources are also cataloged in MARC.

2.3 To what extent have the archived materials been cataloged electronically?
virtually everything

2.4 If there is an online public access catalog, please give its URL.


3. Holdings

3.1 What geographical regions and languages are covered?
Main Regions Covered: Asia
Approx Number of Languages: 30
Main Languages: All of the major literary languages of South Asia

3.2 Please give impressionistic estimates of the archive holdings for each of the data types.
DATA TYPE                                          NON-DIGITAL   DIGITAL
Texts:
Wordlists, Vocabularies, Lexicons, Dictionaries:   large         large
Field Notes, Correspondence, Misc files:
Descriptions (Grammars, Phonologies, etc):
Audio Recordings:
Video Recordings:

3.3 Please list any other data types which are not included above, or any other comments on the archive holdings:
There is another closely aligned project, the Digital South Asia Library, which deals with texts, grammars, journals, monographs, and images. We have begun experiments with sound files for some texts.

3.4 What proportion of the holdings are unique to the archive and not available elsewhere?
none


4. Electronic Publication

4.1 To what extent are the archive holdings published electronically, where "published" means that there is a well-defined procedure such that anyone at all can get a standard copy of the data, either on digital media or over the internet?
virtually everything

4.2 To what extent are the archive holdings accessible over the web?
virtually everything

4.3 Is permission required before materials can be accessed?
no

4.4 Is there any fee for materials?
no

4.5 How are author and/or editor defined for the electronic publications? Is there a bibliographical citation method?
We are converting dictionaries that exist only in print. The editor/author is assumed to be the same for the digital version and the print version. The work we do for the electronic version is apparent through the interface (DDSA logo, navigation bar, etc.). The metadata (Dublin Core) also includes the project as a contributor.
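
As a rough illustration of the attribution described above, the following is a minimal sketch of a Dublin Core record in which the print author stays as creator and the project is listed as contributor. The `record` wrapper element, the `build_record` helper, and the title and author values are hypothetical placeholders, not the project's actual metadata or tooling.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(title, creator, contributor):
    """Build a small Dublin Core record (hypothetical helper)."""
    # "record" is an assumed wrapper element, not a DDSA convention.
    record = ET.Element("record")
    for name, value in (
        ("title", title),
        ("creator", creator),
        ("contributor", contributor),
    ):
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Placeholder values: the title and creator are invented for illustration.
record = build_record(
    title="Example Dictionary (placeholder title)",
    creator="Print Author (placeholder)",
    contributor="Digital Dictionaries of South Asia",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the print author in `dc:creator` and naming the project only in `dc:contributor` mirrors the stated assumption that the editor/author is the same for the print and digital versions.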

4.6 Do the electronic publications have ISBN numbers?
no

4.7 What plans are there to expand the electronic publication of archive holdings?
The project proposal for DDSA stated that we would convert dictionaries for each of the modern literary languages of South Asia. Additional funds have been secured for classical languages. We are applying for funds for minority languages.


5. General Issues

5.1 Who is the legal owner of archived materials?
We negotiate for electronic distribution of print materials under copyright. The nature of the project is free and open distribution; in this sense, I think the archive is owned by the community of users. Legally, I imagine the archive is owned by the institutions named in the grants.

5.2 Beyond legal ownership, are there any asserted or perceived moral rights concerning archived materials? Do the holders of the archive see the original speakers or their representatives as controlling publication?
The focus of our project is free and open access to important lexical texts. Where connectivity is a problem, we are working out mechanisms to transfer the data to users on a cost-recovery basis (the cost of CDs). Without alternative distribution methods, we might have moral issues concerning use of the data by users in Southern Asia. We have been very aware of the ethical issues of publishing data in a high-bandwidth mode.

5.3 In cases where no electronic publication is planned, why is this so? (e.g. funding, licensing, technical know-how, lack of interest).

5.4 Is any of the data in a proprietary format (e.g. MS Word)? If so, are there plans to transfer it to an open standard (e.g., XML)?
We use Unicode as the encoding system for web display. The data are structured in a format that we devise for each dictionary, both to address the issues of that dictionary and to disambiguate the data so that we can process them into a usable form. Copies of the specifications are available in PDF form at http://dsal.uchicago.edu/workspace/


6. Do you have any other comments about digital archives of language material, or on this survey?
I would like to add a brief note about the DDSA project. We are working on a number of dictionaries at the present time (we have data in hand for 16 dictionaries, have 10 being processed, and have more on the list); however, only a few are in public view. The information given above was written with the entire project in mind, not only the portion currently viewable.


