A Survey of Open Language Archives

By "Open Language Archive" we mean a digital repository of language data, documentation and description (including texts, recordings, dictionaries, grammars and field notes) whose materials are intended to be openly available. The term is construed broadly to include any such repository that has an accessible digital component, even if this is just an online catalog or a few digital holdings. It also encompasses organizations that publish language data in electronic form. (NB: Our use of "open" is inspired by the Open Archives Initiative.)

Examples of the kinds of archives we have in mind are listed here: Archives for Language Documentation and Description.

The purpose of this survey is to identify:

  1. all existing digital archives of language materials, the nature of their holdings, and any electronic publication activities (whether on digital media or web-based);
  2. all planned digital archives of language materials and their present status;
  3. any technical, legal or moral obstacles particular to archives of language materials.

Additionally, we would like to find out about the metadata (i.e. catalog fields) used for classifying archived language materials.

Preliminary survey results are available here.

Please respond to as many questions as possible, especially to the questions in red. You can submit a survey response for an archive whether or not you are the maintainer. Skip any questions that are irrelevant or too time-consuming. Please pass the survey on to any other archivists, librarians and language specialists. Thank you!


1. Name and Location

Archive Name:
Archive URL:
Host Institution:
Country:
Contact Person:
Email Address:


2. Catalog

2.1 If the archive has a catalog in a standardized format, what fields does it contain? If not, what contextual information about the resources is collected? What other information would you like to collect if you could?

2.2 If the electronic catalog conforms to some standard, please tell us the name of the standard.

2.3 To what extent have the archived materials been cataloged electronically?
no electronic catalog | a small amount | a significant amount | virtually everything

2.4 If there is an online public access catalog, please give its URL.


3. Holdings

3.1 What geographical regions and languages are covered?
Main Regions Covered: Africa, Americas, Asia, Europe, Oceania
Approx Number of Languages:
Main Languages:

3.2 Please give impressionistic estimates of the archive holdings for each of the data types.
DATA TYPE | NON-DIGITAL (None / Small / Large) | DIGITAL (None / Small / Large)
Texts:
Wordlists, Vocabularies, Lexicons, Dictionaries:
Field Notes, Correspondence, Misc files:
Descriptions (Grammars, Phonologies, etc):
Audio Recordings:
Video Recordings:

3.3 Please list any other data types which are not included above, or any other comments on the archive holdings:

3.4 What proportion of the holdings are unique to the archive and not available elsewhere?
none | a small amount | a significant amount | virtually everything


4. Electronic Publication

4.1 To what extent are the archive holdings published electronically, where "published" means that there is a well-defined procedure such that anyone at all can get a standard copy of the data, either on digital media or over the internet?
nothing published | a small amount | a significant amount | virtually everything

4.2 To what extent are the archive holdings accessible over the web?
nothing accessible | just some samples | a significant amount | virtually everything

4.3 Is permission required before materials can be accessed?
no | sometimes | often | virtually always

4.4 Is there any fee for materials?
no | sometimes | often | virtually always

4.5 How are author and/or editor defined for the electronic publications? Is there a bibliographical citation method?

4.6 Do the electronic publications have ISBNs?
no | sometimes | often | virtually always

4.7 What plans are there to expand the electronic publication of archive holdings?


5. General Issues

5.1 Who is the legal owner of archived materials? The original collector or his/her estate? The language community? The archive or its host institution? Some combination of these?

5.2 Beyond legal ownership, are there any asserted or perceived moral rights concerning archived materials? Do the holders of the archive see the original speakers or their representatives as controlling publication?

5.3 In cases where no electronic publication is planned, why is this so (e.g. lack of funding, licensing restrictions, lack of technical know-how, lack of interest)?

5.4 Is any of the data in a proprietary format (e.g. MS Word)? If so, are there plans to convert it to an open standard (e.g. XML)?


6. Do you have any other comments about digital archives of language material, or on this survey?


Many thanks for taking the time to complete the survey. Please submit it using the button on the left below.

              


Steven Bird, LDC
sb@ldc.upenn.edu
Gary Simons, SIL
Gary_Simons@sil.org
Mark Liberman, LDC
myl@unagi.cis.upenn.edu