White Paper on Machine-Readable Rights Information

Steven Bird, University of Pennsylvania
http://www.ldc.upenn.edu/sb/
sb@ldc.upenn.edu

October 2001

This paper was prepared for the Technical Committee of the Open Archives Initiative.


Problem statement: Currently no mechanism is defined for enforcing IP rights, other than having a human being read the rights statements packaged with every record. This is not scalable. Can we define a simple vocabulary for rights management that is not overly restrictive, yet enables more archives to participate without an all-or-nothing approach?

Note: The rights discussed in this document are rights over the content of a repository which serves up its metadata using the OAI Protocol for Metadata Harvesting. Rights over the protocol itself, or the full content referenced by a metadata record, are not in focus here.

Background

This topic grew from a series of postings on the OAI-General mailing list with the subject line OAI and intellectual property issues. Terry Kuny posted the original message, and there were follow-up messages from Jose Luis Borbinha, Thomas Krichel, Michael Nelson, Thorsten Schwander, Bill Nicholls, Herbert van de Sompel, Mohammad Zubair, Garey Mills, Steve Hitchcock, and Hussein Suleman. A single file containing the postings, with headers trimmed and some light annotation, is available (rights-discussion.txt). This white paper makes heavy use of the ideas raised by these people. I am grateful to the following people for commenting on an earlier version of this white paper: Michael Nelson, Hussein Suleman, Bill Nicholls, Andy Powell and Thomas Krichel (rights-feedback.txt).

The status quo. The OAI Protocol for Metadata Harvesting does not address issues of acceptable use, leaving it as a problem for implementers to address. However, the protocol provides containers at both the repository and record levels which may be used to hold statements about terms and conditions. It is up to service providers to respect the conditions set forth by the data providers, and for data providers to take action when they believe these conditions have been violated.

The Debate

Arguments for preserving the status quo. First, the OAI does not want to adopt a stance in the IP debate, since this could alienate certain subcommunities and limit the uptake of the protocol. (The OAI is not a digital archives version of the Free Software Foundation.) Second, the verbs and response headers are quite independent of their metadata payload, and it makes no sense to elevate this particular content issue to the protocol level. Third, the OAI risks being drawn into an OSI-type role of vetting IPR legalese for conformance with someone's notion of what it means to be open. Finally, IPR discussions are frequently open-ended and divisive: there are many reasonable positions that can be adopted, and the results are at best tentative (until legally tested) and at worst unenforceable. Mixing metaphors, this is a can of worms the OAI shouldn't touch with a bargepole.

Arguments for changing the status quo. First, metadata is the intellectual property of its author (even if this is not asserted in a rights statement) and the author enjoys various rights as a consequence (e.g. to be identified as the author). The service provider has no mechanical means of discovering which of these rights the author wishes to retain. Second, some repositories do provide a prose statement about acceptable use. A survey of the current statements (see the Appendix) reveals considerable diversity of expression, and a service provider has no mechanical means of processing these. Third, some archives explicitly state that commercial use is permitted, while others state that it is not permitted. This situation makes it harder for service providers to draw any conclusion about archives which do not provide an explicit policy. Finally, in spite of the diversity of expression, there is only a very limited variety of content in the existing metadata policy statements (i.e. permitting only non-commercial/educational use and/or requiring acknowledgement of sources), and it would be easy to provide a short list of common policies, referenced using a controlled vocabulary. In summary, there is already an IP issue, and the OAI can address it with some simple standards.

Principles on which to base a solution. Paradoxically, both sides of this debate are arguing for simplicity. Those for the status quo argue that the protocol should be kept as lightweight as possible. Those for changing the status quo want to make the job of implementers as easy as possible by making the rights information machine-readable. Another issue, namely "openness", is probably a distraction. Some people think that metadata should be a freely available finding aid for anyone at all (openness_1) while others think that the initiative itself should adopt no position here, and thereby transcend the business models of its participants (openness_2). Given that these competing notions of simplicity and openness don't militate either way, I think we should be looking for a pragmatic solution which tries to strike a balance between all of the concerns.

Before doing this, I would like to identify a range of other issues that could arise in the context of metadata policies.

A Survey of Intellectual Property Issues Relevant to Metadata

A wide range of issues has been faced by the open source community, and it is instructive to look at these and to consider whether some of them may carry over to the open archives community. The following list is far from complete.

Issue: Forks with lockout
Software example: A takes B's code, modifies the API slightly, adds new functionality, does a commercial release, and "steals" B's clients.
Archive correlate: A takes B's metadata, cleanses or enriches the data, sells a service, and refuses to let B (or B's community) have the improved data.

Issue: Stealing credit
Software example: A takes B's code then makes a few changes and renames it as A's system, giving no credit to B.
Archive correlate: A takes B's metadata, puts their own name on it, and registers a new data provider with the same content (or a new service provider), not crediting B.

Issue: Brand name
Software example: A takes B's code, changes it, and distributes the result as if it was still B's.
Archive correlate: A takes B's metadata, messes with it, and provides a different service, using (and tarnishing) B's respected name.

Issue: Claw-back
Software example: A builds a commercial product linked to B's code. B later changes the license and holds A hostage.
Archive correlate: A takes B's metadata, sells a popular commercial service, and B subsequently modifies the rights statement to exclude commercial use, and threatens to sue A if A doesn't remove B's data from the service.

Issue: Liability
Software example: A distributes B's program, not realizing that a third party patent license is required from C first. C sues A, and A holds B liable.
Archive correlate: A takes B's metadata, which B had collected from many sources including C. A sets up a commercial service. C sues A for violating its IPR, and A holds B liable since B didn't transmit C's IPR statement correctly.

A Proposal

I propose that the OAI establish a system for metadata licenses. The OAI would host a small set of popular licenses. Data providers could point to these or to licenses residing elsewhere. All OAI metadata records would cite one or more licenses. (If more than one identifier is cited, a service provider would be free to choose from the list.) A repository's header would list all the license identifiers used by the metadata records it contains. There would be no default value.
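The selection rule in this proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the license identifiers, the record layout (a dict with a "licenses" list), and the function names are hypothetical and are not defined by the OAI.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical identifiers for licenses that this service provider has
# already read and can comply with.
RECOGNIZED: Set[str] = {"oai-any-use", "oai-noncommercial"}


def choose_license(cited: Iterable[str], recognized: Set[str]) -> Optional[str]:
    """Return the first cited license that the service provider recognizes.

    Returns None when no cited license is recognized, in which case the
    record cannot be used until a human has read the new conditions.
    """
    for identifier in cited:
        if identifier in recognized:
            return identifier
    return None


def undeclared_licenses(header_licenses: Iterable[str],
                        records: List[Dict]) -> Set[str]:
    """Licenses cited by records but missing from the repository header."""
    cited = {lic for record in records for lic in record["licenses"]}
    return cited - set(header_licenses)


records = [
    {"identifier": "rec-1", "licenses": ["x-custom", "oai-noncommercial"]},
    {"identifier": "rec-2", "licenses": ["x-custom"]},
]
for record in records:
    print(record["identifier"], choose_license(record["licenses"], RECOGNIZED))
```

Because there is no default value, a record whose licenses are all unrecognized yields no choice at all, rather than falling back silently to some assumed policy.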

A small number of metadata licenses would be created now, based on experience with existing data providers. Minimally, we need a license permitting any kind of use, commercial or non-commercial, and a license only permitting non-commercial/educational use. Both licenses may have a common baseline, e.g. requiring acknowledgement.
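As a rough sketch of how two such licenses might share a baseline, the Python fragment below models a license as a pair of conditions. The class, the field names and the two identifiers are assumptions made for illustration; the actual conditions would be fixed by the license texts themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataLicense:
    identifier: str
    commercial_use: bool
    # Common baseline shared by both licenses: acknowledgement of the source.
    acknowledgement_required: bool = True


# Hypothetical identifiers for the two minimal licenses.
ANY_USE = MetadataLicense("oai-any-use", commercial_use=True)
NONCOMMERCIAL = MetadataLicense("oai-noncommercial", commercial_use=False)


def permits(license_: MetadataLicense, commercial: bool, acknowledges: bool) -> bool:
    """Check an intended use against the conditions of a license."""
    if license_.acknowledgement_required and not acknowledges:
        return False
    return license_.commercial_use or not commercial


print(permits(ANY_USE, commercial=True, acknowledges=True))
print(permits(NONCOMMERCIAL, commercial=True, acknowledges=True))
```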

Note that there would be a strong disincentive for licenses to proliferate. At any given time, a service provider will recognize some set of licenses. When adopting a new license, a data provider puts itself at a disadvantage, since it must wait for service providers to get around to reading the conditions. However, if a large data provider, or a group of small data providers, adopted a new license, it would be in the interest of service providers to read the conditions and check if they can comply with them. In this way proliferation would be limited.

Some Implementation Ideas and Issues

ODRL. The Open Digital Rights Language (http://odrl.net/) provides a standard vocabulary for expressing terms and conditions over assets. ODRL gives us a ready syntax (defined using XML schemas) in which rights statements may be expressed. Here is an example of an ODRL rights element, from a DC example at http://www.ukoln.ac.uk/metadata/resources/dc/dc-xml-guidelines/. Such statements could be included at both the repository and record level.

 <oex:rights>
   <oex:asset>
     <oex:content>
       <odd:uid idscheme="URI">
         http://somewhere.com/frogmaths/
       </odd:uid>
     </oex:content>
   </oex:asset>
   <oex:permission>
     <odd:display/>
     <odd:copy/>
     <odd:modify/>
     <odd:annotate/>
   </oex:permission>
 </oex:rights>
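To illustrate how a service provider might read such a statement mechanically, here is a minimal Python sketch that extracts the asset identifier and the granted permissions from the element above. The fragment does not declare its namespaces, so the namespace URIs below are assumptions made to obtain a well-formed document, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the fragment in the text does not declare them.
OEX = "http://odrl.net/1.0/ODRL-EX"
ODD = "http://odrl.net/1.0/ODRL-DD"
NS = {"oex": OEX, "odd": ODD}

RIGHTS_XML = f"""
<oex:rights xmlns:oex="{OEX}" xmlns:odd="{ODD}">
  <oex:asset>
    <oex:content>
      <odd:uid idscheme="URI">
        http://somewhere.com/frogmaths/
      </odd:uid>
    </oex:content>
  </oex:asset>
  <oex:permission>
    <odd:display/>
    <odd:copy/>
    <odd:modify/>
    <odd:annotate/>
  </oex:permission>
</oex:rights>
"""


def asset_uid(rights):
    """Return the asset identifier with surrounding whitespace removed."""
    uid = rights.find("oex:asset/oex:content/odd:uid", NS)
    if uid is None or uid.text is None:
        return None
    return uid.text.strip()


def permissions(rights):
    """Return the local names of the permission elements, in document order."""
    permission = rights.find("oex:permission", NS)
    if permission is None:
        return []
    # Tags look like "{namespace-uri}display"; keep only the local name.
    return [child.tag.rsplit("}", 1)[-1] for child in permission]


root = ET.fromstring(RIGHTS_XML.strip())
print(asset_uid(root))
print(permissions(root))
```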

(Aside: Note that the ODRL states that the standard itself "has no license requirements and is available in the spirit of 'open source' software." The OAI PMH should have a similar statement associated with it.)

Obligatory vs Optional. Would we want to require that all records in all data providers specify a policy? (I'd say yes, we really want people to be explicit about the policy they want; in the majority of cases a data provider won't care much, and it would be better to select a permissive policy than to say nothing.) Would we want OAI schemas to hardcode the current enumeration of valid policy identifiers? (I'd say no, since this requires too much centralization.)
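The combination I favour here (a policy is mandatory, but the set of valid identifiers is not hardcoded) could be checked along the following lines. This Python sketch assumes that policy identifiers are absolute URIs and that a record is a dict with a "policies" list; both are illustrative assumptions, as are the function names.

```python
from typing import Dict, List
from urllib.parse import urlparse


def is_policy_identifier(value: str) -> bool:
    """Accept any absolute URI rather than a fixed enumeration of policies."""
    parts = urlparse(value.strip())
    return bool(parts.scheme and parts.netloc)


def records_missing_policy(records: List[Dict]) -> List[str]:
    """Return the identifiers of records that name no usable policy."""
    return [
        record["identifier"]
        for record in records
        if not any(is_policy_identifier(p) for p in record.get("policies", []))
    ]


records = [
    {"identifier": "rec-1", "policies": ["http://example.org/licenses/any-use"]},
    {"identifier": "rec-2", "policies": []},
    {"identifier": "rec-3"},
]
print(records_missing_policy(records))
```

A check of this kind enforces explicitness without requiring the OAI schemas to know which policies exist.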

Extending the Protocol. One approach to implementation is to extend the PMH itself, treating metadata rights on a par with metadata formats. We would assign them names and schemas, add a verb ListMetadataIPR, and support interactions between rights and formats (e.g. a ListRecords request with metadataPrefix=oai_dc and rights=oai2). However, this is probably overkill.
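For concreteness, a request under such an extension might be built as in the Python sketch below. It uses the protocol's existing ListRecords verb and metadataPrefix argument; the rights argument is the hypothetical extension discussed here and is not part of the PMH, and the base URL is a placeholder.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # placeholder repository address


def list_records_url(base_url: str, metadata_prefix: str,
                     rights: Optional[str] = None) -> str:
    """Build a ListRecords request, optionally with the hypothetical rights argument."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if rights is not None:
        # Not a real PMH argument: this is the extension sketched in the text.
        params["rights"] = rights
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL, "oai_dc", rights="oai2"))
```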

Legal Counsel. Setting up the initial licenses may require legal counsel. We could consider engaging Larry Rosen at the OSI. A starting point would be the open source licenses and open content licenses [www.opensource.org, www.opencontent.org]. We could pick some of these and ask him to adapt them for metadata.

Conclusion

The OAI already has an IP issue, as manifested in the diverse, non-machine-readable metadata policy statements. The limited range of content in these statements suggests that a small, extensible inventory of standard licenses would be a significant service to the community. The existence of the ODRL demonstrates that the machine-readable rights issue is already well-recognized, and that we do not have to create new infrastructure.

Moreover, taking this approach does not count as adopting a stance in the IP debate; it merely helps service providers to discover the terms and conditions that repositories attach to their metadata. Without this, policy statements will proliferate (with each new data provider) while the task for service providers will grow, or else service providers will simply ignore the policies. Given the need to communicate machine-readable rights information, the above proposal seems to keep the protocol as lightweight as it can be.

Appendix: Existing Metadata Policy Statements

Archive: ANLC
Metadata policy: Metadata may be used without restrictions as long as the OAI identifier remains attached to it.

Archive: arXiv
Metadata policy: Metadata harvesting permitted through OAI interface
URL: http://arXiv.org/help/oa/metadataPolicy

Archive: CogPrints, Formations
Metadata policy: CogPrints metadata are freely accessible to all, and freely re-useable by all, under the following conditions: (1) The full name(s) of the author(s) and (2) the full bibliographic reference information for the paper (date, title, and, if published, journal, volume, and pages, if any) must always co-appear prominently with any re-use or redisplay of the metadata, in any medium. The metadata may not be offered for sale without the formal permission of the author and publisher. (Note: CogPrints is NOT the publisher; it is merely the online archive. Mention of Cogprints/Formations in re-use is appreciated but not mandatory.)

Archive: citebase
Metadata policy: Metadata can be used by commercial and non-commercial services, as long as any restrictions for individual records are observed (see the archive publisher).

Archive: DCHC
Metadata policy: Metadata may be harvested for non-commercial, educational purposes only. Permission required to harvest for any other purpose.

Archive: CSTC, DUETT
Metadata policy: Metadata may be used by commercial and non-commercial users

Archive: EKUTubingen
Metadata policy: Metadata harvesting permitted through OAI interface

Archive: hsss
Metadata policy: Metadata may be used by commercial and non-commercial service providers.
URL: http://hsss.slub-dresden.de/hsss/oai/policy/metadata.html

Archive: tkn
Metadata policy: Metadata can be used by commercial and non-commercial service providers

Archive: VTETD
Metadata policy: Resources and materials available through the Digital Library and Archives, including Special Collections, are available for use in research, teaching, and private study. For these purposes, you may reproduce (print or download) materials without prior permission, on the condition that you provide proper attribution of the source in all copies. These resources and materials are not, however, in the public domain and copyright is largely held by the Digital Library and Archives, University Libraries, Virginia Polytechnic Institute and State University. For more information, visit http://spec.lib.vt.edu/policies/conditions.html