Showing posts with label creative commons. Show all posts

Tuesday, December 8, 2009

Linking Rights to Aggregations of Data (Part 2)

In my background research for today's entry, I discovered that the smart people at Talis, especially Ian Davis, have been working on the problem I outlined in Linking Rights to Aggregations of Data (Part 1). Specifically, back in July 2009 Ian proposed WAIVER: A vocabulary for waivers of rights. In Ian's words,

(The WAIVER) vocabulary defines properties for use when describing waivers of rights over data and content. A waiver is the voluntary relinquishment or surrender of some known right or privilege. This vocabulary is designed for use with the Open Data Commons Public Domain Dedication and License and with the Creative Commons CC-0 waiver.

In his July 2009 post Linked Data and the Public Domain, Ian argues that providers should unambiguously declare their datasets public domain, and explains how to do so with the WAIVER vocabulary in the context of a voID description of a dataset. (See also this email discussion thread, in which several thought leaders in this area weigh in on the issue.) Ian provides the following example, which I repeat here to illustrate (a) use of voID to describe a dataset named "myDataset," (b) use of the wv:waiver property to link the dataset to the Open Data Commons PDDL waiver, (c) use of the wv:declaration property to include a human-readable declaration of the waiver, and (d) use of the wv:norms property to link the dataset to the community norms he suggests, ODC Attribution and Share-alike.


<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:wv="http://vocab.org/waiver/terms/"
         xmlns:void="http://rdfs.org/ns/void#">
  <void:Dataset rdf:about="http://myOrganisation.org/myDataset">
    <dc:title>myDataset</dc:title>
    <wv:waiver rdf:resource="http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/"/>
    <wv:norms rdf:resource="http://www.opendatacommons.org/norms/odc-by-sa/"/>
    <wv:declaration>
      To the extent possible under law, myOrganisation
      has waived all copyright and related or neighboring rights to
      myDataset
    </wv:declaration>
  </void:Dataset>
</rdf:RDF>
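
To see what a consuming client has to do with such a description, here is a minimal Python sketch (standard library only) that pulls the waiver, norms and declaration out of the example. It assumes the exact typed-node layout shown above; RDF/XML can serialise the same triples in other shapes, so a production client should use a proper RDF parser rather than this XML walk. The function name `read_waivers` is my own, for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as declared in Ian's example.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/terms/",
    "wv": "http://vocab.org/waiver/terms/",
    "void": "http://rdfs.org/ns/void#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# The voID description from above (the XML declaration must come first).
VOID_DESCRIPTION = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:wv="http://vocab.org/waiver/terms/"
         xmlns:void="http://rdfs.org/ns/void#">
  <void:Dataset rdf:about="http://myOrganisation.org/myDataset">
    <dc:title>myDataset</dc:title>
    <wv:waiver rdf:resource="http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/"/>
    <wv:norms rdf:resource="http://www.opendatacommons.org/norms/odc-by-sa/"/>
    <wv:declaration>
      To the extent possible under law, myOrganisation
      has waived all copyright and related or neighboring rights to
      myDataset
    </wv:declaration>
  </void:Dataset>
</rdf:RDF>"""


def read_waivers(rdf_xml):
    """Map each void:Dataset URI to its waiver, norms and declaration.

    Hypothetical helper: only handles the typed-node layout used above;
    other RDF/XML serialisations of the same triples need a real RDF parser.
    """
    root = ET.fromstring(rdf_xml)
    waivers = {}
    for dataset in root.findall("void:Dataset", NS):
        waiver = dataset.find("wv:waiver", NS)
        norms = dataset.find("wv:norms", NS)
        declaration = dataset.find("wv:declaration", NS)
        waivers[dataset.get(RDF_ABOUT)] = {
            "waiver": waiver.get(RDF_RESOURCE) if waiver is not None else None,
            "norms": norms.get(RDF_RESOURCE) if norms is not None else None,
            # Collapse the literal's internal line breaks and indentation.
            "declaration": " ".join(declaration.text.split())
            if declaration is not None and declaration.text
            else None,
        }
    return waivers


print(read_waivers(VOID_DESCRIPTION)["http://myOrganisation.org/myDataset"]["waiver"])
```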

WAIVER and OAI-ORE: As I proposed in Part 1, we should be able to combine the voID and OAI-ORE approaches. The only conceptual difference is that, under the OAI-ORE guidelines, the RDF file shown above would be treated as the resource map describing the aggregation URI (in this example, "http://myOrganisation.org/myDataset") and would have a URI of its own (perhaps "http://myOrganisation.org/myDataset.rdf").
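
A sketch of what such a resource map might contain, written out as N-Triples by a small Python function. The class and property names come from the OAI-ORE vocabulary (`ore:ResourceMap`, `ore:Aggregation`, `ore:describes`, `ore:isDescribedBy`); the function name is hypothetical, and the sketch omits other resource-map metadata that the ORE specification describes, so read it as an illustration of where the waiver sits rather than as a complete, valid resource map.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
WV = "http://vocab.org/waiver/terms/"


def resource_map_ntriples(aggregation_uri, rem_uri, waiver_uri):
    """Return N-Triples lines for an ORE resource map carrying a waiver.

    Hypothetical helper, not a complete ORE resource map: the resource map
    (rem_uri) describes the aggregation (aggregation_uri), and the waiver is
    asserted about the aggregation, inside the resource map.
    """
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
        (aggregation_uri, ORE + "isDescribedBy", rem_uri),
        (aggregation_uri, WV + "waiver", waiver_uri),
    ]
    # Every term here is an IRI, so each one is simply wrapped in angle brackets.
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]


for line in resource_map_ntriples(
    "http://myOrganisation.org/myDataset",
    "http://myOrganisation.org/myDataset.rdf",
    "http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/",
):
    print(line)
```

Keeping two URIs apart is the whole point of the ORE pattern: statements about the file (who generated it, when) hang off the resource map, while the waiver hangs off the aggregation it describes.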

What about other rights? It is critically important for the reader to understand that Ian's example (repeated above) only shows how to declare a waiver of rights, which by its nature is intended to promote the reuse of data based on open principles. To date, this is mostly what the linked data world has focused on, but as the NYTimes open data experiment is showing us, providers will want to assert rights where they can. In a future post I'll apply what we've learned so far to consider approaches for declaring dataset rights in legal regimes where this is actually possible.

Monday, December 7, 2009

Linking Rights to Aggregations of Data (Part 1)

In my previous post, Protecting your Linked Data, I considered two questions: what legal regimes are available to linked data providers for the protection of their published datasets, and what technical frameworks and best practices exist, especially within the realm of RDF and linked data, for making such rights assertions. In this (shorter!) post I begin to consider an attribution scheme that came to mind in the wake of discussions on The New York Times Linked Open Data Community list: using named graphs (see also here), and specifically the OAI-ORE data model, to associate specific rights with aggregations of resources.

What's the problem? Given a set -- an aggregation -- of data assertions, how might we properly assert rights over those assertions, especially in a way that ensures a responsible client won't lose track of the ownership context? Let's assume a file of RDF triples is read into a store. Consistent with the NYTimes LOD discussion, we'll call the file people.rdf. Since "all RDF stores support named graphs these days" (Richard Cyganiak), we can assume that a named graph URI has been created to name the aggregation of assertions imported from "people.rdf" (i.e. the assertions in the provider's file "people.rdf" become members of the named graph "people.rdf" in the client's RDF store).
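
A toy model of the client side may make the point concrete. The Python sketch below is an in-memory stand-in for an RDF store with named graphs; the class name, graph URIs and sample triples are all hypothetical. What it illustrates is that merging graphs for querying need not discard the record of which source asserted which triple.

```python
from collections import defaultdict


class NamedGraphStore:
    """Hypothetical toy store in which every triple lives in a named graph.

    A triple is a (subject, predicate, object) tuple of strings; literal
    objects keep their surrounding quotes.
    """

    def __init__(self):
        self.graphs = defaultdict(set)  # graph URI -> set of triples

    def load(self, graph_uri, triples):
        """Import a provider file's triples into the graph that names it."""
        self.graphs[graph_uri].update(triples)

    def graphs_containing(self, triple):
        """Return every named graph (i.e. every source) asserting a triple."""
        return sorted(g for g, ts in self.graphs.items() if triple in ts)

    def union(self):
        """Merge all graphs for querying; provenance stays in self.graphs."""
        merged = set()
        for ts in self.graphs.values():
            merged |= ts
        return merged


# Hypothetical graph URIs and data, for illustration only.
PEOPLE = "http://provider.example.org/people.rdf"
OTHER = "http://other.example.org/people-extra.rdf"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
jane = ("http://provider.example.org/person/1", FOAF_NAME, '"Jane Doe"')

store = NamedGraphStore()
store.load(PEOPLE, {jane})
store.load(OTHER, {jane, ("http://other.example.org/p/9", FOAF_NAME, '"Joe Bloggs"')})
print(store.graphs_containing(jane))
```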

Recall that a named graph is "a set of triples named by an URI." [ref] The OAI-ORE data model builds on this idea with a set of guidelines for using a named graph, the resource map, to "describe" an aggregation. ORE's core idea is to create one URI to represent the aggregation itself, and another to represent the resource map that describes that aggregation. It is in this OAI-ORE resource map that rights expressions applying to the aggregation should appear.
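
Assuming rights are attached to the aggregation with Dublin Core's `dcterms:rights` (my choice for illustration; the post doesn't fix a property), a client could recover them roughly as follows. The function and all URIs are hypothetical, and only resource maps that declare `ore:describes` for the aggregation are consulted, a trust rule I've assumed rather than one ORE mandates.

```python
ORE_DESCRIBES = "http://www.openarchives.org/ore/terms/describes"
DCT_RIGHTS = "http://purl.org/dc/terms/rights"


def rights_for_aggregation(aggregation_uri, resource_maps):
    """Collect rights statements made about an aggregation.

    Hypothetical helper. resource_maps maps each resource-map URI to the set
    of (s, p, o) triples it contains. Only resource maps that assert
    (rem, ore:describes, aggregation) are treated as speaking for it.
    """
    rights = []
    for rem_uri, triples in resource_maps.items():
        if (rem_uri, ORE_DESCRIBES, aggregation_uri) in triples:
            rights.extend(
                o for s, p, o in triples if s == aggregation_uri and p == DCT_RIGHTS
            )
    return sorted(rights)


# Hypothetical URIs: one aggregation, the provider's resource map, its terms page.
AGG = "http://provider.example.org/aggregation/people"
REM = "http://provider.example.org/people-rem.rdf"
TERMS = "http://provider.example.org/terms/people"

maps = {
    REM: {(REM, ORE_DESCRIBES, AGG), (AGG, DCT_RIGHTS, TERMS)},
    # A third party's map that does not describe AGG is ignored.
    "http://third.example.org/rem.rdf": {(AGG, DCT_RIGHTS, "http://third.example.org/claim")},
}
print(rights_for_aggregation(AGG, maps))
```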

In my next post I'll take a stab at mocking up -- and hopefully not mucking up -- what an implementation of this might look like...

Thursday, December 3, 2009

Protecting your Linked Data

One of the highlights of the recent ISWC2009 was a tutorial on Legal and Social Frameworks for Sharing Data on the Web. As one who during the rise of "Web 1.0" was writing and presenting frequently on topics like Copyright for Cybernauts and is now seduced by the world of linked data, I've been considering how the legal, business and technical worlds will reconcile themselves in this new world, a world where value will come from joining networks of data together. Eric Hellman puts this nicely:

Linked Data is the idea that the merger of a database produced by one provider and another database provided by a second provider has value much larger than that of the two separate databases... (Eric Hellman, Databases are Services, NOT Content, Dec 2009)

The question is, what legal and technical strategies are available to linked data providers to protect themselves as they pursue such a value proposition? This post is an attempt to rationalise the answer a bit more clearly.

I'm not a lawyer. I'm a technologist who has since the early 1990s immersed himself in the sometimes delicate, more often violent dance between technology, business and public policy that has been catalysed by the rise of the digital, networked environment. In particular I've been motivated by the question of how policies can, and more often can't, be systematically "implemented" by technologies --- as well as by the question of how technical architectures often enforce ad hoc policy regimes, inadvertently or otherwise (see esp. Lawrence Lessig's Code v2, the community update of Code and Other Laws of Cyberspace).

As an early (and perhaps idiosyncratic) player in the DRM industry, I quickly concluded that the only sustainable solution to the problem of communicating rights for creative works in the digital domain was to evolve an infrastructure of identifiers and metadata. That infrastructure has been realised to a great extent through the rise in prominence of the DOI, accessible templates for rights communications (due in large part to Creative Commons), the emergence of a variety of metadata standards, and a standard data model (RDF) for associating metadata with objects. The more recent emergence of standards of practice for linked data will only help to further disambiguate the rights world, as these practices make the expression and transferral of content-descriptive metadata orders of magnitude easier.

I'm interested in questions concerning the communication of intellectual property rights for data shared through linked data mechanisms: What rights can be claimed? What are the best practices for claiming and transferring rights? What technical mechanisms exist (in this case, specific vocabularies and protocols) for communicating rights to metadata? The four thought leaders at the ISWC2009 LSFSDW tutorial covered this ground fairly completely; this post is an attempt to summarise and interpret their messages, along with resources found elsewhere. I'd like to highlight pioneering work by Science Commons, an offshoot of CC that has considered these questions specifically for scientific data. Also, in preparing this post I stumbled across some works that I pored over more than a decade ago and that now seem prescient! David Lanzotti and Doug Ferguson's thorough analysis circa 2006 shows that little has changed: IP protection for databases is nebulous territory.

Copyright does not apply to datasets: Most regimes hold that copyright applies only to original creative works. This means you can only claim copyright for works that are yours and that are "creative." The second requirement means you cannot claim copyright in a database unless its structure and organisation are sufficiently creative; the US Supreme Court held that "sweat of the brow" is not sufficient to cross this threshold, and that copyright protection does not extend to non-creative accumulations of facts (cf. Feist Publications v. Rural Telephone Service, 1991).

The individual elements of a dataset might themselves be extensive and creative enough to merit copyright protection; we'll assume for this discussion that these are handled separately. In its FAQ the Open Data Commons nicely emphasises the difference between a dataset and the individual contents of that dataset, including text and images. Note also that the European Space Agency (ESA) web site includes a nice, concise explanation of the legal reasons why copyright cannot be applied to databases.

Intellectual property protection for datasets: The fact that copyright (generally) cannot be applied to datasets means that the Creative Commons body of work can't be applied directly; indeed, CC specifically discourages it. But is there an IP regime that covers accumulated data? If not copyright, patent or trademark, then what? Around 1996, database "owners" thought that a sui generis ("of its own kind") regime for protecting databases might proliferate, and in March 1996 the EU issued its Database Directive. International protection under such a regime depends on reciprocity, however, and the lack of adoption of this model around the world, most notably in the United States, means IP protection for datasets is still nebulous.

In principle there are no "default" protections for datasets as there are with copyright; providers must be proactive and declare their terms of use up front, whether they choose to waive all restrictions, to impose a limited set focused on attribution, or to impose more extensive limitations based on customised licenses. It is clearly in the interests of both providers and consumers of datasets to ensure that rights are explicitly stipulated up front, especially since a key value proposition of linked data is (as we are reminded above) the merger of graphs; for certain applications, graphs from different sources must be merged within a single store so that inference can be applied. A service agency must know up front whether triples from particular sources can be "thrown in the hopper," and whether there are any exclusions.
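
The "hopper" decision can be sketched as a simple gate in front of the merge. In this hypothetical Python example the set of acceptable terms is a local policy assumption (deciding which terms really permit merging is a legal question, not a technical one), and graphs whose provider declared nothing are held back rather than assumed open.

```python
# Hypothetical local policy: terms this service has decided allow merging.
MERGEABLE_TERMS = {
    # ODC PDDL waiver (public domain dedication).
    "http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/",
    # ODbL: mergeable, but attribution and share-alike obligations still apply.
    "http://opendatacommons.org/licenses/odbl/1.0/",
}


def plan_merge(declared_terms):
    """Split graphs into those that may enter the merged store and those held back.

    declared_terms maps each named-graph URI to the license or waiver URI its
    provider declared, or None when the provider declared nothing. Undeclared
    graphs are held back, since no default terms can be assumed for datasets.
    """
    hopper, held_back = [], []
    for graph_uri, terms in sorted(declared_terms.items()):
        (hopper if terms in MERGEABLE_TERMS else held_back).append(graph_uri)
    return hopper, held_back


# Hypothetical graphs and declarations.
hopper, held_back = plan_merge({
    "http://a.example.org/g1": "http://opendatacommons.org/licenses/odbl/1.0/",
    "http://b.example.org/g2": None,
    "http://c.example.org/g3": "http://c.example.org/custom-license",
})
print("merge:", hopper)
print("hold back:", held_back)
```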

Templates for expressing licensing terms: The Open Data Commons provides a template Open Database License (ODbL) that specifies attribution and share-alike terms, along with a short notice for applying it to a database:

This {DATA(BASE)-NAME} is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/
The specific text of the ODbL license is quite extensive, but the gist of it is nicely summarised in the ODbL Plain Language Summary:
You are free: To Share...To Create...To Adapt...
As long as you: Attribute...Share-alike...Keep open...
(details of each stipulation omitted for simplicity)

My point in dwelling on ODbL is not to argue that commercial providers should adopt it, but rather to consider adapting it; I'm holding it up as an exemplar for the explicit expression of terms of use for a dataset.
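
Applying the ODC template is mechanical, which is part of its appeal. This small sketch fills in the `{DATA(BASE)-NAME}` placeholder and, as an assumption of mine rather than anything ODC prescribes, emits a matching machine-readable `dcterms:license` statement alongside the human-readable notice. The function names and the dataset URI are hypothetical.

```python
ODBL_URI = "http://opendatacommons.org/licenses/odbl/1.0/"
PLACEHOLDER = "{DATA(BASE)-NAME}"

# ODC's suggested notice, placeholder intact (a plain string, not an f-string).
ODBL_TEMPLATE = (
    "This {DATA(BASE)-NAME} is made available under the Open Database License: "
    "http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual "
    "contents of the database are licensed under the Database Contents License: "
    "http://opendatacommons.org/licenses/dbcl/1.0/"
)


def odbl_notice(dataset_name):
    """Human-readable notice: substitute the dataset name into the template."""
    return ODBL_TEMPLATE.replace(PLACEHOLDER, dataset_name)


def odbl_license_ntriple(dataset_uri):
    """Machine-readable counterpart (assumed pattern): one dcterms:license triple."""
    return f"<{dataset_uri}> <http://purl.org/dc/terms/license> <{ODBL_URI}> ."


print(odbl_notice("myDataset"))
print(odbl_license_ntriple("http://myOrganisation.org/myDataset"))
```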

Expressing your rights to linked data as linked data: One of the things that has impressed me about Creative Commons is that its rights expressions were intended from the start to be modelled in RDF and to be machine-readable; indeed, CC has created ccREL, the Creative Commons Rights Expression Language, which primarily uses embedded RDF (via RDFa) in content pages to communicate rights. A recent development is Creative Commons guidance on how ccREL and RDFa might be applied to "deploy the Semantic Web." Nathan Yergler's (excellent) OpenWeb 2008 presentation explains this well, but doesn't specifically deal with the linked data question. In particular, Nathan addresses CC+, a CC licensing model that lets providers give users a way to request rights beyond those granted by the basic CC license. Those who know me know what I'll say next: this is another step forward as we converge on Henry Perritt's ca. 1993 vision of permissions headers!
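
For a sense of what ccREL markup looks like, here is a Python sketch that emits an RDFa fragment using the ccREL properties `cc:attributionName`, `cc:attributionURL` and `cc:morePermissions` (the last being the hook CC+ relies on), plus the standard `rel="license"` link. The helper name and all example URLs are hypothetical, and a real page may need a different RDFa prefix setup depending on its HTML doctype.

```python
from html import escape

CC_NS = "http://creativecommons.org/ns#"


def ccrel_fragment(work_uri, license_uri, attribution_name, attribution_url,
                   more_permissions_url=None):
    """Return a hypothetical RDFa fragment (xmlns-style prefix) for a work's license."""
    lines = [
        f'<div about="{escape(work_uri)}" xmlns:cc="{CC_NS}">',
        f'  <a rel="license" href="{escape(license_uri)}">License</a>',
        f'  <span property="cc:attributionName">{escape(attribution_name)}</span>',
        f'  <a rel="cc:attributionURL" href="{escape(attribution_url)}">'
        f"{escape(attribution_url)}</a>",
    ]
    if more_permissions_url:
        # CC+: point users at a place to request rights beyond the license.
        lines.append(
            f'  <a rel="cc:morePermissions" href="{escape(more_permissions_url)}">'
            "Request more permissions</a>"
        )
    lines.append("</div>")
    return "\n".join(lines)


print(ccrel_fragment(
    "http://example.org/photos/42",
    "http://creativecommons.org/licenses/by/3.0/",
    "Example Photographer",
    "http://example.org/",
    more_permissions_url="http://example.org/licensing",
))
```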

For further reading: