New Open Data Memorandum almost defines open data, misses mark with open licenses

TL:DR: The new E.O. and memorandum are good for transparency and lock in almost all of the generally accepted notions of open government data. But it misses the mark on the requirement of “open licenses.”

With an executive order and a new Memorandum on Open Data Policy today, the focus on entrepreneurship remained at the forefront of federal data policy. This focus began with last year’s Digital Government Strategy, and these days weather data and GPS signals are the examples of choice. That said, the policies set in the new memorandum are quite good for the classic use of this data (transparency, accountability, and civic education) even if “transparency” is only barely mentioned in passing.

Defining Open Data: How well does it do?

This new Open Data Memorandum presents the most detailed definition to date of “open data” by the federal government. It included many of the principles that our community has reached consensus on, but it gets one severely wrong.

As I wrote many years ago, the 2009 Open Government Directive itself already adopted some of the principles of open government data including: online, primary, timely, public input, and public review. It also added two principles of its own: being pro-active about data release and creating accountability by designating an official responsible for data quality.

Comparing to my list of open government data principles in my book, the new memorandum’s definition of open data covers:

  • Principle 1: Information should be online (to quote the Memorandum: “retrieved, downloaded”)
  • Principle 2: Primary (the Memorandum even uses language from the 8 Principles; interestingly the memorandum places this under the heading of “Complete,” which was a different principle from the original 8 Principles).
  • Principle 3: Timely.
  • Principle 4: Accessible (the Memorandum repeats the language from the 8 Principles, “available to the widest range of users for the widest range of purposes” and the use of “multiple formats” where necessary, and for documentation says the data should be “described”).
  • Principles 5 and 10: Analyzable (“machine readable”).
  • Principle 6: Non-discriminatory
  • Principle 7: Non-proprietary (open) data formats
  • Principle 14: Public review (“A point of contact must be designated to assist with data use and to respond to complaints about adherence to these open data requirements.”)

Its definition also states that open data has a presumption of openness. (Principles 2-7 and 14 are from the 8 Principles of Open Government Data. Principle 1 is from the Sunlight Foundation.)

Elsewhere in the memorandum it addresses:

  • Principle 13: Public input (“engage with customers” for prioritizing what data should be made available and how to make it available)
  • Principle 15. Interagency coordination (“interoperability”)

It also asks agencies to create data catalogs to include datasets “that can be made publicly available but have not yet been released” at agency.gov/data URLs. And it says agencies must consider the needs of open data at all stages of the information collection lifecycle. In other words, data should be collected in such a way as to promote public dissemination of open data later on.

The Memorandum misses the principle that data should be license-free, which is a core principle and a grave mistake. It also misses the peripheral principles of permanence, the use of safe file formats, and practices of provenance and trust (e.g. digital signatures). (These last two are ACM principles.)

“Open licenses” presume access is closed by default!

Rather than requiring open data to be license-free, which was a core part of the 8 Principles of Open Government Data, it instead promotes the use of “open licenses.” This is a subtle but important distinction. Licenses presume data rights. Open licenses, including open source licenses and Creative Commons licenses, create limited privileges in a world where the default is closed. These licenses create possibilities of use that do not exist in the absence of the license because copyright law, or other law, creates an initial state of closedness.

Most open licenses only grant some privileges but not others, and some privileges come along with new requirements. The GPL and Creative Commons Attribution License, for instance, rely on copyright law so that restrictions on data use intended by the open license (GPL’s virality clause, or the restriction that users must attribute the work to the author) are enforceable in court.

Federal government data is not typically subject to copyright law, and in this case a license is not needed for the data to be open. Thus the application of a license suggests a change from the open-by-default state of this data to a closed-by-default state where a license is required to open it up. While the memorandum requires “an open license that places no restrictions on their [the dataset's] use,” the term “open license” is typically understood to presume a default closed state. This policy opens the door (so to speak) to agencies applying licenses (i.e. new contractual agreements) to data that serve only to restrict use.

Federal government data not subject to copyright cannot be free if a license is applied. The license-free principle of the original 8 Principles says open government data cannot be limited in this way.

When data may be subject to copyright protection (copyright law is murky and there are many gray areas), or when copyright law definitely applies (such as to documents produced originally by federal government contractors), then a public domain dedication such as the Creative Commons CC0 statement or the Open Data Commons Public Domain Dedication and License (PDDL) (both of which combine a waiver and a license) is appropriate. A public domain dedication differs from an open license in that it disclaims copyright and other protections, whereas, again, an open license implies that such a limitation on use is already present. The CC0 statement was successfully used by the Council of the District of Columbia to disclaim copyright over data files containing the DC Code.

What’s the definition used for?

While the definition of open data is otherwise quite strong, the definition is used just once in the whole memorandum. The memorandum does not mandate that government data be open data under its definition, at least as far as I could see. The only use of the open data definition is in its request for agencies to create roles for staff to ensure data released to the public are open. That is, staff should promote open data, but open data itself is not required.

Although the definition itself is not used much, there are independent provisions that repeat some of the same principles. Agencies must use “machine-readable and open formats,” existing standards, and metadata. And information collection should be done in a way to support information dissemination: “[A]gencies must design new information collection and creation efforts so that the information collected or created supports downstream interoperability between information systems and dissemination of information to the public.”

It also requires the use of open licenses:

“Agencies must apply open licenses, in consultation with the best practices found in Project Open Data, to information as it is collected or created so that if data are made public there are no restrictions on copying, publishing, distributing, transmitting, adapting, or otherwise using the information for non-commercial or for commercial purposes.”

As I mentioned, federal-government-created data needs no license to be open, although the memorandum implies that all agency data should have an open license. (That’s either legally impossible or it means something usual.) For other data, it appears that the memorandum intends to create a public-domain-like state. But it is qualified, for contracts may only use “existing clauses” (i.e. standard contract terms already approved by OMB) to implement terms of open licensing. Looking over those terms, I don’t see the necessary legal framework to do it. And a nearby footnote confusingly says that a data user who modifies the data “is responsible for” describing the change. Does that mean an “open license” can require users to describe modifications? The qualifications make it very difficult to know what an acceptable implementation of open licensing looks like.

Conclusion

While the goals of the Memorandum in defining open data and using open licenses are laudable, the implementation does not meet the 8 Principles’s requirements of open government data, at least under the usual understanding of “open license,” and the use of the definition to promote open data is very limited.

PS. As Derek Willis points out over Twitter, the “mosaic effect” paragraphs in the memorandum are also somewhat concerning. The mosaic effect is hard to quantify and therefore difficult to limit, and this creates a big hole for keeping data government out of public reach.

UPDATE 5/10/2013 #1:

Rufus Pollock points out that the Open Data Commons Public Domain Dedication and License (PDDL) is similar to CC0 and would also be appropriate. I agree.

Eric Mill notes that for data already in the public domain, the Creative Commons Public Domain Mark, which is basically an icon/badge, would be appropriate. Agencies should definitely mark public domain data as such.

UPDATE 5/10/2013 #2:

I added a few paragraphs to the section now called “What’s the definition used for?”.

4 Responses to “New Open Data Memorandum almost defines open data, misses mark with open licenses”

  1. [...] pull requests to merge? @waldojaquith @philipashlock”. Tauberer wrote in greater detail on his personal blog about the memo: TL:DR: The new E.O. and memorandum are good for transparency and lock in almost [...]

  2. [...] about long-term preservation. He was also concerned, as is Joshua Tauberer, about the use of open licenses. Said Jacobs, “Open licenses presume access is closed by default. That’s obviously a problem [...]

  3. [...] project because the database was designed to be searchable (read Joshua Tauberer for more about keeping government data accessible!!! Then read this more recent post as well.) BUT here are a few sites I found that I thought would [...]

  4. [...] executive order on open government data and cited Govtrack.us‘s Josh Tauberer’s analysis of the Executive Order as missing the mark and being confused — if not downright misleading [...]