Yesterday the White House’s chief technologists unveiled the Open Government Directive (OGD). The OGD mainly covers two aspects of government transparency: using technology as a tool for data sharing and public participation in agency decision-making. We’ve seen the start of a culture shift this year in the executive branch, in parallel with actual progress throughout the country, and now the OGD outlines and codifies a vision for the next four months and, well, beyond.
Last week I reviewed the House’s Statement of Disbursement electronic document along the dimensions of open government data (which unfortunately has the same abbreviation). The OGD talks about how agencies should go about the process of opening data. Here is a review of what the OGD says, organized by open data principle.
I’ll put the conclusion up front: The OGD addresses nearly all of the open government data principles that have been put forward, and even adds two of its own: being pro-active about data release and creating accountability by designating an official responsible for data quality (more on these at the end). So from this perspective, the OGD is pretty spot-on. It is very strong in public input, public review, and interagency coordination, which are normally the weakest spots of government data (but, on the other hand, this isn’t data, this is a goal, so the proof will be in the pudding). It could have been stronger in the areas of machine processability & promoting analysis, and explaining what is appropriate for data licensing (ideally, none).
Here are the details:
Information is not meaningfully public if it is not available on the Internet for free.
The OGD says “each agency shall take prompt steps to expand access to information by making it available online in open formats.” The OGD itself doesn’t say free, but executive branch policy already requires that public information not be sold to the public at more than the marginal cost of distribution — which is about as good as one might expect. So we’ll count this principle as asserted by the OGD.
Data Should Be Primary. Primary data is data as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.
In the OGD’s appendix where it outlines further details for agencies, it says agencies should release data “as granular as possible”.
The OGD days, “Timely publication of information is an essential component of transparency. Delays should not be viewed as an inevitable and insurmountable consequence of high demand.”
Accessible. Data are available to the widest range of users for the widest range of purposes, meaning use an open standard, with a bulk download, and with documentation.
Machine processable: Data are reasonably structured to allow automated processing.
The OGD specifically defines “open format” — which is the subject of the directive — as something that is platform independent and machine readable. Now, here the OGD slips a little because it redefines “open” but actually leaves out open standards. I don’t think that was intentional, so we’ll give the OGD credit for mentioning open standards even though it didn’t exactly. It mentions “downloadable” but not in bulk, and there is no mention of documentation in the OGD. We can’t tell what the OGD meant by “machine readable” — I think of this term now a sloppy form of “machine processable”. It would have helped if the OGD specifically noted that the point is to support analysis and reuse of the data.
I used to use “machine readable” until someone corrected me that really any format can be read by a machine. The question is what the machine can do with it: to what degree can the data be meaningfully processed by a machine? So now I use machine-processable.
Non-discriminatory: Data are available to anyone, with no requirement of registration.
Non-proprietary: Data are available in a format over which no entity has exclusive control.
License-free. Dissemination of the data is not limited by intellectual property law or other terms.
The OGD says data must be “made available to the public without restrictions that would impede the re-use of that information.” Here we could have really benefited from some simple but concrete guidance.
Promote analysis: Data published by the government should be in formats and approaches that promote analysis and reuse of that data.
There is a sense in which this is implicit in the OGD, but maybe it is the goggles through which I read it. The OGD fails to say explicitly that analysis is the whole point of open government data.
Public input: The public is in the best position to determine what information technologies will be best suited for the applications the public intends to create for itself.
Public review: There should be a means for the public to interact with the data publisher during and after the data has been made. The public may have questions or may find errors. The process of creating the data should also be transparent.
These principles are perhaps the least commonly addressed, and yet it is one of the most prominent aspects of the OGD. The OGD requires agencies to allow the public to give feedback on data quality, data prioritization, and other aspects of the agency’s OGD plan. In fact, the OGD says, “Each agency shall respond to public input received on its Open Government Webpage on a regular basis.”
In addition, the OGD will form a working group (described next) that will discuss “ideas to promote
participation and collaboration, including how to … take advantage of the expertise and insight of people both inside and outside the Federal Government, and form high-impact collaborations with researchers, the private sector, and civil society.”
In the appendix where it outlines further goals for agencies, the OGD says, “Your agency should also identify key audiences for its information and their needs, and endeavor to publish high-value information for each of those audiences in the most accessible forms and formats.”
Interagency coordination: Interoperability makes data more valuable by making it easier to derive new uses from combinations of data. To the extent two data sets refer to the same kinds of things, the creators of the data sets should strive to make them interoperable.
The OGD will establish a working group lead by the Deputy Director for Management at OMB, the Federal Chief Information Officer, and the Federal Chief Technology Officer to provide a forum to share best practices for data collection, aggregation, validation, and dissemination throughout the government, to coordinate implementations of federal spending transparency, and to provide a forum for sharing best practices for participation.
Provenance and trust: Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.
Permanent Web Address: The file should have a stable location.
Safe file formats: Government bodies publishing data online should always seek to publish using data formats that do not include executable content.
Globally Unique Identifiers: This concept, important on the world wide web, is that any document, resource, data record, or entity mentioned in a database, or some might say every paragraph in a document, should have a unique identification that others can use to point to or cite it elsewhere.
Linked Open Data: This is a method for publishing databases in a standard format for interconnectivity with other databases without the expense of wide agreement on unified inter-agency or global data standards.
These get into some of the more precise details of data format. I might have liked to see provenance & trust addressed, but I am not sure whether I would really expect these principles to be included in a high level 120-day plan, at least not at this point. So their absence is not something I hold against the OGD. Still:
The OGD talks about being proactive with data release.
The OGD also adds accountability: “Each agency … shall designate a high-level senior official to be accountable for the quality and objectivity of, and internal controls over, the Federal spending information publicly disseminated.”