Bulk data downloads approved in the omnibus spending bill (success!)

Two recommendations of the Open House Project report have been taken up in the FY09 omnibus appropriations bill (H.R. 1105). The first recommendation in our chapter on legislative databases was that the Library of Congress make its bill status database directly available to the public and that the GPO not sell legislative documents to the public. These have been the two issues I’ve had my sights on over the last three years (probably starting here). The second recommendation was about coordinating web standards across Congress. These recommendations are addressed in two paragraphs of the House statement accompanying the bill for Division G – Legislative Branch, which is almost like being law itself.

The two paragraphs were added by Congressman Mike Honda of California, one of our champions of the use of technology to further transparency and civic engagement. John Wonderlich of Sunlight Foundation, Rob Pierson in Honda’s office, and I collaborated on this over a long period of time. Honda got involved in 2007, asking the Library to look into this, and then in 2008 got the paragraphs added to the bill markup.

So here they are:

Congressional Technology Coordination. The House of Representatives needs a strategic and coordinated plan that will prepare for the future technology needs of the institution. A 2006 report commissioned by the Chief Administrative Officer and the Committee on House Administration, entitled Strategic Technology Road Map for the Ten Year Vision of Technology in the House of Representatives, provided a suggested structure for Information Technology evaluation and decision making. The Chief Administrative Officer, the Clerk, and the Sergeant at Arms are asked to prepare a report by June 30, 2009 on their efforts or plans to develop House-wide data-sharing standards; implement standard legislative document formats; address the increasing resource challenges of Member offices; and identify disparate systems throughout the institution that prevent it from taking advantage of economies of scale. [page 2]

and

Public Access to Legislative Data. There is support for enhancing public access to legislative documents, bill status, summary information, and other legislative data through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases. The Library of Congress, Congressional Research Service, and Government Printing Office and the appropriate entities of the House of Representatives are directed to prepare a report on the feasibility of providing advanced search capabilities. This report is to be provided to the Committees on Appropriations of the House and Senate within 120 days of the release of Legislative Information System 2.0. [page 11]

According to an article in Wired: “In our web 2.0 world, we can empower the public by providing them with raw data that they can remix and reuse in new and innovative ways,” says Honda, who is vice chairman of the Appropriations Subcommittee on the Legislative Branch. “With these tools, the public can collaborate on projects that can help legislators to create better policies to address the pressing challenges facing our nation.” There’s also a good article at Mother Jones and a nice post by Tim O’Reilly.

The concept of bulk data downloads hasn’t been missed by many parts of the government. The Census Bureau and the Federal Election Commission, for instance, are fantastic at sharing with the public as much as they can. In the latter case it is electronic versions of campaign contribution filings, which is obviously very important for preventing corruption. But there are significant gaps in other areas of the government where a little legislating is necessary. Here we’re talking about information on bills in Congress going back around two decades, and the information going forward.

The Library of Congress has a database of this information but doesn’t share it with the public. Sharing it would make creating sites like GovTrack (and the various other sites that use data from GovTrack, including OpenCongress and MAPLight.org) a little easier, and the sites themselves a little more accurate. Right now GovTrack goes through a roundabout process to reverse-engineer the same information we are seeking from this database. Basically, we already have the information by scraping it off of thomas.loc.gov; we’d just rather get it directly from the source than reassemble it the way we do now. Because I already go to so much trouble to reverse-engineer the data I want, not many things will change in an obvious way on GovTrack. It’ll just be that my life will be a little easier and the information will be a little more complete and up to date. But you can expect to see other sites spring up doing new and interesting things with the information: ways of visualizing the congressional process that we can’t yet imagine.

The Government Printing Office is mentioned because of how it makes legislative documents like the text of bills available to the public. PDFs and text-only versions are made available for free already. No problem there. But GPO has other files that would be useful to sites like GovTrack, which it sells at ridiculously high subscription prices. Those files would make comparisons of bill text easier to produce (although GovTrack already has this feature, again by essentially going about it the hard way). Some bills go through Congress so fast that no one has time to read them through, so being able to apply technology to the process is important, for example to detect changes in the text of a bill between versions so that people can get through it more easily. This is what GPO is preventing by selling some of its files rather than providing them to the public for free (which it is essentially mandated to do for most documents; why it exempts certain documents is not known).
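
To give a rough idea of what detecting changes between bill versions can look like, here is a minimal sketch in Python using the standard library's difflib module. This is not how GovTrack does it. The function name, the version labels, and the sample bill text are all made up for illustration, and real bill text would need cleanup (page headers, line numbering, and so on) before a line-by-line comparison is useful.

```python
import difflib


def diff_bill_versions(old_text, new_text,
                       old_label="introduced", new_label="engrossed"):
    """Return a unified diff, as a list of lines, between two bill texts.

    The labels are hypothetical version names used only in the diff header.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=old_label, tofile=new_label,
        lineterm="",  # input lines carry no newlines, so add none to headers
    ))


# Made-up bill text, standing in for two versions of the same bill.
introduced = """SECTION 1. SHORT TITLE.
This Act may be cited as the Example Act.
SEC. 2. FUNDING.
There is authorized to be appropriated $1,000,000."""

engrossed = """SECTION 1. SHORT TITLE.
This Act may be cited as the Example Act.
SEC. 2. FUNDING.
There is authorized to be appropriated $2,000,000.
SEC. 3. REPORT.
A report shall be submitted within 120 days."""

if __name__ == "__main__":
    # Lines starting with "-" were removed; lines starting with "+" were added.
    for line in diff_bill_versions(introduced, engrossed):
        print(line)
```

With structured source files the same idea could be applied section by section rather than line by line, which is where the files GPO sells would help.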

Now, it’s not necessarily that the Library doesn’t *want* to share its database. It’s just that sharing it wasn’t part of its mandate from Congress, and it doesn’t want to upset Congress by stepping outside that mandate. The omnibus bill is an indication from the House to the Library that Congress would support this. (My understanding is that the Library has been seeking permission from Congress to do some of these things, probably in response to a previous push for this, but the omnibus legislation has been in the works concurrently.)

2 Responses to “Bulk data downloads approved in the omnibus spending bill (success!)”

  2. Good work! I am very impressed and very thankful for the inspiration and quality of your work. We are all very fortunate to have it.

    Thanks very much.