This morning DC’s legal code went online as open data. I’ve worked with government before on open data, but never have I worked with a government body that moved so deftly through the technical, policy, and legal issues as the DC Council’s Office of the General Counsel. So, before anything else, thanks to the general counsel V. David Zvenyach and his staff for their time and expertise on this.
The TL;DR version goes like this:
Tom MacWright wanted to build his own version of the DC Code website. The DC Council couldn’t share its electronic copy of the Code because it contained intellectual property owned by West. This became a small and very geeky controversy (spurred by Carl Malamud). But Zvenyach, the general counsel, recognized the value of making the law open and did it. He removed the West IP from the Council’s electronic copy of the Code (I helped), posted the file on the Council’s website, and even included a CC0 public domain dedication.
The last bit all happened within a matter of days, and it was one of the easiest open data success stories I’ve been a part of. Tom recapped the events here and began hacking the code immediately. He held a hackathon on April 14 which he wrote about here (and Eric Mill wrote about here).
Here’s the longer version:
This all began a few months ago when DC-based civic hacker Tom MacWright took an interest in making local law more accessible. Intending to import the DC Code into Waldo Jaquith’s State Decoded project, he ran into a small problem: he couldn’t get a complete copy of the law. Intellectual property issues prevented the DC Council from simply emailing over their copy of the Code.
Many states, like the District, contract out the codification and code-publishing work to a third party like West (part of the Canadian-owned Thomson Reuters) or Lexis (part of the Amsterdam-based Reed Elsevier). DC had previously contracted out to West, and last year switched to Lexis. Neither likes to share. DC’s official website to read the Code, which has been run by West, is free to the public, but copying any part of the Code off of that website might violate West’s copyright or terms of service, or both. Sharing the law might have been illegal.
In the case here in DC, the DC Council had Word documents containing the Code, given to them by their contractor West, but the documents contained West’s logo. The DC Council could not share the documents with West’s logo intact. And it wasn’t easy to take those logos out (more on that later). Informally speaking, West owned the DC Code.
I had met Zvenyach, the general counsel, before. He is very technologically savvy and has been trying to modernize the office he took over only a few years ago. We had even talked about holding a hackathon to help him do it. (As a DC resident, I’m also interested in DC law.) But his office, like all of government, is bound by limited resources and much work to do. When Tom brought the issue onto Zvenyach’s radar, I don’t believe there was any point at which Zvenyach didn’t want to make the files available. It was, as far as I’ve observed, merely a matter of time and resources.
Tom asked Carl Malamud to get involved. Carl has been working on this issue in other states, such as Oregon, where the state itself claimed copyright over its laws. Carl bought (for quite some money) a physical copy of the DC Code, digitized it, and mailed thumb drives in the shape of famous presidents containing the digitized code to various important people. This was a spin on a tactic that Carl began in the 1990s when he opened the SEC’s corporate filings data: get the data online, pressure the government to put the data online themselves, and then help the government take over that responsibility.
The media and bloggers caught on, beginning I think with Cory Doctorow on March 27, followed by DCist on March 28, The Washington Times on March 31, Steve Schultze on April 1, and Think Progress on April 3. The files themselves went up on April 4, little more than a week after the first media blog post about it, and the decision to put the files up with a CC0 license had in any case been made some days earlier. It really did not take much pressure at all. (Tom also wrote a post on Greater Greater Washington on March 19.)
Carl had noticed early on that the DC Council asserted copyright over the Code. Some of the media reports focused on that. As Zvenyach explained in The Washington Times article, the rationale was not to limit access to the law but to protect DC from West, by making sure West could not claim copyright over the same Code. Whether or not state codes can be copyrighted was mostly beside the point, and the focus on this issue turned out to be a red herring. It was resolved quickly with the choice of the Creative Commons CC0, a public domain dedication.
I went in to Zvenyach’s office on April 3 to help them take West’s logo out of the Word documents. There was one document per title of the Code, or about 50 documents, many in the 50-megabyte size range. The West logo was in the header, but the header was specified independently for each section of the code, so in reality there were thousands of logos to take out. We also took out a DC copyright line from the documents, which was also repeated in each section. It took about 4 hours for Microsoft Word to process all of the files, and 1 hour for us to figure out how to do it so “quickly.”
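For the curious, here is a rough sketch of why "one logo" turns into thousands. This is not what we actually ran (we did the work inside Microsoft Word itself). It assumes the files are in the .docx format, which this post doesn't specify, and the function name `strip_header_images` is made up for illustration. In a .docx package, each section with its own header typically gets a separate header part (`word/header1.xml`, `word/header2.xml`, and so on), so the image has to be removed from every one of them.

```python
import io
import re
import zipfile

# Assumption: .docx input. Header parts are stored as word/header1.xml,
# word/header2.xml, ... (typically one per distinct section header).
HEADER_PART = re.compile(r"word/header\d+\.xml")

# Images show up as <w:drawing> (DrawingML) or <w:pict> (legacy VML) elements.
# A regex is a shortcut for a sketch; a real tool should use an XML parser.
IMAGE_ELEMENT = re.compile(r"<w:(drawing|pict)\b.*?</w:\1>", re.DOTALL)


def strip_header_images(docx_bytes: bytes) -> bytes:
    """Return a copy of the .docx with image elements removed from every header part."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if HEADER_PART.fullmatch(item.filename):
                # Only header parts are rewritten; the body is copied untouched.
                text = data.decode("utf-8")
                data = IMAGE_ELEMENT.sub("", text).encode("utf-8")
            dst.writestr(item, data)
    return out_buf.getvalue()
```

Note that this only removes the references to the image from the headers. The image file itself would still be sitting in the package under `word/media/`, so a real cleanup would need to drop that file (and its relationship entries) as well.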
When I left Zvenyach’s office that evening, Zvenyach pointed out the presidential thumb drive still sitting on his desk that he received from Carl — unfortunately I forget if it was a little George Washington or a little Abraham Lincoln. I have a feeling that thumb drive will be around for a while.
Now, there is a bigger issue here. There’s no plan for updating the public files. DC’s contract with Lexis going forward doesn’t require Lexis to provide DC with an electronic copy of the code. Perhaps after this they’ll refuse to do so. But we’ll tackle this another time.