Archive for the ‘Civic Hacking’ Category
Posts related to my site GovTrack.us, which tracks the U.S. Congress, and related issues in the world of civics, technology, open government.
Friday, May 29th, 2009
Recently the GSA has been negotiating special Terms of Service agreements, on behalf of federal agencies, with various social media services like YouTube so that agencies can make use of these services. Some of those agreements are now publicly available. My understanding is that GSA’s negotiations were necessary before agencies could use these services because of legal issues like liability. I’ve reviewed the TOS’s to see whether they address open-government concerns.
The use of non-governmental services like these as part of a governmental function raises several openness issues, which we rehashed in an earlier thread on the use of YouTube by Congress. To summarize, the issues include:
- whether the service provider meets government web standards including accessibility, privacy, security, nondiscrimination, and archival access to media
- whether the service providers require members of the public to enter into a contractual agreement with them (i.e. more terms of service) in order to access government content, what the public must agree to, and whether these additional terms with the public restrict what the public can do with and how the public can share government media obtained through the service
- whether use of the service constitutes an endorsement of a particular brand or technology, or if it provides a significant business advantage to a profit-seeking entity
- whether the service provides government media in a data format that does not impose technical and legal restrictions on users of the media
I think I need to include a special note about privacy. We can expect that any non-governmental site is going to track their users’ behavior as best they can because of the financial incentives of user-targeted advertising and selling demographic data. I don’t know to what extent any of the services below make use of this, but I expect that this is a major component of the revenue of all of them.
The GSA TOS are amendments to the standard TOS employed by the services. I haven’t read through any of the standard TOS, so of course I might be missing something.
I reviewed the TOS’s with respect to these issues. They had common elements.
No Advertising: The service agrees to not place advertisements on pages with government content (i.e. government “channels” and the like). This addresses part of the concern of endorsement. Of course, services may continue to display their brand name and link users to other parts of the service, so they are still able to promote their business.
No Cookies: The service agrees to not set cookies when a widget is placed *in* a government agency webpage. This means that the service gives up its ability to do the most advanced user tracking in the cases where the user may be unaware that they are even accessing a non-governmental service. The service may still track accesses by IP address, which provides a more rudimentary means of tracking users but is more likely to leave them anonymous.
Closed Captioning: The service will provide the ability for government media to include closed-captioning for videos using industry standard practices, which is of course important for accessibility.
The TOS are linked from here:
https://forum.webcontent.gov/Default.asp?page=TOS_agreements
Here are reviews of each TOS:
AddThis.com
AddThis.com provides a “Social Bookmark & Feed Button Builder”. It’s a widget developers can put on their websites to help users share content on other social sites like Facebook. Because of what AddThis does, there are only a few concerns to be addressed. The TOS addresses both main concerns:
- No Advertising: on government “channels” on the AddThis site (doesn’t seem to actually apply to AddThis).
- No Cookies: when placed on .gov/.mil websites.
https://forum.webcontent.gov/resource/resmgr/terms_of_service_w_socmed/addthistos_4.30.09_final_uns.pdf
Blip.tv
Blip.tv is a video hosting website, like YouTube. All of the privacy concerns above are relevant. The TOS includes:
- No Cookies — Blip.tv will allow the government to disable parts of its embeddable player that sets cookies.
- Closed Captioning.
Other aspects require some elaboration:
Ads-
The GSA TOS has a confusing section on advertising:
“Blip.tv reserves the right to run advertisements on any page on Blip.tv, but will not run advertisements in-stream or directly adjacent to user videos without the opt-in of the user who uploaded the video.”
It sounds like Blip.tv can place ads on the page, just not directly adjacent to or inside of government media.
Privacy-
The GSA TOS say explicitly that Blip.tv does not collect personally identifiable information about users, but does collect and use demographic data for targeted advertising. Users should expect to be asked for demographic data.
https://forum.webcontent.gov/resource/resmgr/Docs/Blip_tv_-_Terms_of_Use_Agree.doc
Blist.com
This is a “social data discovery” tool where users can upload tabular data sets to share. I am actually going to skip a review of this TOS because I expect (or sincerely hope) that tabular data sets are shared with the public directly (as a bulk data download) besides through social tools.
Facebook
Facebook is a social networking site.
The negotiated TOS are not available to the public. Given the likely pervasiveness of the use of all of these tools in the future, it would be a shame if the GSA is facilitating government agencies’ use of third party services that violate the public’s expectations for government web content.
Flickr
Flickr is a photo sharing website. All of the concerns listed above are relevant to Flickr. There are no provisions in the TOS relevant to this review.
https://forum.webcontent.gov/resource/resmgr/Docs/Flickr_TOS_Agreement_Amended.doc
MySpace
MySpace is a social networking site.
The negotiated TOS are not available to the public. Given the likely pervasiveness of the use of all of these tools in the future, it would be a shame if the GSA is facilitating government agencies’ use of third party services that violate the public’s expectations for government web content.
SlideShare
SlideShare is a document (i.e. presentations) sharing tool. The TOS are not posted on the GSA website, but this appears to be a publishing mistake as it notes that the TOS are intended to be publicly available.
Vimeo
Vimeo is a video sharing tool. The TOS has one relevant provision, despite all of the concerns being relevant.
- No Advertising — on government “channels” on the Vimeo site
https://forum.webcontent.gov/resource/resmgr/terms_of_service_w_socmed/vimeo_tos_final_april2009.doc
YouTube
YouTube is a video sharing site.
The negotiated TOS are not available to the public. Given the likely pervasiveness of the use of all of these tools in the future, it would be a shame if the GSA is facilitating government agencies’ use of third party services that violate the public’s expectations for government web content.
Conclusion
While I am encouraged by the GSA’s forward thinking to make use of the latest technologies developed in the private sector, I believe that working with the private sector poses a number of risks to government data, to the public’s privacy and free speech rights, and to good governance. These risks can be minimized and some useful provisions have been included in the negotiated TOS’s along these lines, but far more careful thinking is necessary.
While several of the TOS addressed accessibility and privacy concerns, none of the TOS addressed security, nondiscrimination, archival access to media, the terms the public are required to enter into to access government content through these services, or web media data formats.
Update: See also this related post.
Posted in Civic Hacking | No Comments »
Tuesday, May 19th, 2009
I frequently see questions like “how can I convince my government that open data is important?” and “what should I do as a government web manager to make data open?” These and other questions came up at Transparency Camp a few months ago, and at the end of the conference Gunnar Hellekson of Red Hat, and later I, decided to take on the project of bringing together a repository of best-practices guides for technology’s role in an open government. We have a wiki page for the project which lists some of the guides we’d like to see written.
Since the conference I’ve been working on the first guide, Open Data is Civic Capital: Best Practices for “Open Government Data”, which you can read by following the link. The goal was 1) to motivate why open government data isn’t just an ideological issue but actually makes society more powerful, and can really make the world a better place, and 2) to outline some suggested priorities and recommendations for open government data, drawing on the recommendations of a number of past groups (e.g. the 8 Principles of Open Government Data, and others). Thanks for feedback to Gunnar, John Wonderlich, Carl Malamud, Joe Germuska, Kevin Lyons, and David Robinson. (They had a lot of great suggestions many of which I haven’t had the energy to follow through with yet.) The essay begins:
“Creating a well-informed public is a core value of representative government. It is a prerequisite for ensuring the best representatives are elected and a crucial component of government oversight—as well as being important in areas well beyond civics. This document speaks to why public government data (also called ‘public sector information’) is a valuable resource to society if put on the Web and shared freely with the public, and discusses how to go about doing it. We discuss technological considerations and end with sixteen guiding principles for best practices in open government data.”
Kevin Lyons, who works for the Nebraska State Legislature, began work on a best practices guide for the use of the PDF format: when it is appropriate and what to look out for. That’s up on the wiki and I’m sure your suggestions & revisions would be welcome.
Posted in Civic Hacking | No Comments »
Tuesday, May 5th, 2009
I couldn’t have said it better:
JohnWonderlich: woot! Go Senate XML! (votes data now posted, in policy reversal) http://bit.ly/Il7hF
I’ve been nagging about this for a while. Big thumbs up to the Senate webmaster for a quick turnaround time too.
Posted in Civic Hacking | No Comments »
Saturday, April 18th, 2009
The big news lately is that the Center for Responsive Politics opened up their large database of normalized campaign contribution records under a Creative Commons license. I think this is more significant to the world of government transparency & technology than it might appear. Just around five years ago this world was quite different. Organizations like CRP were very much using technology to bring new insight to civics. That hasn’t changed. But organizations saw themselves as solitary entities whose primary mission was to provide a new direct-to-citizen service to the public. A web application, for instance. There’s no need for me to list off other examples — every advocacy and government transparency website was like that, to the best of my recollection. (Except maybe IMSP who seemed to be ahead of the pack.)
All that has changed, and I wish I could pinpoint exactly how that happened. The combination of “Web 2.0” as a buzz-word and grassroots digital campaigning in 2004 probably had a lot to do with it. The Howard Dean presidential campaign got a boost (at least in terms of publicity if not poll numbers) from developers coming together to specialize the Drupal open source CMS for political campaigning (“CivicSpace”). That sent a message, even if no one quite recognized it at the time, that developers have a role to play in the world of civics and that cooperation was a viable model for getting things done. Not to say that the CivicSpace project invented this: I had been working on GovTrack for a few years by that point and across the pond Tom Steinberg and the MySociety group had been thinking about open source civics for even longer. But I suspect, even in my own thinking, that CivicSpace crystallized some vague earlier notions of civic hacking.
The story isn’t over yet, though, because I don’t think any of this alone would have brought us to where we are today. Unfortunately, from this point forward I run the risk of giving too much credit to the things I know about and not enough credit elsewhere. Still, here’s how I see it. Four more things had to happen, independently. First, entrepreneur Mike Klein had to make a lot a lot a lot of money. Second, Dan Newman and David Moore had to build MAPLight.org and OpenCongress.org, respectively. These are, now, and especially were at the start, leading examples of how you can do really cool new things by mixing data sources (for MAPLight, mixing my GovTrack legislation data with campaign contribution data from CRP) or re-mixing data sources (for OpenCongress giving my legislation data a more social spin). Third, John Wonderlich had to start, quite by accident, the Open House Project — this was a crucial step in bridging the technology world with staffers for congressmen, especially with Speaker Pelosi’s office. The fourth bit was that Ellen Miller and Micah Sifry had to put it all together and form the Sunlight Foundation: funding from Mike going to two great technology projects (IMO these are Sunlight’s most important grantees) and a policy arm with teeth because of its pragmatic approach to connecting with policymakers.
That’s pretty much it, because from there things just make sense. Sunlight recruited great staff and steamrolled through the open government world stamping out the idea that each open government group should be in its own little world — by funding interaction, in a sense.
The expectations for government transparency advocacy changed. Groups had to walk the walk a bit more by sharing and collaborating. So now besides CRP’s data being opened up for anyone to remix we have the Taxpayers earmark data, the Sunlight Labs API, the MAPLight API, and probably several more databases. The New York Times API probably owes some of its inspiration to these changing expectations too. So it’s a whole new world now of not just open government, and not even open government data, but open government transparency advocacy data. (Is there a catchier name for that?)
Posted in Civic Hacking | 5 Comments »
Friday, April 10th, 2009
Watch a video of my talk at the Free Culture Conference last year on Civic Hacking. (Text and slides here.) It was my best talk yet. I’ve got another good one (if I do say so myself) coming up at CITP’s Studying Society in a Digital World conference in a few weeks at Princeton.
Posted in Civic Hacking | No Comments »
Wednesday, March 18th, 2009
Announcing: HackingCongress.org
The intersection of civics & technology
http://www.hackingcongress.org/
Our community is growing rapidly these days. And while TransparencyCamp gave us a physical place to come together a few weeks ago, we’re still a little nomadic in the online world.
We’re also a very diverse group. The fact that we all often have to cross-post to the same set of lists indicates that we’ve got great inner communities that focus separately on coding, policy, social media, etc.
HackingCongress.org is meant to be a neutral-ground home for the coder community in the open government world. Really, it’s just a links page. But it’s a links page with a nice Drupal theme you can proudly point to and say “this is my movement”. “Hacking” is, of course, a word with several meanings. In the programming world it is very much a positive term meaning something like “creative programming”.
The site is running Drupal and anyone that creates an account can edit any content on the site. So, it’s basically a new wiki.
Right now you can find:
Community
———
Links to the primary convergence locations for this community, the Sunlight Labs and PoliParsers mail lists, the IRC channel #transparency, the Planet oGosh blog aggregator, the oGosh Facebook group, and the Upcoming Transparency Events page on the OpenCongress wiki.
Links to all of the other mail lists for our community (all of the ones I’ve mailed here and some others).
Data & APIs
———–
The beginnings of a list of the databases and APIs that are available for government transparency data. If you’re a data source, add yourself to the list or make sure I got your entry correct, please.
Projects
——–
Links to ongoing projects broken down by type-
– Open-source coding projects like OpenCongress and Sunlight’s Fifty States.
– Policy projects like Open House/Senate.
– Wiki projects like the new Wired gov data wiki.
It was everything I could come up with quickly. I’ll be adding more as I see them, but feel free to add your own project.
#transparency
————-
Using a Drupal module you can enter the community’s IRC channel #transparency through the website.
Blog Aggregator
—————
Recently I announced Planet oGosh, an aggregator bringing together a whole bunch of blogs in the open government tech community. I’m changing the URL to planet.hackingcongress.org.
Final Notes
———–
Thanks to Kendall Clark for donating the domain name.
Posted in Civic Hacking | No Comments »
Thursday, March 5th, 2009
There’s a wacky thread on the Open House List of poems for transparency, so I gave it a try. (In fairness, I should say I used an electronic pronunciation dictionary to find some of the rhymes.)
There once was a man named Mike Honda,
A congressman us geeks are quite fond ‘a,
In markup sessions takes on the chairman, a hulk,
so that we the people can get our data in bulk.
His friend maverick Joe likes transparency too,
Senate votes in XML he says long overdue,
At party politics he snorts,
Because the public should see those CRS reports.
And last we hear of the executive’s new plan,
For a CTO and CIO…
…perhaps YesWeScan?
Posted in Civic Hacking | No Comments »
Monday, March 2nd, 2009
Yesterday I held a session called Semantic Web II: Civic Hacking, the Semantic Web, and Visualization at Transparency Camp. In addition to posting my slides, here’s basically what I said during the talk (or, now on reflection, what I should have said):
Who I Am: I run the site GovTrack.us which collects information on the status of bills in the U.S. Congress. I don’t make use of the semantic web to run the site, but as an experiment I generate a large semantic web database out of the data I collect, and some additional related data that I find interesting.
Data Isolation: What the semantic web addresses is data isolation. For instance, the website MAPLight.org, which looks for correlations between campaign contributions to Members of Congress and how they voted on legislation, is essentially something that is too expensive to do for its own sake. Campaign data from the Federal Election Commission isn’t tied to roll call vote data from the House and Senate. It’s only because separate projects have, for independent reasons, massaged the existing data and made it more easily mashable that MAPLight is possible (that’s my site GovTrack and the site opensecrets.org). The semantic web wants to make this process cheaper by addressing mashability at the core. This is important for civic (i.e. political/government) data: machines help us sort, search, and transform information so we can learn something, which is good for civic education, journalism (government oversight), and research (health and economy). And it’s important for the data to be mashable by the public because uses of the data go beyond the resources, mission, and mandate of government agencies.
Beyond Metadata: We can think of the semantic web as going beyond metadata if we think of metadata as tabular, isolated data sets. The semantic web helps us encode non-tabular, non-hierarchical data. It lets us make a web of knowledge about the real world, connecting entities like bills in congress with members of congress, what districts they represent, etc. We establish relations like sponsorship, represents, voted.
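To make the idea of a web of relations concrete, here is a minimal Python sketch of knowledge stored as (subject, predicate, object) triples. The identifiers and relation names below are hypothetical placeholders, not GovTrack's actual URIs or vocabulary, and a real RDF library would do far more than this.

```python
# Each fact is one (subject, predicate, object) triple.
# All identifiers here are made up for illustration.
TRIPLES = {
    ("bill:1", "sponsor", "person:A"),
    ("person:A", "represents", "district:XX-1"),
    ("person:B", "represents", "district:YY-2"),
    ("person:B", "voted_aye", "bill:1"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def sponsor_districts(graph, bill):
    """Follow two relations in a row: bill -> sponsor -> district."""
    return [
        district
        for (_, _, person) in match(graph, s=bill, p="sponsor")
        for (_, _, district) in match(graph, s=person, p="represents")
    ]
```

Because nothing forces the data into rows and columns, adding a new kind of relation is just a matter of adding more triples.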
Why I care: Machine processing of knowledge combined with machine processing of language is going to radically and fundamentally transform the way we learn, communicate, and live. But this is far off still. (This explains why I study linguistics…)
Then there are some slides on URIs and RDF.
My Cloud: When the data gets too big, it’s hard to remember the exact relations between the entities represented in the data set, so I start to think of my semantic web data as several clouds. One cloud is the data I generate from GovTrack, which is 13 million triples about legislation and politicians. Another cloud is data I generate about campaign contributions: 18 million triples. A third data set is census data: 1 billion triples. I’ve related the clouds together so we can take interesting slices through it and ask questions: how did politicians vote on bills, what are the census statistics of the districts represented by congressmen, are votes correlated with campaign contributions aggregated by zipcode, are campaign contributions by zipcode correlated with census statistics for the zipcode (ZCTA), etc. Once the semantic web framework is in place, the marginal cost of asking a new question is much lower. We don’t need to go through the work that MAPLight did each time we want a new correlation.
Linked Open Data (LOD): I showed my part of the greater LOD cloud/community.
Implementation: A website ties itself to the LOD or semantic web world by including <link/> elements to RDF URIs for the primary topic of a page. This URI can be plugged into a web browser to retrieve RDF about that resource: it’s self-describing. I showed excerpts from a URI for a bill in congress that I created. It has basic metadata, but goes beyond metadata. The pages are auto-generated from a SPARQL DESCRIBE query as I explained in my Census case study on my site rdfabout.com.
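Here is a rough sketch of how a client might discover such a link, using only Python's standard library. It assumes the page advertises its RDF with a `<link>` element whose `type` is `application/rdf+xml`; that is a common convention but an assumption here, and the sample page and URL are invented.

```python
from html.parser import HTMLParser


class RDFLinkFinder(HTMLParser):
    """Collects href values of <link> elements that point at RDF/XML."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Assumption: RDF links are marked with this MIME type.
        if tag == "link" and attributes.get("type") == "application/rdf+xml":
            self.rdf_links.append(attributes.get("href"))


def find_rdf_links(html):
    """Return every RDF link advertised in the given HTML text."""
    finder = RDFLinkFinder()
    finder.feed(html)
    return finder.rdf_links


# Invented page for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Example bill page</title>
<link rel="stylesheet" type="text/css" href="/style.css"/>
<link rel="alternate" type="application/rdf+xml" href="http://example.org/rdf/bill/1"/>
</head><body></body></html>"""
```

Calling `find_rdf_links(SAMPLE_PAGE)` yields the one RDF URI and ignores the stylesheet link; a client would then fetch that URI to get the machine-readable description.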
SPARQL: The query language, the SQL, for the semantic web. It is similar to SQL in metaphors and keywords like SELECT, FROM, and WHERE. It differs in every other way. Interestingly, there is a cultural difference: SPARQL servers (“endpoints”) are often made publicly accessible directly, whereas SQL servers are usually private. This might be because SPARQL is read-only.
Example 1: Did a state’s median income predict the votes of Senators on H.R. 1424, the October 2008 financial bailout bill? I show the partial RDF graph related to this question and how the graph relates to the SPARQL query. First comes an example SPARQL query, then the real one. The real one is complicated not because RDF or SPARQL are complicated, but because the data model *I* chose to represent the information is complicated. That is, my data set is very detailed and precise, and it takes a precise query to access it properly. I showed how this data might be plugged into Many Eyes to visualize it.
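Once a query like that returns one row per senator, one simple way to check for a relationship is a correlation coefficient. The sketch below is in Python and uses made-up rows; the state codes, incomes, and votes are placeholders, not real results, and it is not the analysis from the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Fabricated stand-ins for query results: (state, median income, vote).
ROWS = [
    ("AA", 40000, "no"),
    ("BB", 45000, "no"),
    ("CC", 55000, "aye"),
    ("DD", 60000, "aye"),
    ("EE", 50000, "aye"),
    ("FF", 52000, "no"),
]


def income_vote_correlation(rows):
    """Correlate median income with the vote, coding aye as 1 and no as 0."""
    incomes = [income for _, income, _ in rows]
    votes = [1 if vote == "aye" else 0 for _, _, vote in rows]
    return pearson(incomes, votes)
```

A value near +1 or -1 would suggest income and votes move together; a value near 0 would suggest they do not. A correlation on its own says nothing about cause.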
My visualization dream: Visualization tools like Swivel (ehm: I had real problems getting it to work), Many Eyes, Ggobi, and mapping tools should go from SPARQL query to visualization in one step.
Example 2: Show me the campaign contributions to Rep. Steve Israel (NY-2) by zipcode on a map. I showed the actual SPARQL query I issue on my SPARQL server and a map that I want to generate. In fact, I made a prototype of a form where I can submit any arbitrary SPARQL query and it creates an interactive map showing the information.
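Before anything can be drawn on a map, the contribution records need to be summed per zipcode. In practice that aggregation could happen in the query itself; the Python sketch below just shows the step in isolation, with invented zipcodes and amounts.

```python
from collections import defaultdict


def total_by_zip(contributions):
    """Sum contribution amounts per zipcode.

    `contributions` is an iterable of (zipcode, amount) pairs. Zipcodes are
    kept as strings so that leading zeros survive.
    """
    totals = defaultdict(float)
    for zipcode, amount in contributions:
        totals[zipcode] += amount
    return dict(totals)


# Invented records for illustration only.
SAMPLE_CONTRIBUTIONS = [
    ("00001", 250.0),
    ("00002", 100.0),
    ("00001", 50.0),
]
```

The resulting dictionary maps each zipcode to a single number, which is the shape a mapping tool needs in order to shade regions.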
Other notes: My SPARQL server uses my own .NET/C# RDF library. That creates a “triple store”, the equivalent of an RDBMS for the semantic web. Underlyingly, though, it stores the triples in a MySQL database with a table whose columns are “subject, predicate, object”, i.e. a table of triples. See also: D2R server for getting existing data online.
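The table-of-triples layout can be sketched in a few lines of Python. This uses SQLite from the standard library as a stand-in for MySQL, and the schema, identifiers, and query are illustrative assumptions rather than the actual library's design.

```python
import sqlite3


def build_store(triples):
    """Create an in-memory table of triples and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def districts_of_sponsors(conn, bill):
    """Answer a two-step graph question with a self-join on the one table."""
    sql = """
        SELECT r.object
        FROM triples AS s
        JOIN triples AS r ON r.subject = s.object
        WHERE s.subject = ?
          AND s.predicate = 'sponsor'
          AND r.predicate = 'represents'
        ORDER BY r.object
    """
    return [row[0] for row in conn.execute(sql, (bill,))]


# Made-up identifiers for illustration only.
SAMPLE_TRIPLES = [
    ("bill:1", "sponsor", "person:A"),
    ("person:A", "represents", "district:XX-1"),
    ("person:B", "represents", "district:YY-2"),
]
```

Each hop through the graph becomes one more self-join on the same table, which is roughly the translation a SPARQL engine sitting on top of a relational store has to perform.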
Posted in Civic Hacking, Code, Mono, Semantic Web | 2 Comments »
Monday, February 23rd, 2009
Continuing from my last post on this subject, I found some more examples of influential data sets from a page on FlowingData.com. I’m expanding beyond government data in this post.
“Baseball Statistics: In 2003, Michael M. Lewis’ book, Moneyball: The Art of Winning an Unfair Game, was released. As a result, the way baseball teams were built changed completely. Before Moneyball, teams relied on insider information and the choice of players was highly subjective. However, in 2002, a year before the book was published, the Oakland A’s had $41 million in salary and had to figure out how to compete against teams like the New York Yankees and the Boston Red Sox who spent over $100 million in salaries.”
“Megan’s Law: Since 1994, those who have been convicted of sex crimes against children have been required to register with local law enforcement. That data is made public so that people know about sex offenders in their area. Mash that data with Google Maps. Lo and behold, parents became instantly aware of caution areas and some might never look at their neighbor the same way ever again, while sex offenders start declaring themselves homeless.”
Posted in Civic Hacking, Semantic Web | No Comments »
Tuesday, February 10th, 2009
One of the concrete benefits of open government data is that third parties can use the data to do something useful that no one in government has the mandate, resources, or insight to do. If you think what I am about to tell you below is cool, and helpful, then you are a supporter of open government data.
On my site GovTrack, you can now find comparisons of the text of H.R. 1, the stimulus bill, at different stages in its legislative life — including the House version (as passed) and the current Senate version (amendment 570).
The main page on GovTrack for HR 1 is: here
Here’s a direct link to the comparison:
Comparisons are possible between any two versions of the bill posted by GPO. Comparisons are available for any bill.
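For a sense of what a text comparison involves, here is a minimal Python sketch using the standard library's `difflib`. It is not the code GovTrack runs, and the bill text and version labels below are invented.

```python
import difflib


def compare_versions(old_text, new_text, old_label="house", new_label="senate"):
    """Return a unified diff of two versions of a text, one string per line."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Invented text for illustration only.
HOUSE_VERSION = "SEC. 1. Short title.\nSEC. 2. Funding of $10."
SENATE_VERSION = "SEC. 1. Short title.\nSEC. 2. Funding of $20."
```

Lines starting with `-` appear only in the first version and lines starting with `+` only in the second. None of this works unless the text of each version is published in a format a program can read.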
If you find this useful, please take a moment to consider that something like this is possible only when Congress takes data openness seriously. When GPO went online and THOMAS was created in the early 90s, they chose good data formats and access policies (mostly). But the work on open government data didn’t end 15 years ago. As “what’s hot” shifts to video and Twitter, the choices made today are going to impact whether or not these sources of data empower us in the future, whether or not we miss exciting opportunities such as having tools like the one above.
(Thanks to John Wonderlich and Peggy Garvin for some side discussion about this before my post. GovTrack wasn’t initially picking up the latest Senate versions because GPO seems to have gone out of its way to accommodate posting the latest versions before they were passed by the Senate, which is great, but caught GovTrack by surprise.)
Posted in Civic Hacking, Open House/Senate Projects | No Comments »