Archive for the ‘Open House/Senate Projects’ Category

These posts are archives of my posts on The Open House Project and The Open Senate Project.

Government Data and the Invisible Hand

Friday, June 6th, 2008

The guys over at Princeton’s new Center for Information Technology Policy wrote a really great paper for the Yale Journal of Law & Technology on the role data should have, compared to websites, in government. It articulates a point that I think many of us subconsciously have had in mind:

“The new administration should specify that the federal government’s primary objective as an online publisher is to provide data that is easy for others to reuse, rather than to help citizens use the data in one particular way or another.”

And they suggest an interesting way to push that forward:

“The policy route to realizing this principle is to require that federal government websites retrieve the underlying data using the same infrastructure that they have made available to the public. Such a rule incentivizes government bodies to keep this infrastructure in good working order, and ensures that private parties will have no less an opportunity to use public data than the government itself does. The rule prevents the situation, sadly typical of government websites today, in which governmental interest in presenting data in a particular fashion distracts from, and thereby impedes, the provision of data to users for their own purposes.”

I think this is a worthwhile addition to the opengovdata and publicmarkup.org policy documents — if not as a direct recommendation (because I think it may be too much to ask for in a grand form) then noted as a long-term goal or (in terms of the second paragraph I quoted) as a benchmark, a concrete way to tell whether data is open.

The full citation is: Robinson, David, Yu, Harlan, Zeller, William P and Felten, Edward W, “Government Data and the Invisible Hand” (2008). Yale Journal of Law & Technology, Vol. 11, 2008

It’s primary election day, and I have no idea what’s going on

Tuesday, April 22nd, 2008

I’ll put the moral of this story up front: data is nice, but the problem is the media.

Today is the primary election in Pennsylvania, and I intend to go over to vote in a few minutes. But it occurred to me only this morning that more things may be on the ballot than the presidential nominees. To be a good citizen, I realized I had better read up on what else I will be voting on before strolling across the street to cast my ballot.

So what else is on the ballot? I figured the PA state website might have that information. Hah. I should only be so lucky. Googling I found CNN’s page only has presidential information; I found an AP article that mentions other candidates; and eventually some five pages of Google hits later I find that the League of Women Voters of Pennsylvania seem to be the only group that has put this information online. Thanks to the LWV!

Fortunately some of the ballots are easy. The primaries for the House and State Senate are uncontested in each party, and it looks like only one party even has a candidate for State House. (Lucky us.) I’m supposed to vote for state attorney general, auditor general, and treasurer. Each party has one candidate for the first two: another easy “choice.” (I presume these are all primaries and not actual elections for these offices, but the page doesn’t say.)

Then as I look down I find myself completely baffled. I’m supposed to elect pledged delegates to my party’s national convention?? Exactly what am I doing when I vote for either Clinton or Obama if I also have to choose delegates? (Ok, I just read an article in the Philly Inquirer explaining it.)

So I tell you, it’s just ridiculous that I don’t already know this information, and that there is no comprehensive explanation of what is going on on the ballot (although the LWV come close).

Public policy with input from the public

Sunday, April 13th, 2008

Last night I attended an Obama campaign tech-policy panel discussion here at Penn. Unfortunately the consensus among me and my CITP friends who attended was that the event was almost completely uninformative on tech issues. One thing I did learn was that the Obama campaign is making use of some 1500 experts in the public to draft policy. That’s a refreshing idea. In the OHP report, I wrote the last chapter urging Congress to bring the public into their decision making process about using technology for transparency. In January, at dinner during the CITP’s cloud workshop, John Wonderlich and I and others were talking and a (to them crazy, to me interesting) idea came out about creating shadow congressional committees filled with experts/delegates from the public, rather than politicians.

Apparently this failure to reach out to experts and the public extends to the executive branch as well. Says an anonymous (but knowledgeable) writer, “it is widely-accepted that the federal government’s attempt to use the internet for regulations commenting (Regulations.gov) has been a failure.”

This seems to me to underlie some of the larger problems we face. The distrust of politicians created by, for instance, pay-for-access is about the fact that politicians aren’t turning to the right experts for advice. When Ted Stevens is mocked for his understanding of the Internet (though I say that a series of tubes is remarkably accurate), it’s because he clearly failed to talk to an expert.

There’s a structural problem here. Why don’t decision makers seek public input? What can we do in the public to make talking to us more appealing?

Transparency on a platter

Wednesday, April 2nd, 2008

Could it be any easier for Congress to enact some pretty ideal transparency legislation now? Last year I lamented how the ethics reform bill (that Obama now touts as one of his best achievements) was a laundry list of updates to existing rules with only a few actual nuggets of real transparency reform. Well, if they want real reform, Sunlight is serving it to them on a platter at publicmarkup.org, as John noted last post. Modulo a few small modifications that have been suggested, there is really no reason anyone should oppose the bill. (IIRC and IMO, the most controversial section is about making CRS reports publicly available. I personally don’t feel so strongly about that section and can see why some would oppose it.)

While we were drafting the Open House Project report a year ago, it seemed like a good next step would be to just write the legislation that would achieve what we wanted changed — and see if we could get it introduced like other advocacy and industry groups seem to be able to do. I’m glad Sunlight took the time to formulate those recommendations (and more) into their proposed bill.

One of the most interesting sections in Sunlight’s bill is the incorporation of the 8 Principles of Open Government Data (www.opengovdata.org) drafted at a conference in December. Rather than spelling out in each provision how data should be made available (although some provisions do so, inconsistently), the bill would require the GAO to assess annually the implementation of each provision against the principles (section 901). It’s an interesting choice to put the principles in an assessment rather than a requirement (and thinking back, I can remember this type of idea coming up in some of John’s posts). An assessment can keep up with the times, and can place ongoing pressure after initial compliance has been reached. It does seem to allow new data to avoid the principles, though. Could the principles be both mandated and audited? Are the principles specific enough to mandate? (Could we make them specific enough?)

More money and votes: Now I know how to explain the problem

Thursday, March 27th, 2008

Let me give you two headlines, and you can tell me your reaction to each:

A) Big Oil Finances the Republican Party
B) Congressional Votes Correlated with Big Oil Contributions

From the headlines you’d think the articles are about two separate facts about the world. That is, that the two facts are independent. One can read the first article and then still be surprised after reading through the second article. A friend of mine says about the second hypothetical headline, “I think the votes are sort of taking it to the next step of association.” Compare those with these headlines:

A) Factory Emissions Tied to Deadly Cancer
B) Life Expectancy Lower in Factory Towns

Here you’d say the articles are about the same thing: clearly, because more deadly cancer means more deaths, and that means lower life expectancy. If everyone already knew (A) from an exposé last year, then when a newspaper reports (B) we say what an idiot the reporter must have been: he just reported the same thing again.

That’s what’s happening with money-and-votes analyses. The facts are more complicated, and the results are less clear, so it’s easy to overlook the problem here. But the problem is here.

We all already know that Big Oil gives more to Republicans than to Democrats. If you didn’t know, you know now. It’s an interesting point; no qualms there. Since the Republicans support big business and the Democrats support the environment (whatever, you get the idea), it makes sense. Fine.

But because of that, I know immediately that any vote related to Big Oil is likely to go down with an uneven distribution of money coming from Big Oil between Yes votes and No votes. In fact, it has nothing at all to do with Republicans and Democrats having different views on oil in particular. It’s just that Republicans and Democrats either all vote together (on naming post offices) or vote against each other (on everything else). The votes where everyone agrees are not relevant here: you can’t have an uneven distribution of money between Yes and No votes when there aren’t No votes to begin with. Since the relevant votes are almost always split on party lines, of course there is going to be a correlation between money and votes.

What I mean by of course is that no one should be surprised to learn about a big correlation between money and votes if it has already been established that there is a correlation with contributions to a particular party. Finding out the magnitude, in dollars, of the correlation doesn’t change anything. It might as well be reported as “Big Oil Gives $XXX more to Republicans than to Democrats.” This headline is less exciting, but it’s the same thing. Throwing in votes just makes it sound more important, and it is misleading because it makes it sound like there is something new and nonobvious to be learned.

Let’s look at what is being reported. I paraphrase from Follow the Oil Money (sorry guys):

Of the 25 Representatives who took the most Big Oil money per term between 2000 and 2007, 23 were Republicans. Of the 25 Representatives who took the least amount of Big Oil money per term between 2000 and 2007, 22 were Democrats. … Representatives who voted against clean energy proposals took more than 4.5 times more oil money than those who voted in the public interest.

Why not just say “Republicans” took more than 4.5 times more oil money than “Democrats”? (Well, the number may change a bit of course, but that’s the idea.) The votes have nothing to do with it unless it is shown that the votes were decided on something other than roughly party lines. It’s another question entirely whether the money influenced the votes, or whether votes influence future contributions. That question is unsolved.

I raised a similar issue previously with some numbers from MAPLight. To paraphrase their analysis, interjecting my own totals:

Opponents of H.R. 1424 gave an average of $22,479 to Republicans and $12,646 to Democrats. These industry groups gave an average of $22,693 to legislators who voted No on this bill, compared to $14,183 to legislators who voted Yes.

Can you guess what happened? It’s no accident that $22,479 is close to $22,693 and $12,646 is close to $14,183. The Republicans predominantly voted No and the Democrats predominantly voted Yes. Actually the vote wasn’t exactly evenly split, which makes the results more interesting. You can still see an effect of money on the vote beyond the party difference, as I noted in that post, but it’s a much smaller correlation.

What do you want to do with word frequencies?

Monday, March 10th, 2008

John Wonderlich wrote:

After defining (and normalizing) the likelihood that words appear in text, you could start making comparisons between bodies of work, and creating interesting tag-cloudish visualizations of what distinguishes some text you’d like to analyze. You could build a widget for your blog that says “the following are the words that are more than 25% more likely to be used on this blog than they are to be used in New York Times cover stories”, or, “here are recent news stories that also have similarly unlikely words used.”

I don’t know how people usually do cloud visualizations, but if I were making a word cloud, that’s *precisely* what I would do; i.e. this is probably how people do it.

See:
http://en.wikipedia.org/wiki/TFIDF

http://en.wikipedia.org/wiki/Latent_Semantic_Indexing
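To make John's widget idea concrete, here is a minimal sketch of the "more than 25% more likely" comparison. It is not TF-IDF itself, just a ratio of normalized word frequencies between two bodies of text. The function names, the toy texts, and the `floor` probability used for words the reference text never contains are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_probs(text):
    """Return each word's share of all the words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def distinctive_words(mine, reference, threshold=1.25, floor=1e-6):
    """Words at least `threshold` times as likely in `mine` as in `reference`.

    `floor` is an arbitrary stand-in probability for words that never
    appear in the reference, so the ratio is always defined.
    """
    p_mine = word_probs(mine)
    p_ref = word_probs(reference)
    ratios = {w: p / p_ref.get(w, floor) for w, p in p_mine.items()}
    keep = [w for w, r in ratios.items() if r >= threshold]
    # Most distinctive words first.
    return sorted(keep, key=lambda w: -ratios[w])


# Toy stand-ins for "this blog" and "New York Times cover stories".
blog = "the congress data the congress bill"
news = "the weather the bill the sports congress the"

print(distinctive_words(blog, news))
```

In the toy example "the" drops out because it is relatively more common in the reference text, which is the normalization John describes. How to treat words missing from the reference is a real design choice; the fixed floor here is only the simplest option.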

Now, the thing is that word counts actually don’t get you very much information. Remember back to the days before Google: search engines gave you back documents by matching words and returning documents where your search terms appeared most frequently. Then Google came along and ranked documents differently and we all saw how *awful* word frequency was for determining relevance to a query.

So the question is what you would use word counts *for*. Clouds are nice, but look for cases where words aren’t exactly the appropriate level of chunking to identify relevance. (And, you will see this in most word clouds.) Articles back in 2004 about the Democratic ticket might have used the word “John” an exceptional amount owing to the dynamic duo’s shared first name, but “John” in a word cloud isn’t very informative. You’d want to chunk whole names together, but that’s a difficult problem in itself.

Note also for comparing documents that the frequency of a word isn’t very indicative of a word’s prominence in a text, and if you have a profile (i.e. vector) of word frequencies for two documents, it’s not immediately obvious how you would compare profiles to arrive at whatever result you want. (Not to say there aren’t ways to do it, but that there are many ways to do it.)
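As one illustration of "many ways to do it", here is a sketch of cosine similarity between two word-count profiles. Cosine similarity is my pick for the example, not something the discussion above settles on, and the function names are hypothetical. Feeding it raw counts, normalized frequencies, or TF-IDF weights would each give different answers.

```python
import math
from collections import Counter


def profile(text):
    """Word-count vector for a text, keyed by lowercased word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = a.keys() & b.keys()
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


doc1 = profile("john kerry and john edwards")
doc2 = profile("john edwards spoke today")
doc3 = profile("weather report for tuesday")

print(cosine_similarity(doc1, doc2))
print(cosine_similarity(doc1, doc3))
```

Note that the chunking problem is untouched: "john" counts the same whichever John is meant.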

Money is not quite so big of an incentive for voting with your wallet

Friday, March 7th, 2008

I like to be devil’s advocate among my friends, and since MAPLight and Sunlight are some of my friends, they can’t get out of a careful look over their analyses. Ellen writes on her blog about an analysis provided by MAPLight of the correlation of contributions to representatives and their vote on H.R. 1424 (bill | vote | MAPLight page):

They found that those “interested” in the legislation, both pro and con, gave over $8,000 more to the individual legislators who voted the way they wanted them to. A press release from Maplight.org gives more detail:

Opponents–such as Accident and Health Insurance, Big Business, Chambers of Commerce, Restaurant and Manufacturing, Retail and Wholesale Trade gave an average of $22,693 to legislators who voted No on this bill, compared to $14,183 to legislators who voted Yes. The disparity is 160% [JT- that’s 60%!] more money given to a No vote.

Supporters–such as Health and Welfare, Mental Health care-givers, Mental Health Services, Clergy and Non-profit–gave an average of $4,242 to legislators who voted Yes on this bill, compared to $1,812 to legislators who voted No. The disparity is 234% more money [JT- should be 134%] given to a Yes vote, or $2,430.

…. Dan Newman, MAPLight.org’s director, … points out that campaign contributions are just one factor in determining how a legislator votes, and they do not claim one caused the other. “We do make the claim, however, that campaign contributions bias our legislative system,” he adds. “Simply put, candidates who take positions contrary to industry interests are unlikely to receive industry funds and thus have fewer resources for their election campaigns than those who vote in favor.”

I don’t suggest the numbers reported are wrong (well, actually, the percent changes are wrong), but the relevant disparity in money, as far as it could be a tempting motivation for a legislator to change his position, is much smaller than MAPLight reports.
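The percent corrections in brackets above come from the difference between "X% more than" and "X% of". A quick check using the averages quoted in the press release (the function names are mine):

```python
def percent_more(larger, smaller):
    """How much more `larger` is than `smaller`, as a percent of `smaller`."""
    return (larger - smaller) / smaller * 100


def ratio_as_percent(larger, smaller):
    """`larger` expressed as a percent of `smaller`."""
    return larger / smaller * 100


# Opponents: average to No voters vs. Yes voters.
print(round(percent_more(22693, 14183)))      # 60
print(round(ratio_as_percent(22693, 14183)))  # 160

# Supporters: average to Yes voters vs. No voters.
print(round(percent_more(4242, 1812)))        # 134
print(round(ratio_as_percent(4242, 1812)))    # 234
```

The press release's 160% and 234% are the ratios; the "more money" figures are 60% and 134%.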

The trouble with MAPLight’s analysis of the correlation, even putting causality aside, is that contributions are correlated with party membership, and so are votes. So it’s no surprise there is a correlation between money and votes. If I give only to Democrats and equally to all Democrats, it will appear as if I’m giving money only to those voting in favor of Democratic issues — even though my contributions have not taken into account any particular issue position. Further, and importantly, even though you will see this correlation between my money and votes, it does not mean there is any incentive for a Democrat to change his position on an issue. That’s because in my hypothetical I am giving equally to all Democrats. The only incentive is for a Republican to become a Democrat to get some of my money, but that rarely happens. Bottom line: correlation doesn’t immediately establish incentive.
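The hypothetical in the paragraph above can be played out with made-up numbers. Everything in this sketch is invented for illustration: a 100-member chamber, a donor who gives $1,000 to every Democrat and nothing to Republicans, and a vote that splits mostly, but not entirely, on party lines.

```python
from statistics import mean

# Hypothetical members as (party, vote, contribution from one donor).
# The donor gives equally to all Democrats regardless of how they vote.
members = (
    [("D", "yes", 1000)] * 45
    + [("D", "no", 1000)] * 5
    + [("R", "yes", 0)] * 5
    + [("R", "no", 0)] * 45
)


def avg_by_vote(rows):
    """Average contribution received by Yes voters and by No voters."""
    return {
        v: mean(money for _, vote, money in rows if vote == v)
        for v in ("yes", "no")
    }


overall = avg_by_vote(members)
dems_only = avg_by_vote([m for m in members if m[0] == "D"])

print(overall)    # Yes voters average far more than No voters
print(dems_only)  # within the party, Yes and No voters get the same
```

Across the whole chamber the donor appears to reward Yes votes heavily, yet a Democrat gains nothing by switching votes, which is the sense in which correlation does not establish incentive.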

Returning to H.R. 1424, what we need to do is split the Members by party. The incentive for a Democrat can only be established by looking at the money going to Democrats. In this case, only three Democrats voted No on the bill, and three Democrats is not a large enough sample to come to any conclusions about anything (t-test be damned).

As for the Republicans, industry groups opposing the bill gave an average of $22,850 to Republicans voting against and $19,525 to those voting in favor (leaving out a clear outlier, in MAPLight’s favor). Yes, more money went to those voting against, but only $3,325. That’s a 17% difference, not a 60% difference. (It’s also a relatively small amount compared to the variability in the contributions just within the yes or no vote groups separately.)

That just leaves the contributions to Republicans from industry groups supporting the bill. Here MAPLight’s point stands. An average of $3,630 went to those voting yes and only $1,865 to those voting No. That’s a big difference, around $1,765, but still smaller than what MAPLight reported.

So here’s the bottom line: The incentive for Members of Congress to vote according to their war chest is far smaller than what is evident from MAPLight’s analysis because representatives are not competing for money going to the other party. By looking at Republicans alone, we see that it is true that money from groups supporting the bill went more to those voting in favor of the bill, but with a difference of only $1,765 (nevertheless, nearly a 100% increase over the no-vote amount). However, while there was a lot more money at play from groups against the bill, the difference between the yes voters and no voters was $3,325 (a 17% increase over the smaller of the figures), a much smaller incentive than the $8,500 reported by MAPLight.

Party Transparency: Isn’t there an elephant in this room?

Friday, March 7th, 2008

A shiver, well at least a small one, goes down my spine every time I see transparency and claims about fairness mixed in with party politics. There are two big issues running around, the first being superdelegates, the back-room deals, and uncertainty over the fairness of a confusing multi-level delegate-based system to choose party candidates. What bothers me here is that registered Democrats choose to be registered Democrats. Government is different: if you live here, you do not choose to be subject to U.S. law, and you have no alternative governments to choose from. In politics you are free to choose any party or start your own.

I’m not so heartless as not to think that it’s unfortunate that the decision-making process to choose the national candidates is as opaque as it is, but why isn’t anyone talking about why people actually aren’t free to choose alternative parties? That’s the elephant that ought to be in this room. In commerce, when things are unfair for a lack of options we cry monopoly and get things rectified by the FTC. In politics, why isn’t anyone complaining of the same?

The second issue is the so-portrayed disenfranchisement of Michigan and Florida Democratic voters on account of their states flouting the national committee’s directive over primary dates. Do we penalize the voters there for the actions of their state party leaders? I don’t see how the voters are being penalized. The voters elected their party leaders to make the decision over the primary dates: It’s too bad their elected leaders did something stupid once in office (as elected officials often do, right?). Clearly the public acquiesced to the decision in any case. What’s the recourse? Besides switching parties, citizens can vote to fire the elected officials when the next election comes around.

But where’s the elephant? It’s difficult to fire party leaders when they control the candidate selection process. Do I vote Republican in the next general election, going against my core beliefs, because the incumbent Democrat goofed on a non-governmental issue? Probably not. There obviously won’t be a serious Democratic challenger either, and certainly not one who is going to use this as a campaign issue if he wants any support from his party.

For good reason there are few legal restrictions on how parties operate internally — after all, free and fair elections means freedom from government oversight. But without rules imposed from above, there needs to be freedom of choice. That’s the real issue here, not transparency and accountability.

Congressman Honda on the Open House cause

Friday, February 1st, 2008

Congressman Mike Honda (D, CA-15) is one of this project’s heroes in the House. In fact, I can’t recall any other congressman picking out a recommendation of the Open House Project and saying publicly that it’s a good idea, and referencing this project. In November, he took real action to further transparency in Congress by supporting the Committee on House Administration in asking the Library of Congress to look into making the legislative database behind THOMAS publicly available to other websites to reuse. (This is of course the recommendation that I most care about.)

In an article last week in the National Journal’s Technology Daily, Honda compared the benefits of an open legislative database to what the world has gotten out of wikis and open source software like Linux.

Then on Wednesday, Honda blogged (a perfect medium to express the sentiment) about the issue, citing the Open House Project’s report. He captures the issue really well:

I have been working on an initiative to make Congressional legislative information more accessible to the public. I believe that public information should be provided in a format that takes advantage of the innovative technologies that are revolutionizing the Internet, sometimes known as Web 2.0.

Making Congress’ legislative database open to the public “would enable independent Web sites to use information in new and creative ways, including educating the public about Congress and providing citizens with customized views of its proceedings,” according to a report from the Open House Project, an organization supporting this proposal.

Offering legislative information in a way that other websites can reuse could lead to revolutionary changes in the way our government functions, eventually allowing Congress to better tap into the knowledge and wisdom of the American people.

Exclusion from Presidential Debates: Kucinich gets injunction (for a short while)

Tuesday, January 15th, 2008

The Times’s The Caucus blog reports that Kucinich got an injunction barring MSNBC from excluding him from the debate airing now (MSNBC had initially said it would include him, then changed its position), which MSNBC subsequently (of course) protested. I don’t know where things stand except that the debate is happening now without Kucinich.

I don’t personally have an opinion about who should be included in debates at this point (that is, at this point I don’t have a position about debates that occur at this point and forward), though I think it’s an important public policy question that doesn’t necessarily deserve to be decided by corporations (owing to their use of public airwaves).

In the injunction request, Kucinich’s lawyers claim:

[This] undermines the purpose of the Federal Communications Act . . . and is a blatant violation of the Act because of the media’s obligation to operate in the public interest. . . . [It] is effectively an endorsement of the candidates selected by NBC. In addition, if NBC is given the liberty to designate every appearance of with two candidates as “news”, then no third candidate will have the ability to enforce the equal time requirement, which is inconsistent with the intent of Congress in enacting [whatever].

Kucinich also alleges breach of contract, but that’s less interesting.

The court’s initial ruling in favor of Kucinich agreed with Kucinich on both points, but did not provide any elaboration on why.

The injunction request is included as an appendix in MSNBC’s appeal, available for download from the Times. The PDF includes the injunction request document twice: the second time, it is not cut off. The PDF also has some other interesting things: a photocopy of a check, the email addresses of campaign managers in some exhibits, and…

Among the materials included with the injunction request (I think- it’s hard to tell from my cursory reading what materials go with which documents) are emails from NBC executives to the candidates about their invitation to the debate, and, more interesting to me, to a telephone conference call about debate format. I wish someone would share or leak a recording of that conference call. That’s what I really want to see, and if you don’t know why….

Here’s a recap of where this post is coming from: I blogged previously on how I think there is an important story of corruption in how presidential candidates are included in televised debates, in that the big media corps do exercise control merely by limiting the playing field of candidates: the fewer candidates there are, the fewer they have to be in the pockets of. Not that I think every registered candidate in any state and his mother needs to be in every debate, but what I do believe strongly is that for those candidates that are included at all, they should get equal time to answer questions. Out of the last 8 debates in 2007 before the primaries (for both parties), MSNBC’s two Democratic debates most egregiously allocated time unevenly to the candidates, with the more popular candidates according to the polls getting much, much more time than the rest. The Des Moines Register, on the other hand, ought to be applauded loudly for holding the only two debates in which time was allocated completely evenly.