Archive for the ‘Open House/Senate Projects’ Category
These posts are archives of my posts on The Open House Project and The Open Senate Project.
Wednesday, January 9th, 2008
I always find it interesting how although our government is run by fairly strict procedural rules that have been written out in various places, starting with the constitution and ending somewhere past the horizon, sometimes it’s just impossible to locate exactly at what point in the procedural game “reality” is. For instance, the constitution outlines how a bill can become a law. But, at what point is a bill considered vetoed? If the president is signing the veto signature but misspells “veto” (or whatever he writes in this case, I have no idea), or is taken to the hospital before he writes the “o”, is the bill vetoed, or is it still awaiting a signature?
The reason this is interesting to me is that we like to capture reality in data. The Library of Congress and GovTrack both systematize (or in computer jargon “normalize”) the bill-becomes-a-law process. At every point in the game, a bill, in our data formats, is either in-progress, enacted, dead, etc. It must be in one of these states. After all, the constitution outlines exactly what states a bill can be in, so any bill *must* be in one of these states.
But if we’re not sure what state a bill is in, what state do we put it in in our data? There’s also the more important question- What do the lawmakers do if they disagree about what state a bill is in? (Actually, I would prefer to phrase it as “what state they are in”, but that’s another story.) Wikipedia describes (what the editors of the page claim is) a current debacle over H.R. 1585: National Defense Authorization Act FY 2008:
In December of 2007, President George W. Bush pushed the pocket veto into murky waters by claiming that he had pocket vetoed H.R. 1585, the “National Defense Authorization Act for Fiscal Year 2008,” even though the House of Representatives had designated agents to receive presidential messages before adjourning. The bill had been previously passed by veto-proof majorities in both the House and the Senate [JT: and thus a traditional veto would have been futile].
So was the bill (pocket) vetoed or not? Is the bill still in-progress? Assuming it was not pocket vetoed, after 10 days (Sundays excepted) without a traditional veto it becomes law, and we citizens would hate to reach that 11th day without either a resolution of the pocket veto matter or a traditional veto, because then we as a country would not know whether this bill has become law. (Another question: How might the Supreme Court assert jurisdiction over this question?)
But back to the data. At one point, some time after Dec. 28, someone in the House responsible for updating the bill status information shown on THOMAS entered a new status line:
Dec 28, 2007: Pocket Vetoed by President.
GovTrack picked up on the change and shows that status currently, much to the confusion of several people emailing me about it. Looking back at THOMAS, it seems that someone realized that this was quite a constitutional (if not political) claim and retracted the update, because THOMAS no longer says that.
In many cases citizens complain when the government takes things back, hiding information previously made public. That’s definitely not what I am getting at here. THOMAS is forced to show *something*, and when in doubt… well, what can you do but roll back history until we figure out what the next legislative step actually *was*?
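To make the "roll back history" idea concrete, here is a minimal sketch of a status log in which an entry can be retracted without being erased. It is not how THOMAS or GovTrack actually store bill status. The class names, the `POCKET_VETOED` state, and the placeholder earlier entry are all assumptions made for illustration; only "in-progress", "enacted", "dead", and the Dec 28 status line come from the post.

```python
from dataclasses import dataclass, field
from enum import Enum


class BillState(Enum):
    # The first three states are named in the post; POCKET_VETOED is an assumption.
    IN_PROGRESS = "in-progress"
    ENACTED = "enacted"
    DEAD = "dead"
    POCKET_VETOED = "pocket-vetoed"


@dataclass
class StatusEntry:
    date: str
    state: BillState
    text: str
    retracted: bool = False


@dataclass
class BillRecord:
    history: list = field(default_factory=list)

    def record(self, date, state, text):
        """Append a new status line to the log."""
        self.history.append(StatusEntry(date, state, text))

    def retract_latest(self):
        """Flag the newest non-retracted entry as retracted; it stays in the log."""
        for entry in reversed(self.history):
            if not entry.retracted:
                entry.retracted = True
                return entry
        return None

    def current_state(self):
        """The state of the newest entry that has not been retracted."""
        for entry in reversed(self.history):
            if not entry.retracted:
                return entry.state
        return BillState.IN_PROGRESS  # nothing recorded yet


bill = BillRecord()
bill.record("(earlier date)", BillState.IN_PROGRESS, "Placeholder for earlier actions.")
bill.record("2007-12-28", BillState.POCKET_VETOED, "Pocket Vetoed by President.")
bill.retract_latest()
print(bill.current_state())  # BillState.IN_PROGRESS
```

Keeping the retracted entry in the log, rather than deleting it, means the data can still show that the claim was once made, while the "current" state falls back to the last step nobody disputes.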
Posted in Open House/Senate Projects | No Comments »
Friday, December 14th, 2007
A little out of the scope of this blog, but I wrote previously about how the previous two Democratic presidential debates were proportioning out speaking time to the candidates based roughly (if not entirely) on their poll numbers. In the 10/30 MSNBC debate, the correlation between speaking time and poll numbers was near perfect (a, b), with the leading candidate holding the floor more than 3.5 times as long as one of the trailing candidates. The proportioning of time was clearly planned, and I say this is a bad thing because viewers have a right to know that the TV network is deliberately skewing our view of the election by putting some candidates in our face more than others. The 11/15 CNN debate still had a very high correlation between speaking time and poll numbers, though not as high as the first debate; even so, one of the leading candidates held the floor three times longer than one of the trailing candidates (c).
The Des Moines Register held the final debate last night, and I am happy to see that someone decided the debates would be done responsibly. The candidates all held the floor for roughly an equal amount of time (as per usual, according to the New York Times’s debate analyzer widget). Bill Richardson, not a leader in the polls by any means, held the floor the longest, and only 1.4 times longer than the least-speaking candidate (versus 3.5 and 3 times above). (Comparing the speaking times to a more recent Nov. 30 poll, there is still a small correlation (r=.3), but not enough to think it was pre-planned.)
By the numbers: The MSNBC debate gave 23 additional seconds to each candidate for each percentage point in their latest poll number, and this totally accounts for the speaking time of each candidate. In the CNN debate, candidates spoke around 12 seconds more per poll percentage point, and while this allocation of time seemed pre-planned, it perhaps was not based entirely/exactly on poll numbers. The Register appeared to allocate time evenly, and any influence of poll numbers on speaking time that there might have been was greatly overshadowed by other factors.
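For anyone who wants to see where a "seconds per percentage point" figure can come from, here is a minimal sketch of an ordinary least-squares line fit of speaking time against poll number. Whether this is exactly the calculation behind the figures above is an assumption, and the sample numbers in the code are made up so the arithmetic is easy to check; they are not the debate data.

```python
def fit_line(poll_pcts, speaking_secs):
    """Least-squares fit of speaking_secs = intercept + slope * poll_pct.

    Returns (slope, intercept); the slope is "extra seconds per poll point".
    """
    n = len(poll_pcts)
    if n != len(speaking_secs) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(poll_pcts) / n
    mean_y = sum(speaking_secs) / n
    sxx = sum((x - mean_x) ** 2 for x in poll_pcts)
    if sxx == 0:
        raise ValueError("poll numbers are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(poll_pcts, speaking_secs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustration: every candidate gets exactly 300 s plus 12 s per poll point.
polls = [0.0, 10.0, 20.0, 40.0]
secs = [300.0, 420.0, 540.0, 780.0]
slope, intercept = fit_line(polls, secs)
print(slope, intercept)
```

With real data the points will not sit exactly on a line, so the slope is best read alongside the correlation: a steep slope with a weak correlation says much less than a steep slope with a near-perfect one.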
I’m finally tagging this post under “corruption.” Normally we think of corruption as big business influencing the policy of politicians, but here it’s party politics trying to control the media — except I would venture to say that while MSNBC (i.e. General Electric and Microsoft) and CNN (i.e. Time Warner) were happy to play along, The Register (owned by Gannett Co., a major owner of newspapers throughout the country) did things right.
Wednesday, December 12th, 2007
Following up on Ari’s post- At yesterday’s Senate HSGAC hearing, Senator Lieberman noted briefly:
Furthermore Senate votes, unlike House votes, are intentionally presented in a format that limits the public’s ability to examine Senators’ voting records.
I confirmed with HSGAC that Lieberman was indeed referring to making Senate votes available in XML format, like the House does. This is a really important sign, that a Senator has now understood and signed onto the idea of using structured data for something. As I blogged previously, putting Senate votes in XML means independent websites, like my GovTrack, the NYTimes, etc., can more easily create new transformative applications of the data, which helps make the public more informed.
I’m personally quite excited about this, and in no small part because I am pretty sure we can trace back Lieberman’s remark, at least in part, to the work of the Open House Project.
Monday, December 10th, 2007
This weekend an Open Government Working Group conference was held in Sebastopol, CA. It was very useful and productive. I didn’t think that I contributed as much as I should have, personally, but in any case… Sunlight’s Micah Sifry has a good write-up, so I won’t repeat all of those details. (It was great to (finally) meet a number of people- Greg (Palmer), Donny, Larry, Carl, Tom…)
Important links:
A new website, www.opengovdata.org, came out of it, which has a nice announcement text
as well as a wiki wiki.opengovdata.org (which I’m hosting, so blame me for problems) for ongoing discussion on neutral turf.
There’s a Flickr tag with a bunch of photos. You can see that Tim O’Reilly’s big colored sticky note cards played an important role in many sessions.
One of the tangible results of the conference was a set of eight principles for how to determine whether some government data is “open”. It’s similar to how we use criteria elsewhere to determine whether software is open, and also the Open Knowledge Definition. And it was suggested that we develop some sort of branding that we all can make use of to support and point to the principles. The discussion pages linked from some of the terms in the principles are editable wiki pages and do need to be fleshed out with suggestions from anyone.
Also, Dan Newman started some discussion about how to mobilize citizens at large over transparency issues. I am eager to see how that discussion continues— I expect some organizing will happen on the (open) mail list created at the conference (and linked from www.opengovdata.org; yes, yet another mail list…).
Friday, November 16th, 2007
Apparently there was another Democratic debate last night. Based on the transcript analysis by the New York Times and the latest Fox News/Opinion Dynamics poll numbers, I’ve run the numbers again. Last debate, as I blogged, I found that the amount of speaking time of each candidate was ridiculously closely correlated with their latest poll numbers at the time, to the extent that it was impossible to believe that that was not planned. (For stats people, r > .95). That is, MSNBC is skewing the elections and endowing polls (i.e. an easy news source) with more importance by giving more free exposure to the leading candidates.
Yesterday’s debate, a CNN debate, did not show quite as high a correlation with the latest poll numbers at the time of the debate (r = .73). That’s still quite high. Obama spoke the most, although Clinton still leads by quite a bit in the polls. On the other hand, Obama spoke for more than 3 times as long as Kucinich, the candidate who spoke the least. The correlation is still implausibly high if we believe the speaking time was intended to be allocated evenly, but perhaps it’s not so high as to believe that CNN used a formula based on poll numbers to decide speaking time for each candidate (as I believe MSNBC did).
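For readers who are not stats people, the r quoted here is presumably the standard Pearson correlation coefficient, which runs from -1 to 1. Here is a minimal sketch of how it can be computed from two lists, one of poll numbers and one of speaking times. The function name is mine, and the sample numbers are invented for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented numbers, not the debate data: poll percentage and seconds spoken.
polls = [40.0, 20.0, 10.0, 2.0]
seconds = [800.0, 600.0, 650.0, 400.0]
print(round(pearson_r(polls, seconds), 2))
```

A value near 1 means speaking time rises almost in lockstep with poll numbers; a value near 0 means poll numbers tell you little about who got to talk.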
Next time MSNBC and CNN have debates, we will start to see whether we can tell from the numbers that MSNBC and CNN have different policies for how they allocate speaking time.
Wednesday, November 14th, 2007
On November 19, 2006, I inquired with the GPO regarding how they decided on charging the public $8,000 for documents they produce in their normal course of printing bills. These documents are not for the end-user, but would be useful for sites like GovTrack. The U.S. Code requires most GPO documents be sold to the public at their marginal cost to distribute, and the marginal cost of distributing the documents I wanted (the “Daily Bills” product with “GPO Locator Codes”) couldn’t be more than $100 a year, and is probably closer to $1.00. Either they aren’t complying with the law, or they don’t consider the documents among those covered by that rule. I wanted to know which.
Today I got an email from GPO’s Lead Customer Service Representative:
Dear Customer: I was updating my database files and notice that your incident was still pending.
Yeah, I would say so. Better late than never.
Monday, November 5th, 2007
(This is written in the style of a letter to the Senate… because hopefully it will turn into just that. Comments on its persuasiveness are welcome.)
Summary: The Senate’s current position on publishing voting records online is analogous to a reference library that has no copy machine. I explain below why the Senate website should publish its roll call vote records in “XML format”, to facilitate educating the public and strengthening transparency, and why any reluctance there may be should be reevaluated in light of the experience from the House’s use of XML for roll call votes and the presence today of unauthoritative XML for Senate votes. Current Senate website policy should be revised to encourage the use of this “structured data format”.
Though everyone believes an electorate must be informed to make wise decisions at the polls, the complexities of what happens in the Congress are indeed difficult to distill and share with the public. Roll call voting records are of crucial importance to the public for obvious reasons, but at the same time fail to capture the nuances of each situation that may have played a central role in a Senator’s decision making. How voting records, which are easy to convey but oversimplify the big picture, should be responsibly shared with the public is a question for debate. I suggest below that the Senate website publish its roll call vote records in “XML format” (in addition to what is currently available) to help keep the public informed, and that any fears about how the information in XML may be used are not strong enough reasons to avoid this technology.
The Senate’s current position on publishing voting records online is analogous to a reference library that has no copy machine. In a reference library without a copy machine, the information in the stacks is certainly made available, but library members can’t easily share the information with others. They can instruct others how to find the information in the library (i.e. a link), and they can copy the information by hand and make copies at Kinkos, but library members are unable to use the latest technology to help them share the information outside the library. In such a world, the library members’ response is likely to be to haul their own copy machines into the library. This is exactly what has happened with Senate voting records.
Leaving the metaphor, long ago the Senate took the important step of publishing voting records on its website. Though the votes webpages themselves cannot capture all of the nuances of each vote, these webpages complement what exists elsewhere on the web. For instance, the websites of newspapers, which do try to explain the back-story of legislative issues to present a larger picture, often link to the Senate’s roll call webpages as, in a sense, an extension of their own reporting, that is, so they can provide not just the big picture but also the crucial details. The roll call webpages thus have an important role in educating the electorate and promoting transparency.
The metaphorical copy machine represents what is called structured data, for example “XML.” XML allows computers to more easily process information, and for voting records would help that information be disseminated more widely and in novel ways to the public. While structured data is a part of today’s so-called “Web 2.0”, the current policy understood to be coming from Senate Administration is that the Senate website is not to publish structured data for roll call votes, with the reason understood to be that Senators prefer to have their votes be published not as isolated factoids, where they could be misrepresented, but rather only as part of a larger picture.
This policy warrants review on two accounts. On the one hand, even such isolated facts have a crucial role of complementing the larger picture presented elsewhere, as do the existing Senate webpages for votes, as explained above. But further, for several years the House has published its voting records in XML. The New York Times, for instance, makes use of these files to enhance its own coverage of legislation by including visual representations of votes along with its articles: the big picture and the crucial details. XML made the voting information more easily transformed into visual form, a form that has educational value to the public, and so using XML is in this respect in the public interest. The Senate does not publish XML, and while, as with the metaphorical reference library, this does not prevent wholesale access to the information, it is holding back on technology that facilitates educating others. The Senate should adopt a policy similar to the House’s to encourage the dissemination of voting information, knowing from the experience of the House that it will often be used to complement reporting of the nuances and the big picture.
Because the Senate does not publish votes in XML, the public has hauled in its own copy machine, and the effect is that Senate vote XML files are available to the public, Senate rules notwithstanding. The independent website GovTrack.us publishes its own XML files for Senate votes, and these are used by several other websites to enhance the public’s understanding of the Congress. Any fears Senators might have had for a future with XML can thus be evaluated today. However, this unauthoritative source for voting information is not an optimal solution, because on rare occasions it disseminates incorrect information to some hundreds of thousands of monthly visitors of the websites using these XML files. An authoritative source of roll call vote XML files from the Senate directly would rectify this problem.
As there is virtually no cost to publishing XML files for roll call votes, and in light of the experience that can be gathered from the House’s use of XML and the presence today of (unauthoritative) XML for Senate votes, the current policy regarding the use of structured data on the Senate website should be reevaluated. The use of structured data should be encouraged for all public information on the Senate website, especially starting with roll call votes, and would signal a renewed commitment to using technology to promote transparency.
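(A technical aside, not part of the letter: to show why XML counts as a copy machine, here is a minimal sketch of how little code it takes to tally a roll call vote once it is in structured form. The element and attribute names below are a made-up schema for illustration. They are not the House's actual format or GovTrack's, and the senators are placeholders.)

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical roll call document; the schema and names are invented.
SAMPLE_XML = """<roll-call chamber="senate">
  <vote member="Senator A" position="Yea"/>
  <vote member="Senator B" position="Nay"/>
  <vote member="Senator C" position="Yea"/>
</roll-call>"""


def tally_votes(xml_text):
    """Count how many members took each position in one roll call."""
    root = ET.fromstring(xml_text)
    return Counter(vote.get("position") for vote in root.iter("vote"))


def position_of(xml_text, member):
    """Look up one member's position, or None if they are not listed."""
    root = ET.fromstring(xml_text)
    for vote in root.iter("vote"):
        if vote.get("member") == member:
            return vote.get("position")
    return None


print(tally_votes(SAMPLE_XML))
print(position_of(SAMPLE_XML, "Senator B"))
```

Doing the same thing against a webpage meant for human readers requires guessing at its layout, and breaks whenever the layout changes. That fragility is one plausible source of the rare errors in unauthoritative copies.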
Friday, November 2nd, 2007
I can’t help but take this a step further. Last post I noted that in Tuesday’s MSNBC Democratic presidential debate, the amount of time spoken by each candidate was correlated ridiculously well with their latest poll numbers, to the extent that it is impossible to believe this was not planned. I don’t know who planned it, but it would seem to me that it is [MS]NBC that had the most to gain. (If the candidates voted on the rules, certainly a majority would not have agreed to such a distribution of time.)
NBC’s (presumed) choice to distribute time is no less than a judgment about who should be president. (And it’s ironic that this would fuel the pundits who come on later to ask “who won” the debate. They should just ask their corporate buddies who they decided to give more screen time to.) Proportioning time is different from cutting out candidates entirely. Not everyone can reasonably fit on a stage or within 2 hours, and a debate with 20 candidates isn’t going to be of particular use to the public. But, given a fixed number of candidates to include, and assuming the public benefits equally from hearing from each, then distributing the time grossly unevenly among the candidates doesn’t serve anyone except those that have something to gain through the election of one candidate or another, and it’s highly presumptuous.
So why would NBC do that? Before the obvious answer, there are two possibilities. The most generous is that the executives believe that the stronger or more likely to win candidates have some claim to more time. Why waste TV time on a candidate who won’t win? But this doesn’t explain the situation. John Edwards is not without hope, but NBC still gave 1.5 times more time to Clinton than to him.
The second possibility is that NBC believes this distribution will get higher ratings for the debate. Actually this isn’t an unreasonable idea. If it’s true that people watch what they want to hear, then people could prefer a debate in which their preferred candidate speaks more. Then, proportioning out the time by each candidate’s number of supporters could, in principle, make economic sense. (It’s not obvious that mathematically it does make sense, but you could make up an economic story to make it work.)
The third, cynical possibility is that NBC executives are being swayed by their own personal situations. By limiting the majority of the debate time to a few candidates, they increase the influence of their own campaign contributions to those candidates. I don’t know whether NBC execs contribute particularly differently from the population at large, but here are the numbers from CRP. Looking at donations from self-reported NBC executives of $500 or more to Democrats in the debate, $14,500 went to Obama, $7,600 to Clinton, $2,300 to Dodd, and nothing to anyone else. These numbers are no explanation for the time proportioning (then we would expect Clinton to have received the most), but it does show us that the NBC executives have a personal stake in the top candidates, just like everyone else. And if I were them, I certainly wouldn’t want the candidate I contributed to to be out-debated by an opponent who later goes on to win the White House. Who wants to contribute to a loser?
With either of the last two reasons, there is a large conflict of interest. It’s impossible to get out of it: Time was probably proportioned either to bolster ratings (i.e. playing with politics for money), or to bolster particular candidates (i.e. playing with politics for control).
Wednesday, October 31st, 2007
The New York Times has an interesting flash application that breaks down the text of yesterday’s Democratic debate (there was a debate? UPDATE: And it was in my own city??) by speaker and shows visually the distribution of who spoke when through the debate. I mention it here because it’s one of these data transformations very much in the same spirit of what I keep pushing here. They took the transcript, made it visual and interactive, and the end result is a vastly different view onto the debate than anyone had before. It uses the same transcript as anyone else, but adds something very new and informative.
One can’t help but notice that the different candidates are not getting the same amount of speaking time. Clinton spoke more than 3.5 times more words, and the same for speaking time, than Biden. For that matter, basically so did the moderator, who held the floor for more time than anyone but Clinton. It’s no wonder that Clinton is considered “the Democrat to beat” considering she’s in our face more.
If the numbers weren’t so vastly different between the candidates, we’d chalk it up to some random variation that happens from debate to debate. But, from the numbers, the speaking times are clearly planned. It’s so clear that I feel like maybe I missed something. Is it common knowledge that the debates are proportioning time out to the candidates based on their poll numbers (or something equivalent)? It’s not just that the front-runners are getting more time. The statistical correlation is ridiculously high (speaking time versus FOX News/Opinion Dynamics Poll. Oct. 23-24: r=.96). That is, the debate organizers are basically using this formula to determine how much time each candidate should get:
Speaking Time = 8:26 minutes + 25 seconds * Latest Poll Number (%)
Of course, debate organizers can’t control exactly how long each candidate talks for, but the candidates only deviated from the formula by at most two minutes and twenty seconds (Biden, who spoke less, and Edwards [corrected from Dodd], who spoke more).
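To make the formula easy to play with, here is a minimal sketch that turns it into a function. The 8:26 base and the 25 seconds per point come from the formula above; the function names and the poll values fed in are placeholders for illustration, not any candidate's actual numbers.

```python
BASE_SECONDS = 8 * 60 + 26   # the 8:26 constant term
SECONDS_PER_POINT = 25       # extra seconds per poll percentage point
MAX_DEVIATION_SECONDS = 140  # largest deviation noted above (2:20)


def predicted_speaking_seconds(poll_pct):
    """Speaking time in seconds that the formula predicts for a poll number."""
    return BASE_SECONDS + SECONDS_PER_POINT * poll_pct


def format_mmss(total_seconds):
    """Render a number of seconds as minutes:seconds."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


def fits_formula(actual_seconds, poll_pct):
    """True if an observed time is within the deviation noted above."""
    gap = abs(actual_seconds - predicted_speaking_seconds(poll_pct))
    return gap <= MAX_DEVIATION_SECONDS


# Placeholder poll numbers, chosen only to show the arithmetic.
print(format_mmss(predicted_speaking_seconds(40)))  # 25:06
print(format_mmss(predicted_speaking_seconds(2)))   # 9:16
```

The constant term is worth noticing: under this formula, even a candidate polling at zero would get about eight and a half minutes.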
So now I’m getting off topic a bit, but in any case: transformations on data can be very revealing!
Friday, October 26th, 2007
Steve King, a Republican from Iowa, has introduced a new bill that has a clause specifically about Internet-based transparency. (We know King from his bill H.R. 170: Sunlight Act of 2007, parts of which I think were integrated into the passed ethics reform bill. One part that wasn’t integrated was a provision to have bills posted online for 48 hours before their consideration.) His new bill is H. Res. 776: Amending the Rules of the House of Representatives to require that rescission bills always be considered under open rules every year, and for other purposes.
This bill, like most of the 12 others he has introduced this year, takes a classical conservative position, here trying to reduce government spending. The real point of the bill is expressed best in one of its findings clauses:
Whereas a rescissions bill, which would cut Federal spending, should be brought to the House floor at the beginning of every fiscal quarter to give Congress the opportunity to cut and cancel unnecessary, wasteful, and bloated government spending to eliminate the deficit;
But the interesting part for us is:
Whereas the process of cutting spending should be open to the public, by posting this spending cutting bill and its amendments on the Internet, so that Americans can exercise their right to contact their Members of Congress and make their views known
It has a variant of the 48-hours language from his other bill applied specifically to rescission bills.