<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Joshua Tauberer's Blog</title>
	<atom:link href="http://razor.occams.info/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://razor.occams.info/blog</link>
	<description></description>
	<pubDate>Fri, 22 Aug 2008 23:27:31 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7-bleeding</generator>
	<language>en</language>
			<item>
		<title>Navigating legislation (after the fact, of course)</title>
		<link>http://razor.occams.info/blog/2008/08/22/navigating-legislation-after-the-fact-of-course/</link>
		<comments>http://razor.occams.info/blog/2008/08/22/navigating-legislation-after-the-fact-of-course/#comments</comments>
		<pubDate>Fri, 22 Aug 2008 23:27:31 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Open House Project]]></category>

		<guid isPermaLink="false">http://razor.occams.info/blog/?p=342</guid>
		<description><![CDATA[<p>In May, the Congress passed the 2008 Farm Bill, which regulates various food, nutrition, and apparently biofuel issues. Tufts food policy professor Parke Wilde <a href="http://usfoodpolicy.blogspot.com/2008/08/ers-posts-farm-bill-side-by-side.html">writes on his blog</a> today:</p>
<blockquote><p>The 629-page text (.pdf) of the 2008 Farm Bill is so complex and unreadable that the U.S. food policy community has been on the edge of our seats waiting for the USDA/ERS side-by-side comparison unveiled today.</p>
<p>The <a href="http://www.ers.usda.gov/FarmBill/2008/">ERS  side-by-side tool</a> compares the new Farm Bill with current law, title by title, so we can finally begin to understand what the law really means.</p></blockquote>
<p>ERS is the USDA&#8217;s Economic Research Service. Their side-by-side webpage, which I think was just published this week, shows the provisions of the previous and the current bill side-by-side. (It&#8217;s not a comparison of the bill text, but of summaries of the provisions.)</p>
<p>This is interesting on a number of counts. First, the fact that it is the USDA making this comparison suggests that everyone agrees the bill itself is effectively incomprehensible, even to professionals and scholars, on account of its size, and that summarizing it is costly enough that only the government would do it, taking three months to prepare.</p>
<p>Second, if this is what was needed to understand the Farm Bill, was it passed without anyone understanding it?</p>
<p>Third, this comparison was made by and for professionals and scholars, not by tech geeks. Why aren&#8217;t we talking to them?</p>
<p>The ERS tool comes complete with a seemingly unintentionally hilarious <a href="http://www.ers.usda.gov/FarmBill/2008/video/FarmBillVideo.htm">intro video</a> &#8212; overly dramatic with background music fit for the Miss Universe competition. (Wilde likened it to &#8220;a documentary by Kenneth Burns or an account of a manned mission to the moon&#8221;.)</p>
]]></description>
			<content:encoded><![CDATA[<p>In May, the Congress passed the 2008 Farm Bill, which regulates various food, nutrition, and apparently biofuel issues. Tufts food policy professor Parke Wilde <a href="http://usfoodpolicy.blogspot.com/2008/08/ers-posts-farm-bill-side-by-side.html">writes on his blog</a> today:</p>
<blockquote><p>The 629-page text (.pdf) of the 2008 Farm Bill is so complex and unreadable that the U.S. food policy community has been on the edge of our seats waiting for the USDA/ERS side-by-side comparison unveiled today.</p>
<p>The <a href="http://www.ers.usda.gov/FarmBill/2008/">ERS  side-by-side tool</a> compares the new Farm Bill with current law, title by title, so we can finally begin to understand what the law really means.</p></blockquote>
<p>ERS is the USDA&#8217;s Economic Research Service. Their side-by-side webpage, which I think was just published this week, shows the provisions of the previous and the current bill side-by-side. (It&#8217;s not a comparison of the bill text, but of summaries of the provisions.)</p>
<p>This is interesting on a number of counts. First, the fact that it is the USDA making this comparison suggests that everyone agrees the bill itself is effectively incomprehensible, even to professionals and scholars, on account of its size, and that summarizing it is costly enough that only the government would do it, taking three months to prepare.</p>
<p>Second, if this is what was needed to understand the Farm Bill, was it passed without anyone understanding it?</p>
<p>Third, this comparison was made by and for professionals and scholars, not by tech geeks. Why aren&#8217;t we talking to them?</p>
<p>The ERS tool comes complete with a seemingly unintentionally hilarious <a href="http://www.ers.usda.gov/FarmBill/2008/video/FarmBillVideo.htm">intro video</a> &#8212; overly dramatic with background music fit for the Miss Universe competition. (Wilde likened it to &#8220;a documentary by Kenneth Burns or an account of a manned mission to the moon&#8221;.)</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/08/22/navigating-legislation-after-the-fact-of-course/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The Semantic Web&#8217;s Role in Dealing with Disasters</title>
		<link>http://razor.occams.info/blog/2008/08/12/the-semantic-webs-role-in-dealing-with-disasters/</link>
		<comments>http://razor.occams.info/blog/2008/08/12/the-semantic-webs-role-in-dealing-with-disasters/#comments</comments>
		<pubDate>Tue, 12 Aug 2008 15:26:51 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://razor.occams.info/blog/?p=340</guid>
		<description><![CDATA[<p>My <a href="http://rdfabout.com/demo/census">Census RDF dataset</a> is being used in a public health project:</p>
<p>On SemanticWeb.com: <a href="http://www.semanticweb.com/article.php/3764266">http://www.semanticweb.com/article.php/3764266</a><br />
The Semantic Web&#8217;s Role in Dealing with Disasters<br />
August 8, 2008<br />
By Jennifer Zaino</p>
<p>The University of Southern California Information Sciences Institute and Childrens Hospital Los Angeles have been working together to build a software tool. Dubbed PEDSS (Pediatric Emergency Decision Support System), the tool is designed to help medical service providers more effectively plan for, train for, and respond to serious incidents and disasters affecting children.</p>
<p>The project, a part of the Pediatric Disaster Resource and Training Center (PDRTC), has been going on for about eight months.</p>
<p>Dr. Tatyana Ryutov, a research scientist at the USC Information Sciences Institute, is working on the system. Recently, the Institute contacted Joshua Tauberer, the creator of GovTrack.us and the man who maintains a large RDF (Resource Description Framework) data set of U.S. Census data, about making SPARQL queries to that data in conjunction with the PEDSS.</p>
<p>&#8230;</p>
<p>“Currently, demographic data (number of children in four age groups) is entered manually. We want the tool to calculate this information automatically based on a zip-code. Therefore, we extend the tool to query the RDF census data server to get this information,” Ryutov writes. Currently this is the only server the software queries, but Ryutov says they plan to add calls to other census data servers to improve reliability. Those servers do not have to be RDF databases.</p>
<p>(and it continues)</p>
]]></description>
			<content:encoded><![CDATA[<p>My <a href="http://rdfabout.com/demo/census">Census RDF dataset</a> is being used in a public health project:</p>
<p>On SemanticWeb.com: <a href="http://www.semanticweb.com/article.php/3764266">http://www.semanticweb.com/article.php/3764266</a><br />
The Semantic Web&#8217;s Role in Dealing with Disasters<br />
August 8, 2008<br />
By Jennifer Zaino</p>
<p>The University of Southern California Information Sciences Institute and Childrens Hospital Los Angeles have been working together to build a software tool. Dubbed PEDSS (Pediatric Emergency Decision Support System), the tool is designed to help medical service providers more effectively plan for, train for, and respond to serious incidents and disasters affecting children.</p>
<p>The project, a part of the Pediatric Disaster Resource and Training Center (PDRTC), has been going on for about eight months.</p>
<p>Dr. Tatyana Ryutov, a research scientist at the USC Information Sciences Institute, is working on the system. Recently, the Institute contacted Joshua Tauberer, the creator of GovTrack.us and the man who maintains a large RDF (Resource Description Framework) data set of U.S. Census data, about making SPARQL queries to that data in conjunction with the PEDSS.</p>
<p>&#8230;</p>
<p>“Currently, demographic data (number of children in four age groups) is entered manually. We want the tool to calculate this information automatically based on a zip-code. Therefore, we extend the tool to query the RDF census data server to get this information,” Ryutov writes. Currently this is the only server the software queries, but Ryutov says they plan to add calls to other census data servers to improve reliability. Those servers do not have to be RDF databases.</p>
<p>(and it continues)</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/08/12/the-semantic-webs-role-in-dealing-with-disasters/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Berlin SPARQL Benchmarks for my SemWeb .NET Library</title>
		<link>http://razor.occams.info/blog/2008/08/10/berlin-sparql-benchmarks-for-my-semweb-net-library/</link>
		<comments>http://razor.occams.info/blog/2008/08/10/berlin-sparql-benchmarks-for-my-semweb-net-library/#comments</comments>
		<pubDate>Sun, 10 Aug 2008 12:47:22 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://razor.occams.info/blog/?p=337</guid>
		<description><![CDATA[<p>Chris Bizer and team have posted a <a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/">benchmark specification for SPARQL endpoints</a>, the Berlin SPARQL Benchmark (BSBM). They have &#8220;run the initial version of the benchmark against Sesame, Virtuoso, Jena SDB and against D2R Server, a relational database-to-RDF wrapper. The stores were benchmarked with datasets ranging from 50,000 triples to 100,000,000 triples&#8221; (announcement email).</p>
<p>I ran the benchmark against my <a href="http://razor.occams.info/code/semweb">SemWeb .NET library</a>. Instructions for setting up the benchmark are <a href="http://razor.occams.info/code/semweb/semweb-current/doc/bsbm.html">here</a> and turned out to be a good example of how to very quickly set up a SPARQL endpoint using my library, backed with your SQL database of choice (in this case MySQL). I had some trouble the first time I ran the benchmark though:</p>
<ul>
<li>The first time I ran the tests I found the library had several bugs/limitations: a bug preventing ORDER BY with dateTime values, an error parsing function calls in FILTER expressions, and a glitch in the translation of the query to SQL. I corrected these problems.</li>
<li>Query 10 must be modified to change the ordering to ORDER BY xsd:double(str(?price)), which adds the cast xsd:double(str(&#8230;)), since ordering by the custom USD datatype is not supported and not required to be supported by the SPARQL specification.</li>
<li>In the same query, the filter FILTER (?date &gt; &quot;2008-06-20&quot;^^<a class="moz-txt-link-rfc2396E" href="http://www.w3.org/2001/XMLSchema#date">&lt;http://www.w3.org/2001/XMLSchema#date&gt;</a> ) relies on xsd:date comparisons, which are not a part of the SPARQL spec (as I understand it; dateTime comparisons, on the other hand, <em>are</em> required by the spec). Such comparisons weren&#8217;t implemented in my library, but I went ahead and added them.</li>
</ul>
<p>Also I have some concerns. First, I am not 100% sure if the results of my library are actually correct. Query 4 seemed to always return no results. Second, queries are largely translated into SQL, and there is a good deal of caching going on at the level of MySQL. The benchmark results then are saying a lot about the best-case run time, and indicate something about the overhead of SPARQL processing, but may not indicate general use performance.</p>
<p>Benchmark results reported below are for my desktop: Intel Core2 Duo at 3.00GHz, 2 GB RAM, 32bit Ubuntu 8.04 on Linux 2.6.24-19-generic, Java 1.6.0_06 for the benchmark tools, and Mono 1.9.1. This seems roughly comparable to the machine used in the BSBM.</p>
<p>Load time (in seconds and triples/sec) is reported below for some of the different data set sizes.</p>
<table style="text-align: center;" border="1" cellspacing="1" cellpadding="8">
<tbody>
<tr>
<th></th>
<th>50K</th>
<th>250K</th>
<th>1M</th>
<th>5M</th>
<th>25M</th>
</tr>
<tr>
<td>Time (sec)</td>
<td></td>
<td></td>
<td>224</td>
<td></td>
<td>16129<!--(4:28:49)--></td>
</tr>
<tr>
<td>triples/sec</td>
<td></td>
<td></td>
<td>4441</td>
<td></td>
<td>1544</td>
</tr>
</tbody>
</table>
<p>For comparison, load time for the 1M data set was 224 seconds. This is roughly 1.9 times the load time of Jena SDB (Hash) with MySQL over Joseki3 (117s) and 2.6 times that of Virtuoso Open-Source Edition v5.0.6 and v5.0.7 (87s), as reported in the BSBM results. For the larger 25M dataset, the load time of about 4.5 hours was only 1.2 times slower than Jena SDB, and it was 1.7 times faster than Sesame over Tomcat and 3 times faster than Virtuoso. (But, again, the machines were different.)</p>
<p>Results for query execution are reported below as AQET (Average Query Execution Time, in seconds) for each of the queries at different data set sizes. The results were again roughly comparable to Jena and Virtuoso. But the three caveats above are worth restating: the query results have not been validated as correct, there is significant caching, and the machine was different from the one used in BSBM.</p>
<table style="text-align: center;" border="1" cellspacing="1" cellpadding="8">
<tbody>
<tr>
<th></th>
<th>50K</th>
<th>250K</th>
<th>1M</th>
<th>5M</th>
<th>25M</th>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ1">Query 1</a></strong></td>
<td></td>
<td></td>
<td>0.019184</td>
<td></td>
<td>0.049200</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ2">Query 2</a></strong></td>
<td></td>
<td></td>
<td>0.051187</td>
<td></td>
<td>0.048590</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ3">Query 3</a></strong></td>
<td></td>
<td></td>
<td>0.030508</td>
<td></td>
<td>0.079187</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ4">Query 4</a></strong></td>
<td></td>
<td></td>
<td>0.032693</td>
<td></td>
<td>0.075603</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ5">Query 5</a></strong></td>
<td></td>
<td></td>
<td>0.172283</td>
<td></td>
<td>0.342828</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ6">Query 6</a></strong></td>
<td></td>
<td></td>
<td>0.102105</td>
<td></td>
<td>3.277656</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ7">Query 7</a></strong></td>
<td></td>
<td></td>
<td>0.256491</td>
<td></td>
<td>1.108414</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ8">Query 8</a></strong></td>
<td></td>
<td></td>
<td>0.175357</td>
<td></td>
<td>0.572258</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ9">Query 9</a></strong></td>
<td></td>
<td></td>
<td>0.059674</td>
<td></td>
<td>0.088451</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ10">Query 10</a></strong></td>
<td></td>
<td></td>
<td>0.089215</td>
<td></td>
<td>0.322246</td>
</tr>
</tbody>
</table>
]]></description>
			<content:encoded><![CDATA[<p>Chris Bizer and team have posted a <a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/">benchmark specification for SPARQL endpoints</a>, the Berlin SPARQL Benchmark (BSBM). They have &#8220;run the initial version of the benchmark against Sesame, Virtuoso, Jena SDB and against D2R Server, a relational database-to-RDF wrapper. The stores were benchmarked with datasets ranging from 50,000 triples to 100,000,000 triples&#8221; (announcement email).</p>
<p>I ran the benchmark against my <a href="http://razor.occams.info/code/semweb">SemWeb .NET library</a>. Instructions for setting up the benchmark are <a href="http://razor.occams.info/code/semweb/semweb-current/doc/bsbm.html">here</a> and turned out to be a good example of how to very quickly set up a SPARQL endpoint using my library, backed with your SQL database of choice (in this case MySQL). I had some trouble the first time I ran the benchmark though:</p>
<ul>
<li>The first time I ran the tests I found the library had several bugs/limitations: a bug preventing ORDER BY with dateTime values, an error parsing function calls in FILTER expressions, and a glitch in the translation of the query to SQL. I corrected these problems.</li>
<li>Query 10 must be modified to change the ordering to ORDER BY xsd:double(str(?price)), which adds the cast xsd:double(str(&#8230;)), since ordering by the custom USD datatype is not supported and not required to be supported by the SPARQL specification.</li>
<li>In the same query, the filter FILTER (?date &gt; &quot;2008-06-20&quot;^^<a class="moz-txt-link-rfc2396E" href="http://www.w3.org/2001/XMLSchema#date">&lt;http://www.w3.org/2001/XMLSchema#date&gt;</a> ) relies on xsd:date comparisons, which are not a part of the SPARQL spec (as I understand it; dateTime comparisons, on the other hand, <em>are</em> required by the spec). Such comparisons weren&#8217;t implemented in my library, but I went ahead and added them.</li>
</ul>
<p>Also I have some concerns. First, I am not 100% sure if the results of my library are actually correct. Query 4 seemed to always return no results. Second, queries are largely translated into SQL, and there is a good deal of caching going on at the level of MySQL. The benchmark results then are saying a lot about the best-case run time, and indicate something about the overhead of SPARQL processing, but may not indicate general use performance.</p>
<p>Benchmark results reported below are for my desktop: Intel Core2 Duo at 3.00GHz, 2 GB RAM, 32bit Ubuntu 8.04 on Linux 2.6.24-19-generic, Java 1.6.0_06 for the benchmark tools, and Mono 1.9.1. This seems roughly comparable to the machine used in the BSBM.</p>
<p>Load time (in seconds and triples/sec) is reported below for some of the different data set sizes.</p>
<table style="text-align: center;" border="1" cellspacing="1" cellpadding="8">
<tbody>
<tr>
<th></th>
<th>50K</th>
<th>250K</th>
<th>1M</th>
<th>5M</th>
<th>25M</th>
</tr>
<tr>
<td>Time (sec)</td>
<td></td>
<td></td>
<td>224</td>
<td></td>
<td>16129<!--(4:28:49)--></td>
</tr>
<tr>
<td>triples/sec</td>
<td></td>
<td></td>
<td>4441</td>
<td></td>
<td>1544</td>
</tr>
</tbody>
</table>
<p>For comparison, load time for the 1M data set was 224 seconds. This is roughly 1.9 times the load time of Jena SDB (Hash) with MySQL over Joseki3 (117s) and 2.6 times that of Virtuoso Open-Source Edition v5.0.6 and v5.0.7 (87s), as reported in the BSBM results. For the larger 25M dataset, the load time of about 4.5 hours was only 1.2 times slower than Jena SDB, and it was 1.7 times faster than Sesame over Tomcat and 3 times faster than Virtuoso. (But, again, the machines were different.)</p>
<p>Results for query execution are reported below as AQET (Average Query Execution Time, in seconds) for each of the queries at different data set sizes. The results were again roughly comparable to Jena and Virtuoso. But the three caveats above are worth restating: the query results have not been validated as correct, there is significant caching, and the machine was different from the one used in BSBM.</p>
<table style="text-align: center;" border="1" cellspacing="1" cellpadding="8">
<tbody>
<tr>
<th></th>
<th>50K</th>
<th>250K</th>
<th>1M</th>
<th>5M</th>
<th>25M</th>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ1">Query 1</a></strong></td>
<td></td>
<td></td>
<td>0.019184</td>
<td></td>
<td>0.049200</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ2">Query 2</a></strong></td>
<td></td>
<td></td>
<td>0.051187</td>
<td></td>
<td>0.048590</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ3">Query 3</a></strong></td>
<td></td>
<td></td>
<td>0.030508</td>
<td></td>
<td>0.079187</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ4">Query 4</a></strong></td>
<td></td>
<td></td>
<td>0.032693</td>
<td></td>
<td>0.075603</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ5">Query 5</a></strong></td>
<td></td>
<td></td>
<td>0.172283</td>
<td></td>
<td>0.342828</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ6">Query 6</a></strong></td>
<td></td>
<td></td>
<td>0.102105</td>
<td></td>
<td>3.277656</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ7">Query 7</a></strong></td>
<td></td>
<td></td>
<td>0.256491</td>
<td></td>
<td>1.108414</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ8">Query 8</a></strong></td>
<td></td>
<td></td>
<td>0.175357</td>
<td></td>
<td>0.572258</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ9">Query 9</a></strong></td>
<td></td>
<td></td>
<td>0.059674</td>
<td></td>
<td>0.088451</td>
</tr>
<tr>
<td><strong><a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html#queryTripleQ10">Query 10</a></strong></td>
<td></td>
<td></td>
<td>0.089215</td>
<td></td>
<td>0.322246</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/08/10/berlin-sparql-benchmarks-for-my-semweb-net-library/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Return from Blogging Hiatus</title>
		<link>http://razor.occams.info/blog/2008/08/06/return-from-blogging-hiatus/</link>
		<comments>http://razor.occams.info/blog/2008/08/06/return-from-blogging-hiatus/#comments</comments>
		<pubDate>Wed, 06 Aug 2008 13:11:19 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://razor.occams.info/blog/?p=335</guid>
		<description><![CDATA[<p>Actually a lie. I&#8217;ve just been blogging elsewhere. I&#8217;ve fetched all of my old blog archives and my Open House Project posts from over there and re-made this blog.</p>
]]></description>
			<content:encoded><![CDATA[<p>Actually a lie. I&#8217;ve just been blogging elsewhere. I&#8217;ve fetched all of my old blog archives and my Open House Project posts from over there and re-made this blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/08/06/return-from-blogging-hiatus/feed/</wfw:commentRss>
		</item>
		<item>
		<title>oGosh! IRC Meeting Aug 16 4pm EDT</title>
		<link>http://razor.occams.info/blog/2008/08/06/ogosh-irc-meeting-aug-16-4pm-edt/</link>
		<comments>http://razor.occams.info/blog/2008/08/06/ogosh-irc-meeting-aug-16-4pm-edt/#comments</comments>
		<pubDate>Wed, 06 Aug 2008 12:21:48 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[GovTrack]]></category>

		<category><![CDATA[Mono]]></category>

		<guid isPermaLink="false">http://razor.occams.info/blog/?p=243</guid>
		<description><![CDATA[<p>Join me at an IRC chat to talk about open source civic technology projects,  on Saturday, August 16 at 4pm Eastern time! The agenda will be a mix  between seeing what various civic technology projects are up to like <a href="http://www.govtrack.us">GovTrack</a> (my site, powered by Mono), <a href="http://www.opencongress.org">OpenCongress</a>, and any others run by people who show up, and  getting new people involved in ongoing projects. &#8220;oGosh&#8221; is Open Government Open Source Hacking (<a href="http://wiki.opengovdata.org/index.php/OGosh">wiki</a> | <a href="http://www.new.facebook.com/home.php#/group.php?gid=45606565313">Facebook</a>), what I&#8217;m calling the  loose community that binds these projects together.</p>
<p>The chat will be in the #transparency channel on Freenode. For more  information on the meeting (and on how to get to the chat), see <a class="moz-txt-link-freetext" href="http://wiki.opengovdata.org/index.php/OGosh">http://wiki.opengovdata.org/index.php/OGosh</a>.</p>
<p>Suggestions for agenda topics are most welcome either to me directly or  by revising the wiki page above. Hope to see you there.</p>
]]></description>
			<content:encoded><![CDATA[<p>Join me at an IRC chat to talk about open source civic technology projects,  on Saturday, August 16 at 4pm Eastern time! The agenda will be a mix  between seeing what various civic technology projects are up to like <a href="http://www.govtrack.us">GovTrack</a> (my site, powered by Mono), <a href="http://www.opencongress.org">OpenCongress</a>, and any others run by people who show up, and  getting new people involved in ongoing projects. &#8220;oGosh&#8221; is Open Government Open Source Hacking (<a href="http://wiki.opengovdata.org/index.php/OGosh">wiki</a> | <a href="http://www.new.facebook.com/home.php#/group.php?gid=45606565313">Facebook</a>), what I&#8217;m calling the  loose community that binds these projects together.</p>
<p>The chat will be in the #transparency channel on Freenode. For more  information on the meeting (and on how to get to the chat), see <a class="moz-txt-link-freetext" href="http://wiki.opengovdata.org/index.php/OGosh">http://wiki.opengovdata.org/index.php/OGosh</a>.</p>
<p>Suggestions for agenda topics are most welcome either to me directly or  by revising the wiki page above. Hope to see you there.</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/08/06/ogosh-irc-meeting-aug-16-4pm-edt/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Legislative Databases recommendation makes it to House Leg Branch Appropriations markup</title>
		<link>http://razor.occams.info/blog/2008/07/14/legislative-databases-recommendation-makes-it-to-house-leg-branch-appropriations-markup/</link>
		<comments>http://razor.occams.info/blog/2008/07/14/legislative-databases-recommendation-makes-it-to-house-leg-branch-appropriations-markup/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 17:42:28 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Open House Project]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/?p=375</guid>
		<description><![CDATA[<p>I&#8217;m ecstatic. All right, so this all goes back to late 2006, a bunch of people sitting at their computers writing some emails about what Congress <em>should</em> do with data. I distinctly remember Dan Newman and I both thinking that the Library of Congress should make its raw legislative database (that powers THOMAS) available directly to us to build applications off of, rather than the screen-scraping that I was doing. One thing leads to another, the Open House Project, <a href="http://www.theopenhouseproject.com/the-open-house-project-report/3-legislation-database/">the legislative databases section of the OHP report</a> in May 2007 (which I principally wrote), then later that year with the support of Rep. Mike Honda, in November <a href="http://www.govexec.com/story_page.cfm?filepath=/dailyfed/0108/012308tdpm1.htm">CHA asked the LOC to look into the issue</a> (<a href="http://www.theopenhouseproject.com/2008/02/01/congressman-honda-on-the-open-house-cause/">more</a>), and then in the last month his office submitted text for the House Legislative Branch Appropriations Report, which made it through subcommittee markup of the bill, to give this request a little more teeth (like, ehm, the force of law).</p>
<p>His office also submitted a second paragraph which I&#8217;ll get to below.<br />
<span id="more-375"></span><br />
Rob Pierson in Honda&#8217;s office writes on the OHP mail list:</p>
<blockquote><p>I&#8217;ve mentioned on the list some of the steps my boss (Congressman Honda) has been taking, with counsel from many folks on this list, to guide Congressional policies on the path towards effectively leveraging technology to open up access to the public. There are actually quite a few other staffers who also follow this list, and we&#8217;ve certainly learned quite a bit from the conversations posted here, so I wanted to throw out a quick note of appreciation to everyone who has been contributing to the discussions.</p>
<p>With guidance from the conversations on this list (and the OHP report), Congressman Honda recently submitted the following sections into the House Legislative Branch Appropriations Report. The following (or possibly very similar versions) were included in the Leg Branch Subcommittee markup of the bill:</p>
<p>*Public Access to Legislative Data (as submitted)*</p>
<p>The Committee believes that the public should have improved access to legislative information through more advanced search capabilities such as those available through the Library of Congress&#8217; Legislative Information System. The Committee also supports enhancing public access to legislative documents, bill status, summary information, and other legislative data, through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases. The Committee requests that the Library and Government Printing Office report on the progress towards these goals within 90 days of enactment of this Act.</p>
</blockquote>
<p>Note that the GPO has also been stuck in there. For more on that, <a href="http://www.theopenhouseproject.com/2007/11/14/better-late-than-never-gpo-responds-to-my-question-1-year-later/">see this post</a>.</p>
<p>John noted that the second paragraph Honda&#8217;s office submitted parallels the final chapter of our report, <a href="http://www.theopenhouseproject.com/the-open-house-project-report/12-coordinating-web-standards/">Coordinating Web Standards</a>. (Hmm, I principally wrote that chapter too&#8230;.)</p>
<blockquote><p>*Congressional Technology Coordination (as submitted)*</p>
<p>The Committee recognizes the need for the House of Representatives to develop a strategic and coordinated plan that will prepare for the future technology needs of the institution.  A 2006 report commissioned by the Chief Administrative Officer and the Committee on House Administration, entitled /Strategic Technology Road Map for the Ten Year Vision of Technology in the House of Representatives/ provided a suggested structure for an IT evaluation and decision-making process.<br />
No later than 90 days after the enactment of this Act, the Committee requests that the Chief Administrative Officer, the Clerk, and the Sergeant at Arms report to the Committee of their efforts to develop House-wide data-sharing standards; implement standard legislative document formats; address the increasing resource challenges of Member offices; and identify disparate systems throughout the institution, which prevent it from taking advantage of economies of scale.</p>
</blockquote>
<p>This is of course fantastic news for anyone who supports transparency, which is, well, everyone in their right mind, I think. So thanks to Congressman Honda for taking the initiative on this!</p>
<p>(Other links: <a href="http://www.theopenhouseproject.com/2007/06/27/house-leg-branch-appropriations-review/">last year&#8217;s leg branch appropriations blog post</a>, <a href="http://www.theopenhouseproject.com/2007/01/25/mash-ups-for-government-transparency/">my first or one of my first posts here about structured data</a>)</p>
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m ecstatic. All right, so this all goes back to late 2006, a bunch of people sitting at their computers writing some emails about what Congress <em>should</em> do with data. I distinctly remember Dan Newman and I both thinking that the Library of Congress should make its raw legislative database (that powers THOMAS) available directly to us to build applications off of, rather than the screen-scraping that I was doing. One thing leads to another, the Open House Project, <a href="http://www.theopenhouseproject.com/the-open-house-project-report/3-legislation-database/">the legislative databases section of the OHP report</a> in May 2007 (which I principally wrote), then later that year with the support of Rep. Mike Honda, in November <a href="http://www.govexec.com/story_page.cfm?filepath=/dailyfed/0108/012308tdpm1.htm">CHA asked the LOC to look into the issue</a> (<a href="http://www.theopenhouseproject.com/2008/02/01/congressman-honda-on-the-open-house-cause/">more</a>), and then in the last month his office submitted text for the House Legislative Branch Appropriations Report, which made it through subcommittee markup of the bill, to give this request a little more teeth (like, ehm, the force of law).</p>
<p>His office also submitted a second paragraph which I&#8217;ll get to below.<br />
<span id="more-375"></span><br />
Rob Pierson in Honda&#8217;s office writes on the OHP mail list:</p>
<blockquote><p>I&#8217;ve mentioned on the list some of the steps my boss (Congressman Honda) has been taking, with counsel from many folks on this list, to guide Congressional policies on the path towards effectively leveraging technology to open up access to the public. There are actually quite a few other staffers who also follow this list, and we&#8217;ve certainly learned quite a bit from the conversations posted here, so I wanted to throw out a quick note of appreciation to everyone who has been contributing to the discussions.</p>
<p>With guidance from the conversations on this list (and the OHP report), Congressman Honda recently submitted the following sections into the House Legislative Branch Appropriations Report. The following (or possibly very similar versions) were included in the Leg Branch Subcommittee markup of the bill:</p>
<p>*Public Access to Legislative Data (as submitted)*</p>
<p>The Committee believes that the public should have improved access to legislative information through more advanced search capabilities such as those available through the Library of Congress&#8217; Legislative<br />
Information System. The Committee also supports enhancing public access to legislative documents, bill status, summary information, and other legislative data, through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases. The Committee requests that the Library and Government Printing Office report on the progress towards these goals within 90 days of enactment of this Act.
</p>
</blockquote>
<p>Note that the GPO has also been stuck in there. For more on that, <a href="http://www.theopenhouseproject.com/2007/11/14/better-late-than-never-gpo-responds-to-my-question-1-year-later/">see this post</a>.</p>
<p>John noted that the second paragraph Honda&#8217;s office submitted parallels the final chapter of our report, <a href="http://www.theopenhouseproject.com/the-open-house-project-report/12-coordinating-web-standards/">Coordinating Web Standards</a>. (Hmm, I principally wrote that chapter too&#8230;.)</p>
<blockquote><p>*Congressional Technology Coordination (as submitted)*</p>
<p>The Committee recognizes the need for the House of Representatives to develop a strategic and coordinated plan that will prepare for the future technology needs of the institution.  A 2006 report commissioned by the Chief Administrative Officer and the Committee on House Administration, entitled /Strategic Technology Road Map for the Ten Year Vision of Technology in the House of Representatives/ provided a suggested structure for an IT evaluation and decision-making process.<br />
No later than 90 days after the enactment of this Act, the Committee requests that the Chief Administrative Officer, the Clerk, and the Sergeant at Arms report to the Committee of their efforts to develop House-wide data-sharing standards; implement standard legislative document formats; address the increasing resource challenges of Member offices; and identify disparate systems throughout the institution, which prevent it from taking advantage of economies of scale.</p>
</blockquote>
<p>This is of course fantastic news for anyone who supports transparency, which is, well, everyone in their right mind, I think. So thanks to Congressman Honda for taking the initiative on this!</p>
<p>(Other links: <a href="http://www.theopenhouseproject.com/2007/06/27/house-leg-branch-appropriations-review/">last year&#8217;s leg branch appropriations blog post</a>, <a href="http://www.theopenhouseproject.com/2007/01/25/mash-ups-for-government-transparency/">my first or one of my first posts here about structured data</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/07/14/legislative-databases-recommendation-makes-it-to-house-leg-branch-appropriations-markup/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Eating well on Independence Day</title>
		<link>http://razor.occams.info/blog/2008/07/04/eating-well-on-independence-day/</link>
		<comments>http://razor.occams.info/blog/2008/07/04/eating-well-on-independence-day/#comments</comments>
		<pubDate>Fri, 04 Jul 2008 12:54:03 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Open House Project]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/?p=370</guid>
		<description><![CDATA[<p>Happy 4th of July. I thought I&#8217;d share an interesting website that has nothing to do with government transparency but is about good use of government data. The USDA maintains a <a href="http://www.ars.usda.gov/main/site_main.htm?modecode=12354500">big database of nutrition facts about foods</a>. You can download the database and build applications based on it, like a menu planner. This is something I&#8217;ve been thinking about in the back of my head for a while since after getting into the whole <a href="http://www.michaelpollan.com/">Michael Pollan food mind-set</a> I&#8217;ve wondered whether one can make a healthy diet just by balancing various food groups (as I try to do with limited success), or whether (contra Pollan&#8217;s overall message, though maybe not in the details) it would be useful to start adding up the numbers of various nutrients to see how my meals match up with recommended values. How should I know, for instance, if I&#8217;ve managed to exclude an important vitamin in my particular selection of foods that I eat week after week, right?</p>
<p>The database is great itself, but the cooler website is <a href="http://www.mypyramidtracker.gov/planner/">MyPyramid Menu Planner (mypyramidtracker.gov)</a> (also out of the USDA). You can enter a typical daily roster of what you eat (with a nice sound effect) and it will tell you how it stacks up for a recommended diet for your age (or for me, how to gain weight to a recommended amount for my age). It feels a little over-simplified, but the simplicity keeps me on the site. I find, not surprisingly, that I probably eat about half of the recommended calories and clearly not enough grain or fruit. Well, I knew this in the abstract, but quantifying it helps direct me to fixing the problem.</p>
<p>I&#8217;m sure there are other websites that do similar things, but it&#8217;s nice to find a case where the government has both published a comprehensive (well structured, well documented) database and has also built a really nice interface for the data. And on a topic that is really very important to daily life, too.</p>
<p>And with that, I think I will take the rest of the weekend off from civics!</p>
]]></description>
			<content:encoded><![CDATA[<p>Happy 4th of July. I thought I&#8217;d share an interesting website that has nothing to do with government transparency but is about good use of government data. The USDA maintains a <a href="http://www.ars.usda.gov/main/site_main.htm?modecode=12354500">big database of nutrition facts about foods</a>. You can download the database and build applications based on it, like a menu planner. This is something I&#8217;ve been thinking about in the back of my head for a while since after getting into the whole <a href="http://www.michaelpollan.com/">Michael Pollan food mind-set</a> I&#8217;ve wondered whether one can make a healthy diet just by balancing various food groups (as I try to do with limited success), or whether (contra Pollan&#8217;s overall message, though maybe not in the details) it would be useful to start adding up the numbers of various nutrients to see how my meals match up with recommended values. How should I know, for instance, if I&#8217;ve managed to exclude an important vitamin in my particular selection of foods that I eat week after week, right?</p>
<p>The database is great itself, but the cooler website is <a href="http://www.mypyramidtracker.gov/planner/">MyPyramid Menu Planner (mypyramidtracker.gov)</a> (also out of the USDA). You can enter a typical daily roster of what you eat (with a nice sound effect) and it will tell you how it stacks up for a recommended diet for your age (or for me, how to gain weight to a recommended amount for my age). It feels a little over-simplified, but the simplicity keeps me on the site. I find, not surprisingly, that I probably eat about half of the recommended calories and clearly not enough grain or fruit. Well, I knew this in the abstract, but quantifying it helps direct me to fixing the problem.</p>
<p>I&#8217;m sure there are other websites that do similar things, but it&#8217;s nice to find a case where the government has both published a comprehensive (well structured, well documented) database and has also built a really nice interface for the data. And on a topic that is really very important to daily life, too.</p>
<p>And with that, I think I will take the rest of the weekend off from civics!</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/07/04/eating-well-on-independence-day/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Communicating with Congress: Recommendations for Improving the Democratic Dialogue</title>
		<link>http://razor.occams.info/blog/2008/06/21/communicating-with-congress-recommendations-for-improving-the-democratic-dialogue/</link>
		<comments>http://razor.occams.info/blog/2008/06/21/communicating-with-congress-recommendations-for-improving-the-democratic-dialogue/#comments</comments>
		<pubDate>Sat, 21 Jun 2008 14:06:23 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Open House Project]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/?p=367</guid>
		<description><![CDATA[<p>CMF published an interim report, <a href="http://www.cmfweb.org/index.php?option=com_content&#038;task=view&#038;id=256">Communicating with Congress: Recommendations for Improving the Democratic Dialogue</a>. I had one of those &#8220;someone got it right&#8221; moments reading the report. Following what seemed to be tireless work by Daniel Bennett and Rob Pierson (<a href="http://honda.house.gov/">Rep. Mike Honda</a>&#8217;s office) and CMF staff going back a long time, and a conference in October that I really enjoyed, they recommend adding metadata to constituent communication to reliably indicate who the sender is, what the issue is, and what advocacy organization helped the sender send the message.</p>
<p>The recommendation serves to help congressional staff manage incoming communication. It&#8217;s a method of triage on the one hand, and a tool to help tally communications by position on the other. Critical as this may be, I find tallying to be incredibly superficial &#8212; and it really reveals, I think, that the world of communicating with Congress has become extremely narrow. (But I&#8217;ve written on that before.)</p>
]]></description>
			<content:encoded><![CDATA[<p>CMF published an interim report, <a href="http://www.cmfweb.org/index.php?option=com_content&#038;task=view&#038;id=256">Communicating with Congress: Recommendations for Improving the Democratic Dialogue</a>. I had one of those &#8220;someone got it right&#8221; moments reading the report. Following what seemed to be tireless work by Daniel Bennett and Rob Pierson (<a href="http://honda.house.gov/">Rep. Mike Honda</a>&#8217;s office) and CMF staff going back a long time, and a conference in October that I really enjoyed, they recommend adding metadata to constituent communication to reliably indicate who the sender is, what the issue is, and what advocacy organization helped the sender send the message.</p>
<p>The recommendation serves to help congressional staff manage incoming communication. It&#8217;s a method of triage on the one hand, and a tool to help tally communications by position on the other. Critical as this may be, I find tallying to be incredibly superficial &#8212; and it really reveals, I think, that the world of communicating with Congress has become extremely narrow. (But I&#8217;ve written on that before.)</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/06/21/communicating-with-congress-recommendations-for-improving-the-democratic-dialogue/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Webcontent.gov updates publishing-data recommendations</title>
		<link>http://razor.occams.info/blog/2008/06/12/webcontentgov-updates-publishing-data-recommendations/</link>
		<comments>http://razor.occams.info/blog/2008/06/12/webcontentgov-updates-publishing-data-recommendations/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 17:25:30 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Open House Project]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/?p=360</guid>
		<description><![CDATA[<p>I was very lucky this week to have stumbled into the middle of an update being done to a page maintained by the U.S.&#8217;s GSA at <a href="http://webcontent.gov">webcontent.gov</a> on best practices for making data available, for executive branch agencies. The site serves as a collection of best practices and uses OMB policies as a starting point. I think it had been last updated in 2005.</p>
<p>The page updated is <a href="http://www.usa.gov/webcontent/usability/accessibility/access_to_data.shtml">here</a>.</p>
<p>The updates were a combination of suggestions from Scott Horvath and Jeremy Fee at the USGS, Kol Peterson from EPA, and me. Really big thanks go to Scott and Kol for reaching out to others for input on Monday and getting the feedback back to Bev Godwin at GSA, who runs webcontent.gov and published the changes only a few days later. Scott also notes that additional suggestions could still be considered (his email address is at the bottom of that page).</p>
<p>In making my suggestions, I turned to the <a href="http://www.opengovdata.org">Open Government Data Principles</a> and tried to squeeze in as much as I could without overloading the document, and I drew from ideas that came up in the preparation of the Open House Project report. Some of the changes made were:</p>
<ul>
<li>It now provides examples of data as being documents, audio/visual recordings, and databases.</li>
<li>It now says to support &#8220;the widest practical range of public uses of the data&#8221;. It had formerly suggested supporting the &#8220;intended&#8221; use of the website by visitors.</li>
<li>It notes the benefit of providing data: &#8220;New uses of your agency&#8217;s data may become a valuable public resource that would be out of the scope of your own website, such as helping to keep the public informed about the work of your agency and supporting civic education and participation.&#8221;</li>
<li>There is a new paragraph that I might be misunderstanding but which seems to make a suggestion along the lines of the recent &#8220;Invisible Hand&#8221; paper about the agency&#8217;s website getting the data the same way the public does: &#8220;Providing a uniform method to access raw data can also be the first step in internal development, accomplishing both goals at once. When a uniform method to access data is available, developers and web&#8211;services can focus on data presentation.&#8221;</li>
<li>It notes that the availability of bulk downloads of data is something to consider when building data access.</li>
<li>It notes some disadvantages of using proprietary formats.</li>
<li>It recommends that if a proprietary format is needed, a non-proprietary format should be used in addition.</li>
<li>It adds a benchmark to test for success: &#8220;One benchmark for determining whether data is made sufficiently available is whether the public has all of the data needed to replicate any searching, sorting, and display functionality provided on the agency&#8217;s own website.&#8221;</li>
<li>It notes that consulting the public in the development of data access seems to be entailed from OMB policy: &#8220;When choosing data formats and distribution methods, keep in mind that your agency&#8217;s visitors are the best judges of their own needs. Agencies must &#8220;establish and maintain communications with members of the public and with State and local governments to ensure your agency creates information dissemination products meeting their respective needs&#8221; (OMB Policies for Federal Public Websites #4A).&#8221;</li>
</ul>
<p>We have a real success story here.</p>
]]></description>
			<content:encoded><![CDATA[<p>I was very lucky this week to have stumbled into the middle of an update being done to a page maintained by the U.S.&#8217;s GSA at <a href="http://webcontent.gov">webcontent.gov</a> on best practices for making data available, for executive branch agencies. The site serves as a collection of best practices and uses OMB policies as a starting point. I think it had been last updated in 2005.</p>
<p>The page updated is <a href="http://www.usa.gov/webcontent/usability/accessibility/access_to_data.shtml">here</a>.</p>
<p>The updates were a combination of suggestions from Scott Horvath and Jeremy Fee at the USGS, Kol Peterson from EPA, and me. Really big thanks go to Scott and Kol for reaching out to others for input on Monday and getting the feedback back to Bev Godwin at GSA, who runs webcontent.gov and published the changes only a few days later. Scott also notes that additional suggestions could still be considered (his email address is at the bottom of that page).</p>
<p>In making my suggestions, I turned to the <a href="http://www.opengovdata.org">Open Government Data Principles</a> and tried to squeeze in as much as I could without overloading the document, and I drew from ideas that came up in the preparation of the Open House Project report. Some of the changes made were:</p>
<ul>
<li>It now provides examples of data as being documents, audio/visual recordings, and databases.</li>
<li>It now says to support &#8220;the widest practical range of public uses of the data&#8221;. It had formerly suggested supporting the &#8220;intended&#8221; use of the website by visitors.</li>
<li>It notes the benefit of providing data: &#8220;New uses of your agency&#8217;s data may become a valuable public resource that would be out of the scope of your own website, such as helping to keep the public informed about the work of your agency and supporting civic education and participation.&#8221;</li>
<li>There is a new paragraph that I might be misunderstanding but which seems to make a suggestion along the lines of the recent &#8220;Invisible Hand&#8221; paper about the agency&#8217;s website getting the data the same way the public does: &#8220;Providing a uniform method to access raw data can also be the first step in internal development, accomplishing both goals at once. When a uniform method to access data is available, developers and web&#8211;services can focus on data presentation.&#8221;</li>
<li>It notes that the availability of bulk downloads of data is something to consider when building data access.</li>
<li>It notes some disadvantages of using proprietary formats.</li>
<li>It recommends that if a proprietary format is needed, a non-proprietary format should be used in addition.</li>
<li>It adds a benchmark to test for success: &#8220;One benchmark for determining whether data is made sufficiently available is whether the public has all of the data needed to replicate any searching, sorting, and display functionality provided on the agency&#8217;s own website.&#8221;</li>
<li>It notes that consulting the public in the development of data access seems to be entailed from OMB policy: &#8220;When choosing data formats and distribution methods, keep in mind that your agency&#8217;s visitors are the best judges of their own needs. Agencies must &#8220;establish and maintain communications with members of the public and with State and local governments to ensure your agency creates information dissemination products meeting their respective needs&#8221; (OMB Policies for Federal Public Websites #4A).&#8221;</li>
</ul>
<p>We have a real success story here.</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/06/12/webcontentgov-updates-publishing-data-recommendations/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Government Data and the Invisible Hand</title>
		<link>http://razor.occams.info/blog/2008/06/06/government-data-and-the-invisible-hand/</link>
		<comments>http://razor.occams.info/blog/2008/06/06/government-data-and-the-invisible-hand/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 10:58:44 +0000</pubDate>
		<dc:creator>Joshua Tauberer</dc:creator>
		
		<category><![CDATA[Open House Project]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/?p=352</guid>
		<description><![CDATA[<p>The guys over at Princeton&#8217;s new Center for Information Technology Policy wrote a really great <a href="http://ssrn.com/abstract=1138083">paper</a> for the Yale Journal of Law &#038; Technology on the role data should have, compared to websites, in government. It articulates a point that I think many of us subconsciously have had in mind:</p>
<blockquote><p>&#8220;The new administration should specify that the federal government&#8217;s primary objective as an online publisher is to provide data that is easy for others to reuse, rather than to help citizens use the data in one particular way or another.&#8221;</p>
</blockquote>
<p>And they suggest an interesting way to push that forward:</p>
<blockquote><p>
&#8220;The policy route to realizing this principle is to require that federal government websites retrieve the underlying data using the same infrastructure that they have made available to the public. Such a rule incentivizes government bodies to keep this infrastructure in good working order, and ensures that private parties will have no less an opportunity to use public data than the government itself does. The rule prevents the situation, sadly typical of government websites today, in which governmental interest in presenting data in a particular fashion distracts from, and thereby impedes, the provision of data to users for their own purposes.&#8221;</p>
</blockquote>
<p>I think this is a worthwhile addition to the <a href="http://www.opengovdata.org">opengovdata</a> and <a href="http://www.publicmarkup.org">publicmarkup.org</a> policy documents &#8212; if not as a direct recommendation (because I think it may be too much to ask for in a grand form) then noted as a long-term goal or (in terms of the second paragraph I quoted) as a benchmark, a concrete way to tell whether data is open.</p>
<p>The full citation is: Robinson, David, Yu, Harlan, Zeller, William P and Felten, Edward W, &#8220;<a href="http://ssrn.com/abstract=1138083">Government Data and the Invisible Hand</a>&#8221; (2008). Yale Journal of Law &#038; Technology, Vol. 11, 2008</p>
]]></description>
			<content:encoded><![CDATA[<p>The guys over at Princeton&#8217;s new Center for Information Technology Policy wrote a really great <a href="http://ssrn.com/abstract=1138083">paper</a> for the Yale Journal of Law &#038; Technology on the role data should have, compared to websites, in government. It articulates a point that I think many of us subconsciously have had in mind:</p>
<blockquote><p>&#8220;The new administration should specify that the federal government&#8217;s primary objective as an online publisher is to provide data that is easy for others to reuse, rather than to help citizens use the data in one particular way or another.&#8221;</p>
</blockquote>
<p>And they suggest an interesting way to push that forward:</p>
<blockquote><p>
&#8220;The policy route to realizing this principle is to require that federal government websites retrieve the underlying data using the same infrastructure that they have made available to the public. Such a rule incentivizes government bodies to keep this infrastructure in good working order, and ensures that private parties will have no less an opportunity to use public data than the government itself does. The rule prevents the situation, sadly typical of government websites today, in which governmental interest in presenting data in a particular fashion distracts from, and thereby impedes, the provision of data to users for their own purposes.&#8221;</p>
</blockquote>
<p>I think this is a worthwhile addition to the <a href="http://www.opengovdata.org">opengovdata</a> and <a href="http://www.publicmarkup.org">publicmarkup.org</a> policy documents &#8212; if not as a direct recommendation (because I think it may be too much to ask for in a grand form) then noted as a long-term goal or (in terms of the second paragraph I quoted) as a benchmark, a concrete way to tell whether data is open.</p>
<p>The full citation is: Robinson, David, Yu, Harlan, Zeller, William P and Felten, Edward W, &#8220;<a href="http://ssrn.com/abstract=1138083">Government Data and the Invisible Hand</a>&#8221; (2008). Yale Journal of Law &#038; Technology, Vol. 11, 2008</p>
]]></content:encoded>
			<wfw:commentRss>http://razor.occams.info/blog/2008/06/06/government-data-and-the-invisible-hand/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
