Everyone likes a good choropleth map, that is, a map with regions colored according to some variable. But when the variable is a function of a population of highly unevenly distributed individuals, as in maps of the United States, we run into some problems:
Half of the population in the continental United States lives in just 1% of the continental U.S. land area. One-fifth of the population lives in just 0.28% of the land area. 95% of the population lives in just 27% of the land area.
There are at least two problems with veridical (regular geographic) choropleth maps. First, in a rasterized choropleth map (i.e., one with finite resolution), entire cities can get squashed into a single pixel, with the result that information is lost. If the variation occurs where the people are, a substantial proportion of the information that the map is trying to show probably doesn’t appear at all.
Second, and more familiar, veridical maps can misrepresent the aggregate. Individuals in low-density areas are given more space on the map than individuals in high-density areas, biasing aggregate inferences toward the values of individuals in low-density areas.
Coloring a map by district — like by county or congressional district — runs into the same problem. The smallest 50% of the 433 congressional districts in the continental U.S. occupy just 5% of the land area. Six congressional districts, all in New York City, are smaller than one pixel in a typically sized map! (Where “typically sized” is 650px by 410px.)
When map data falls below the resolution of the map itself, one should be very concerned. It’s like arbitrarily tossing out data, because these data points aren’t showing up at all. That’s considered academic fraud when the data is shown in the form of a table. I’m not sure why we think it’s okay in map form.
It’s also mostly the urban population that gets squeezed into a small area. This is particularly concerning for politically themed maps since the urban population leans left. All six of those too-small-to-be-seen New York districts are currently represented by Democrats, for instance. Republican-held congressional districts are on average 2.7 times larger than Democrat-held districts despite having equal weight in Congress, and so take up disproportionate space in a veridical map. The same is likely true by county too if we were to look at presidential election results.
Considering how much space on a map is taken up by essentially unpopulated land, these maps are also inefficient representations of the data. They give space to meaningless geographies while skipping meaningful ones.
It’s really time we stop using veridical maps to show population data. I get that cartograms are hard to construct and hard to read, but I would rather have no map at all than a map that misrepresents the data it purports to show.
Here’s a table showing land area as a function of population:
|% of Population||% of Land Area|
To compute the land area that the population resides in, I used the 72,246 Census tracts in the 2010 census that make up the continental United States, meaning I excluded tracts in Alaska, Hawaii, and the five island territories. For land area I used the ALAND10 value in the Census’s shapefiles. The total population and land area of the tracts used were 306,675,006 and 7,653,005 km^2, respectively.
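The post doesn't spell out how tracts were turned into "X% of the population lives in Y% of the land area," so here is a minimal sketch of one plausible way to do it. It assumes tracts are ranked from densest to least dense and accumulated until the target population share is reached; that ordering, the function name, and the toy numbers are my assumptions for illustration, not the original computation.

```python
def land_share_for_population(tracts, population_share):
    """Share of total land area needed to hold `population_share` of the people.

    `tracts` is a list of (population, land_area) pairs, e.g. a tract's
    population paired with its ALAND10 value. The densest tracts are counted
    first (an assumed ordering). Whole tracts are added, so the result can
    overshoot the target slightly.
    """
    total_pop = sum(p for p, _ in tracts)
    total_area = sum(a for _, a in tracts)
    # Rank by density, densest first; zero-area tracts are skipped to avoid
    # dividing by zero (they add nothing to the area anyway).
    ranked = sorted(
        (t for t in tracts if t[1] > 0),
        key=lambda t: t[0] / t[1],
        reverse=True,
    )
    target = population_share * total_pop
    pop = 0.0
    area = 0.0
    for p, a in ranked:
        if pop >= target:
            break
        pop += p
        area += a
    return area / total_area


# Hypothetical toy data: three "tracts" as (population, land area).
toy_tracts = [(200, 90), (500, 1), (300, 9)]
half = land_share_for_population(toy_tracts, 0.5)
three_quarters = land_share_for_population(toy_tracts, 0.75)
```

In the toy data, half the people live in the single densest tract, which holds 1 of the 100 units of land.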
For congressional districts, I used the 433 districts in the continental U.S. (that’s the states minus Alaska and Hawaii and including the DC district). Their “land area” is their 2D area after being projected into EPSG:2163, which is an equal-area projection, using this Census GIS data. The total “land area” computed this way came out to 8,064,815 km^2, the difference being areas of water. For which party holds the district, I filled in the two currently vacant districts (AL-01 & FL-13) with the party of their most recent congressman.
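For readers wondering what "2D area after being projected" amounts to, here is a rough sketch using the shoelace formula on a single polygon ring. It assumes the vertices are already projected into an equal-area coordinate system with units of metres; the function name is hypothetical, and a real district would also need handling for holes and multiple parts, which this leaves out. A GIS library would normally do this step for you.

```python
def ring_area_km2(ring):
    """Planar area of one polygon ring, in km^2, via the shoelace formula.

    `ring` is a list of (x, y) vertices assumed to be in metres in an
    equal-area projection. The ring may or may not repeat its first vertex
    at the end; a repeated vertex contributes nothing to the sum.
    """
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2 / 1e6  # m^2 -> km^2


# Hypothetical example: a square 2 km on a side.
square = [(0, 0), (2000, 0), (2000, 2000), (0, 2000)]
square_area = ring_area_km2(square)
```

Summing such areas over every district gives a total that includes water, which is why it would come out larger than the tract-based land area.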
Thanks to Matt Moehr, Lisa Wolfisch, and Pat Grady for some tips on identifying census tracts via Twitter.
Updates: Keith Ivey pointed out that I included Hawaii in the definition of continental U.S. the first time around. Fortunately its land area is small enough that it only barely affected the numbers. Instead of 74,003 tracts and 435 congressional districts there are 72,246 and 433; 60% of the population lives in 1.71% and not 1.70% of the land area. Other numbers are unchanged.
I also changed the projection used to compute the land area of congressional districts from “web Mercator” to an equal-area projection, but the numbers (e.g. 50% cover 5% of land) didn’t change. While I was there, I also changed how the Republican/Democrat distortion was measured. I originally wrote “Republican-held congressional districts cover 3.2 times more land area than Democrat-held districts despite Republicans only having 1.2 times as many seats in Congress” but I think the way it’s phrased now is clearer.
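The two phrasings are related by simple arithmetic: if one party's districts cover R times the total area with S times as many seats, its average district is R / S times as large. The sketch below runs that check with the earlier figures; the function name is mine, and since those figures predate the corrections described above, this is only a rough consistency check against the 2.7 quoted earlier.

```python
def mean_size_ratio(total_area_ratio, seat_ratio):
    """Ratio of average district sizes, from total-area and seat-count ratios.

    (area_A / seats_A) / (area_B / seats_B) = (area_A / area_B) / (seats_A / seats_B)
    """
    return total_area_ratio / seat_ratio


# Figures from the original phrasing: 3.2 times the land area, 1.2 times the seats.
ratio = mean_size_ratio(3.2, 1.2)
```

That works out to roughly 2.7, in line with the "on average 2.7 times larger" wording.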