As a side-effect of preparing to investigate the relationship between population density and location value, I’ve produced a map of population density in New Zealand according to the 2006 census: (Click on the map for a slightly larger version; likewise for all the others below.)
Before I get on to explaining this, and what I intend to do with it, I’d better give some credits:
- The boundaries between areas are Statistics New Zealand’s area unit boundaries for 2010.
- I used Land Information New Zealand data about lakes and islands to help determine which area units were meant to be water, and which were meant to be land. In particular, I calculated the proportion of each area unit that was water.
- And, of course, I used Statistics New Zealand’s census counts from 2006 for the populations of each area unit.
Just to dot every i and cross every t (even though I think I’ve given more attribution than they asked for), Statistics New Zealand wants me to say this:
That goes for all the maps below, too.
The LINZ data is licensed under exactly the same licence, and they want me to say:
Contains data sourced from Land Information New Zealand under CC-By.
Though perhaps I’m only required to do that for the maps below that actually show their lakes.
Anyway, you’ve probably guessed that the lighter green areas are the less densely populated ones, and the darker areas (all the way to black) are the more densely populated ones. The areas are shaded by percentiles of population density, rather than being darker in proportion to population density; I think you can see a little more detail that way.
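Here is a minimal sketch of percentile shading, to show why it brings out more detail than shading in proportion to density. The function names and the colour ramp are assumptions made for illustration; they are not taken from the program that drew the maps.

```python
from bisect import bisect_left


def percentile_ranks(densities):
    """Return, for each density, the percentage of values strictly below it."""
    ordered = sorted(densities)
    n = len(ordered)
    return [100.0 * bisect_left(ordered, d) / n for d in densities]


def shade(percentile):
    """Hypothetical colour ramp: light green at 0, fading to black at 100."""
    light_green = (204, 255, 204)
    scale = 1.0 - percentile / 100.0
    return tuple(round(channel * scale) for channel in light_green)


# Made-up densities (people per square km) spanning several orders of magnitude.
densities = [0.5, 3.0, 40.0, 2500.0]
ranks = percentile_ranks(densities)
colours = [shade(p) for p in ranks]
```

With shading proportional to density, the first three of these made-up values would be almost indistinguishable next to 2500; ranking them first spreads them evenly across the colour ramp.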
I calculated the population density as the census count divided by the area of land in each area unit, rather than the total area of the area unit. That makes a significant difference to Takapuna Central, for example:
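In code, the calculation might look like the sketch below. The function names are hypothetical, and the figures in the example are made up to show the size of the effect; they are not the real numbers for Takapuna Central or any other area unit.

```python
def land_area_km2(total_area_km2, water_proportion):
    """Area of land in an area unit, given the proportion of it that is water."""
    return total_area_km2 * (1.0 - water_proportion)


def population_density(census_count, total_area_km2, water_proportion):
    """People per square km of land; None if the unit has no land at all."""
    land = land_area_km2(total_area_km2, water_proportion)
    if land <= 0:
        return None
    return census_count / land


# Made-up example: a 10 square km area unit that is three-quarters water.
by_land = population_density(3000, 10.0, 0.75)   # uses 2.5 square km of land
by_total = 3000 / 10.0                           # ignores the water
```

In this made-up case the density by land area is four times the density by total area, which is the kind of difference a mostly-water area unit would show.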
Now, the reason I wanted all this was to divide the country up into regions of roughly equal area. I wanted to do it in a natural way according to population density, so that each region would capture the land whose value is most influenced by the population centres in that region.
The “roughly equal area” I’ve chosen for now is, somewhat arbitrarily, 177 square km. It’s too small for New Zealand’s largest cities, but I’m hoping to get good data from the more numerous smaller towns and cities in New Zealand.
So I wrote a program that did roughly this. Each time it wanted to make a new region, it chose the most densely populated area unit that wasn’t already allocated to a region. Then it successively added not-already-allocated neighbouring area units to the region in a way that kept the region convexish. If any neighbouring area unit was entirely within the convex hull of the region-so-far, the most densely populated such area unit would be added to the region. Otherwise, the population density of each eligible neighbour would be divided by the proportion of that area unit that lay outside the convex hull, and the one with the highest result would be added to the region. This continued until the target area was reached, or until the accumulated region had no more neighbours that weren’t already allocated to other regions.
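The selection rule can be sketched as follows. This is not the actual program: the data layout (`units`, `neighbours`) and the function names are assumptions, and the geometry is left to a caller-supplied `outside_fraction(region, candidate)` function, which is assumed to return the proportion of the candidate area unit lying outside the convex hull of the region so far.

```python
def grow_region(seed, units, neighbours, allocated, target_area, outside_fraction):
    """Grow one region from `seed`; returns (list of unit ids, total area).

    units:      dict of id -> {"density": float, "area": float}
    neighbours: dict of id -> set of neighbouring ids
    allocated:  set of ids already in some region (updated in place)
    """
    region = [seed]
    allocated.add(seed)
    area = units[seed]["area"]
    while area < target_area:
        candidates = {n for u in region for n in neighbours[u]} - allocated
        if not candidates:
            break  # ran out of eligible neighbours before reaching the target
        fractions = {c: outside_fraction(region, c) for c in candidates}
        inside = [c for c in candidates if fractions[c] == 0.0]
        if inside:
            # Some neighbours lie entirely within the hull: take the densest.
            chosen = max(inside, key=lambda c: units[c]["density"])
        else:
            # Otherwise favour dense units that stick out of the hull the least.
            chosen = max(candidates,
                         key=lambda c: units[c]["density"] / fractions[c])
        region.append(chosen)
        allocated.add(chosen)
        area += units[chosen]["area"]
    return region, area


def build_regions(units, neighbours, target_area, outside_fraction):
    """Seed regions from the densest unallocated unit until none are left."""
    allocated = set()
    regions = []
    for seed in sorted(units, key=lambda u: units[u]["density"], reverse=True):
        if seed not in allocated:
            regions.append(grow_region(seed, units, neighbours, allocated,
                                       target_area, outside_fraction))
    return regions
```

Walking the area units in descending order of density and skipping those already allocated is equivalent to repeatedly choosing the densest unallocated unit as the next seed. A real `outside_fraction` would need polygon geometry (a convex hull and an area-of-difference calculation), which is deliberately left out of this sketch.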
This process isn’t without problems. For example, it was perhaps too strict about which area units neighboured which other ones. To illustrate: the region I’ve called Central Auckland grew to include parts of the old Waitakere City; not enough to include Te Atatu Peninsula, but just enough to isolate the peninsula from the rest of Waitakere. So when the algorithm was building the region I’ve called North Waitakere, the three area units on Te Atatu Peninsula were never eligible to be chosen, even though, looking at this picture, it might have been natural for a person to stretch the definition of “neighbouring” area units, and include them.
And then, when the algorithm tried building a region on Te Atatu Peninsula, it quickly ran out of eligible neighbours to add, and had to abort that region before it had reached the target area.
Even if I did modify the algorithm to fix this problem, I’d probably still be tempted to exclude North Waitakere as an outlier from my later analysis, because it seems very likely that the location value in that region is heavily influenced by the proximity of the very densely populated Central Auckland region, rather than reflecting only the population density in the North Waitakere region itself.
Another problem is that area units are sometimes not sufficiently fine-grained. Many of the less densely populated ones are, on their own, much larger than the target 177 square km for each region. To cope with this, I got the algorithm to keep a record, when building each region, of the last area unit added — the one that pushed the region over 177 square km — and the proportion of it that was necessary to exactly reach 177 square km.
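The recorded proportion is simple arithmetic, sketched below. The function name and the example areas are made up for illustration; only the 177 square km target comes from the text above.

```python
TARGET_AREA_KM2 = 177.0


def last_unit_proportion(area_before_last, last_unit_area, target_area=TARGET_AREA_KM2):
    """Proportion of the last area unit needed to exactly reach the target area.

    Returns 1.0 if the region falls short of the target even with the
    whole of the last unit (for example, when it ran out of neighbours).
    """
    needed = target_area - area_before_last
    return min(1.0, needed / last_unit_area)


# Made-up example: the region covered 150 square km before a 54 square km
# area unit was added, so 27 of those 54 square km are needed.
proportion = last_unit_proportion(150.0, 54.0)
```

In this made-up example the proportion comes out at 0.5, meaning half of the last area unit is needed to reach the target.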
The intention is that I can scale down the data for the last area unit in each region, in an attempt to estimate the population and location value of only the 177 square km most closely associated with the population centre. I’m not sure exactly what assumptions I’ll use, but there will still be problems.
For example, some of the large, sparsely populated area units have little enclaves carved out of them for towns. Kahutara is a good example: when the algorithm was building the Featherston region, it added Kahutara as the only possible second area unit, bringing the total area of that region well past the target area; Greytown and Martinborough were then isolated, and each became a too-small region of its own.
Now, how am I meant to estimate how much of the population and location value of Kahutara is influenced by Featherston (and therefore really belongs to that region), and how much is more influenced by the proximity of Greytown and Martinborough?
Carterton isn’t much better; presumably its population affects the location value of land to the Northwest of Carterton as much as it does land to the Southeast, but only the latter is included in the same region as Carterton.
Having said that, I think the algorithm has come up with reasonable regions in most cases, and excellent ones in some. For example, although the Invercargill region goes a little beyond 177 square km, half of the last area unit is required to reach that total, and I think this is probably a fair representation of the area whose location value is most influenced by Invercargill’s population density:
Next, I need to see what data about location value I can get from QV, figure out how to analyse it, and then see if these regions are adequate for the purpose, or whether the algorithm needs adjustment.