In search of hierarchies

I was recently asked about a conference registration system that a company was developing. As is commonly the case, they had found that allowing attendees to enter addresses into the system themselves, without validation or control, resulted in large numbers of unusable addresses. They wanted to create a system in which the attendee chose a country and then a city, and from those two pieces of information a postal code could be assigned automatically.

That could work in a small number of countries, but a system based on a hierarchy like this one would need to work differently depending on the country chosen. In some countries a single populated place can have thousands of postal codes (in the UK or the Netherlands, for example). In other countries a single postal code can cover multiple populated places, as in Germany, or indeed a whole country or territory, of which Anguilla is a good example. In many other countries, including the one where the system was being developed, no postal code system exists at all.
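The variation described above can be sketched as a small lookup, shown here in Python. This is only an illustration under stated assumptions: the category names, the table and the function `can_assign_from_city` are hypothetical, the classifications are deliberately simplified (a large German city, for instance, has many postal codes of its own), and the Hong Kong entry is an added example of a place with no postal codes.

```python
from enum import Enum, auto
from typing import Optional


class PostalCodeModel(Enum):
    """Hypothetical categories for how places relate to postal codes."""
    MANY_CODES_PER_PLACE = auto()  # one place, many (even thousands of) codes
    CODE_SPANS_PLACES = auto()     # one code can cover several places
    SINGLE_CODE = auto()           # one code for the whole country or territory
    NO_SYSTEM = auto()             # no postal code system at all


# Illustrative and simplified; a real table would need one entry per country.
POSTAL_MODELS = {
    "GB": PostalCodeModel.MANY_CODES_PER_PLACE,
    "NL": PostalCodeModel.MANY_CODES_PER_PLACE,
    "DE": PostalCodeModel.CODE_SPANS_PLACES,
    "AI": PostalCodeModel.SINGLE_CODE,   # Anguilla
    "HK": PostalCodeModel.NO_SYSTEM,     # assumption added for illustration
}


def postal_model(country_code: str) -> Optional[PostalCodeModel]:
    """Return the model for a country, or None if it is not in the table."""
    return POSTAL_MODELS.get(country_code.upper())


def can_assign_from_city(country_code: str) -> bool:
    """True only where country + city is always enough to fix the code.

    In this simplified sketch that holds only for single-code territories;
    every other category (and every unknown country) needs more input.
    """
    return postal_model(country_code) is PostalCodeModel.SINGLE_CODE
```

Even in this toy table, the country-then-city design works for only one of the five entries, which is the heart of the problem.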

Straddling cities

In any case, systems like this assume neat hierarchies within all addresses: country->region->place->postal code. In fact, neat hierarchies like this are very rare. Cities straddle the borders of regions (St Louis in the USA, for example), or even of countries (Valga/Valka in Estonia/Latvia is one). Postal code areas, whose purpose is quite different to that of boundaries drawn for administrative reasons, rarely coincide with administrative districts, and why would they? Even in countries where the postal code system was designed to coincide with administrative district boundaries, such as Italy, later alterations to both systems have diluted any hierarchy to the point of uselessness.

Other parts of addresses don’t follow neat hierarchies either. People might expect numbered buildings in named streets, but buildings can be named and streets numbered. Even administrative districts aren’t always nested one within another. Look at a country like Canada, where administrative structures vary widely from one province or territory to the next. There are also large areas of the world which lie outside administrative district systems and which are governed directly by central government or a department thereof. These include large parts of sparsely populated countries and territories, such as Australia and Greenland, but they also exist in countries where you might not expect to find them, such as Germany. Many of these areas are uninhabited. Others not only contain people, they contain many people: Washington DC, for example.

Looking at addresses and addressing as neat hierarchies which are constant within and between countries is a mistake. When creating systems to collect and manage address data, the inconsistencies, idiosyncrasies and variations within and between countries need to be taken into account. Any data capture system needs either to adjust itself dynamically according to the country chosen, or to take a holistic approach to address validation, looking at the address as a whole string.
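As a rough illustration of the first option, a form that adjusts itself to the country chosen, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `AddressRules` structure, the rule table, the fallback behaviour and the function name are hypothetical, the patterns check only the shape of a code and not whether it exists, and Hong Kong again stands in for a place with no postal codes.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressRules:
    """Hypothetical per-country settings for the postal code field."""
    postal_label: Optional[str]    # None: do not show the field at all
    postal_pattern: Optional[str]  # None: accept whatever is entered


# Illustrative entries only; the patterns test format, not existence.
RULES = {
    "NL": AddressRules("Postcode", r"\d{4} ?[A-Z]{2}"),
    "DE": AddressRules("Postleitzahl", r"\d{5}"),
    "US": AddressRules("ZIP code", r"\d{5}(-\d{4})?"),
    "HK": AddressRules(None, None),  # assumption: no postal codes in use
}

# Countries without an entry get a visible but unvalidated field.
FALLBACK = AddressRules("Postal code", None)


def check_postal_code(country_code: str, value: str) -> bool:
    """Check a postal code against the rules for the chosen country."""
    rules = RULES.get(country_code.upper(), FALLBACK)
    value = value.strip().upper()
    if rules.postal_label is None:
        return value == ""   # no field is shown, so nothing should arrive
    if rules.postal_pattern is None:
        return True          # no rule known, so do not reject anything
    return re.fullmatch(rules.postal_pattern, value) is not None
```

The design choice worth noting is the fallback: where nothing is known about a country, the sketch accepts what the attendee enters rather than forcing it into another country's pattern.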

