Dynamism and vanity in Data Management

The need for organisations to collect, store and maintain consistent and accurate data cannot be overstated.

Though it would seem self-evident that certain types of data are either correct or incorrect, accurate or inaccurate, in many cases a variation in data may be influenced by human perception or by cultural and linguistic background, so that data referring to the same physical entity may be expressed in a number of ways, none of which is wrong.

This, along with common generic errors and complications caused by our changing world, makes data collection and maintenance a continuous and challenging task.

I took a brief look at place names in my post about the Welsh town Betws-y-Coed.

Case in point

Data collection is complicated by typos (Lodnon, Lomden), variations in casing (LONDON, London, london), the use of diacritical marks or their accepted non-diacritic equivalents (Köln, Koeln, Koln), abbreviation (St Petersburg, Sankt Petersburg), the inclusion or exclusion of punctuation (Caupenne-d’Armagnac, Caupenne d Armagnac, Caupenne Darmagnac) and the representation of the same place in different languages (The Hague, ’s-Gravenhage, L’Aia). Furthermore, many places have more than one correct name: Brussels (Brussel/Bruxelles) is a good example of this.
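
To make the variations above concrete, here is a minimal Python sketch of one way such names might be reduced to a comparison key. The function names match_key and canonical_name and the small CANONICAL table are my own illustrations, not part of any library, and the choice of which form counts as "canonical" is exactly the kind of decision each organisation has to make for itself.

```python
import unicodedata

def match_key(name):
    """Build a comparison key: no diacritics, no case, no punctuation or spaces."""
    # Decompose accented letters (ö becomes o plus a combining mark), then drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stronger lower() intended for caseless comparison.
    folded = without_marks.casefold()
    # Keep letters and digits only, so hyphens, apostrophes and spaces disappear.
    return "".join(ch for ch in folded if ch.isalnum())

# Hypothetical alias table: comparison key -> the form we have chosen to store.
# Transliterations (Koeln) and other-language names cannot be derived by rule,
# so they have to be listed explicitly.
CANONICAL = {
    "koln": "Köln",
    "koeln": "Köln",
    "thehague": "The Hague",
    "sgravenhage": "The Hague",
    "laia": "The Hague",
}

def canonical_name(name):
    """Return the chosen canonical form, or None if the name is not recognised."""
    return CANONICAL.get(match_key(name))
```

With this key, LONDON and london compare equal, as do the three spellings of Caupenne-d’Armagnac. Typos such as Lodnon are deliberately not handled: they return None here and would need some form of approximate matching, which is beyond this sketch.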

You’re so vain

A further complication is the phenomenon of vanity addressing. Residents may perceive another postal code area, a neighbouring municipality, or a part of their home city as having more allure or a higher social standing. People may claim to live in Windsor instead of Slough, Camden instead of London or Charlottenburg instead of Berlin. Companies sometimes accommodate these addresses by storing a postally correct version of an address for internal processing and consistency purposes, whilst allowing the vanity aspects of an address to be maintained for certain communications with the customer.
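
As a rough illustration of storing both versions side by side, the sketch below keeps the postally correct locality for processing and an optional vanity locality for customer-facing output. The Address class, its field names and the "customer" purpose label are assumptions made for this example, and the street is invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Address:
    street: str
    postal_locality: str                   # postally correct, used for matching and validation
    vanity_locality: Optional[str] = None  # what the customer prefers to see, if different

    def locality_for(self, purpose):
        """Use the vanity locality only for customer-facing output."""
        if purpose == "customer" and self.vanity_locality:
            return self.vanity_locality
        return self.postal_locality

# Invented example record: lives in Slough, prefers Windsor.
address = Address("1 Example Street", "Slough", vanity_locality="Windsor")
letter_locality = address.locality_for("customer")   # "Windsor"
dedupe_locality = address.locality_for("matching")   # "Slough"
```

Keeping the two values in separate fields means that matching and de-duplication never see the vanity form, while the customer never has to see the postal one.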

We also have a tendency to overlook change when dealing with data. The customer has called, we’ve taken the order and popped their validated address into the system, after which we are happy to forget about it. But data deteriorates. Not only do people change their names, addresses and telephone numbers, but the world in which we live changes too. And this is as true of place names as of other data.

Changing Places

Place names get changed for political and cultural reasons. Recent examples include Leningrad to St Petersburg, Bombay to Mumbai and Pretoria to Tshwane. Bad Bentheim, where I now live, was renamed from Bentheim in 1979 after the recognition of its status as a spa. Names may also change for linguistic reasons. Peking became Beijing not as the result of renaming but because of a change in the romanisation system used to render Chinese characters in Latin script.
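
One simple way to keep stored data in step with such changes is a table of former and current names that is applied to existing records from time to time. The sketch below is only an illustration: the names RENAMED, current_name and pending_changes are hypothetical, and the table holds just a few of the examples mentioned above.

```python
# Former name -> current name. Peking/Beijing is a change of romanisation rather
# than a renaming, but for stored data the effect is the same.
RENAMED = {
    "Leningrad": "St Petersburg",
    "Bombay": "Mumbai",
    "Bentheim": "Bad Bentheim",
    "Peking": "Beijing",
}

def current_name(stored):
    """Return the current name for a stored place name, or the name unchanged."""
    cleaned = stored.strip()
    return RENAMED.get(cleaned, cleaned)

def pending_changes(stored_names):
    """List (old, new) pairs for the stored names that are out of date."""
    pairs = [(name.strip(), current_name(name)) for name in stored_names]
    return [(old, new) for old, new in pairs if old != new]
```

Reporting the changes rather than overwriting the records leaves room for a conscious decision, for example keeping the old name alongside the new one for customers who still use it.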

Standardisation and accuracy improve your relationship with the customer and allow data processes such as search, matching, validation and de-duplication to work optimally. What place names illustrate, though, is the need to make conscious decisions about the standards and the definition of accuracy to be applied to the data being maintained, and to allow these definitions to evolve over time as the world and the organisation’s needs change.