The potential for data to be corrupted and polluted increases as it passes through interfaces and contact points, from process to process and from system to system. This makes data hard to keep clean. Those of us in the data quality world often hammer home the point that getting data right at source is the best route to a high level of data quality. But not all data is correct or standardised at source. When the originator of the data, as prone to data quality defects as the rest of us, can’t decide what form it takes, what chance do you have of getting it right in your systems?
In many countries it is local government bodies or private developers, not postal authorities, who name streets. In Cambridge a road has been named after Sir William Worts, who lived in the 17th century. Its name is Worts’ Causeway. But look at the signs the local authority has erected and you might get a little confused, because you’ll also find signs for Wort’s Causeway and Worts Causeway. Without knowing the origin of the name (and without a good grounding in English grammar), few people would know which form is correct.
Changing names is another recipe for confusion. In the United Arab Emirates, streets were mostly numbered, but locals had their own names for them. The authorities are now naming streets, and the new signs show neither the old numbers nor the local names. Thus 4th Street, known to many as Muroor and to others as East Street, is now Sultan bin Zayed the First Street, not to be confused with the nearby Zayed bin Sultan Street, which is named after a different person. Whilst in the long run these changes will make life easier for businesses and local utilities, in the meantime local residents are finding it harder to get around.
Transliteration (transcribing text from one written script to another) also invariably produces plenty of variation. Only a limited number of languages have a standardised transliteration system, and even those systems are not applied in every country where the language is spoken. So should that be Tchaikovsky Street, Tschaikovsky Street, Tschaikowsky Street or Chaykovskiy Street? Somebody counted more than 40 versions of that composer’s name in the US Library of Congress alone.
Does your business know its shit?
It’s probably best not to get me started on apostrophes, and how important they are. But if anybody ever suggests that they’re not required, ask if their business knows its shit or knows it’s shit.
In the Cambridge street name example, Wort’s Causeway is clearly just wrong. Worts Causeway may also be an error, but there may be mitigating circumstances: national guidelines have been issued recommending that punctuation be removed from street names because it might confuse emergency services. Whereas, presumably, having three different versions of a street’s name doesn’t?
Let’s be clear. If you think that you can bring clarity to your data just by removing punctuation marks, you’re wrong and very misguided.
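To see why, here is a minimal Python sketch. The `strip_punctuation` helper is hypothetical, built on the standard library's `str.translate`, and it assumes a "clean-up" step that simply deletes ASCII punctuation plus the typographic apostrophe (U+2019). It does make the three Cambridge variants compare equal, but the single value it leaves behind is the unpunctuated form, not the correct Worts’ Causeway, and the same step makes "its" and "it’s" indistinguishable.

```python
import string

# Hypothetical clean-up step: delete every ASCII punctuation character
# plus the curly apostrophe (U+2019), which string.punctuation does not include.
_DELETE = str.maketrans("", "", string.punctuation + "\u2019")

def strip_punctuation(text: str) -> str:
    """Return text with all punctuation characters removed."""
    return text.translate(_DELETE)

# The three forms found on Cambridge street signs (one with a curly apostrophe).
variants = ["Worts\u2019 Causeway", "Wort's Causeway", "Worts Causeway"]
keys = {strip_punctuation(v) for v in variants}
print(keys)  # {'Worts Causeway'}: the variants now match, but the correct form is gone

# The same step erases a distinction that changes the meaning entirely.
print(strip_punctuation("knows its shit") == strip_punctuation("knows it's shit"))  # True
```

The variants now match, but nothing in the output tells you which spelling was right; that still has to come from somewhere other than the punctuation filter.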