The NHS describes Compulsive Hoarding as “excessively acquiring items that appear of little or no value and not being able to throw them away, resulting in unmanageable amounts of clutter.” When it comes to data, there is an almost limitless number of things that can be captured about an individual’s daily interactions with the world around them. Taken together, this data can give us enormous insight into the habits of an individual or a group, and that insight can be priceless.
An example of the scale of big data is TomTom’s ever-expanding historical traffic database, which now contains over 9 trillion data points. These data points are captured from TomTom navigation devices in their customers’ cars. From this database TomTom can pull drive times and average speeds for any stretch of road across Europe and North America. When TomTom decided to start capturing this data, there was clearly a strategy and purpose behind it. But does data have to be captured and stored with a clear purpose in mind? Or, if it is being generated as part of a bigger project, is it worth hoarding just in case?
Here at Postcode Anywhere our core business has always been traditional address lookup technology. With over 10,000 active clients, we have processed, and continue to process, millions of anonymous transactions every day. For the purposes of reporting to our various data providers, we are obliged to store details of these requests. To give us a clear understanding of our customers, each of them has been tagged as working within a particular industry sector. This means that we can see not only which addresses were looked up and when, but also which sectors the requesting clients belong to. Using technologies such as Elasticsearch, we can now mine this data for trends and patterns that are invaluable not only for internal purposes but also for our clients. We can even overlay third-party datasets, such as weather, seasonal events and demographics, to see how these bend the trends. The graph below shows traffic from our retail clients during December 2013.
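As a rough illustration of the kind of grouping described above, here is a minimal Python sketch that counts lookups per sector per day. It is not how Postcode Anywhere does it and does not use Elasticsearch (where the equivalent would typically be a `date_histogram` aggregation with a nested `terms` aggregation). The `Lookup` record shape and the helper names are hypothetical, and the sample records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Lookup:
    """Hypothetical anonymised lookup record: when it happened and the client's sector tag."""
    timestamp: datetime
    sector: str


def lookups_per_sector_per_day(lookups):
    """Count lookups grouped by (day, sector)."""
    counts = Counter()
    for lookup in lookups:
        counts[(lookup.timestamp.date(), lookup.sector)] += 1
    return counts


def daily_series(counts, sector):
    """Return a day-ordered list of (day, count) pairs for one sector, e.g. 'Retail'."""
    return sorted((day, n) for (day, s), n in counts.items() if s == sector)


# Made-up sample records, for illustration only.
sample = [
    Lookup(datetime(2013, 12, 2, 9, 15), "Retail"),
    Lookup(datetime(2013, 12, 2, 20, 40), "Retail"),
    Lookup(datetime(2013, 12, 2, 11, 5), "Insurance"),
    Lookup(datetime(2013, 12, 3, 8, 0), "Retail"),
]
counts = lookups_per_sector_per_day(sample)
print(daily_series(counts, "Retail"))
# [(datetime.date(2013, 12, 2), 2), (datetime.date(2013, 12, 3), 1)]
```

The point of the sketch is that the grouping is only possible after the fact because every stored request already carries its sector tag; a daily series like this is the raw material for a chart of retail traffic over a month.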
What’s the difference between a goldmine and unmanageable clutter?
One of the phrases that regularly comes up in our industry is ‘Garbage in, Garbage out’. If the data you are capturing is erroneous, inconsistent or incomplete, then anything you get out of interrogating it will be worthless. Luckily, at Postcode Anywhere we have put effort into ensuring that all the data we capture is accurate, consistent and complete. This means that even though, when we captured it, we had no idea we might one day use it for our ‘Big Data Labs’ project, it does exactly what we need.
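To make “accurate, consistent and complete” a little more concrete, here is a minimal Python sketch of checks applied at the point of capture. The field names, the `ALLOWED_SECTORS` set and the rules are all hypothetical, not Postcode Anywhere’s actual rules, and the postcode pattern is a simplified shape check that stands in for a real accuracy check.

```python
import re
from datetime import datetime

# Hypothetical set of sector tags; a real system would load these from its own reference data.
ALLOWED_SECTORS = {"Retail", "Insurance", "Finance", "Utilities"}

# Simplified UK postcode shape check (does not confirm the postcode actually exists).
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")

REQUIRED_FIELDS = ("timestamp", "sector", "postcode")


def validate_lookup(record):
    """Return a list of problems; an empty list means the record can be stored."""
    problems = []
    # Complete: every required field is present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    if problems:
        return problems
    # Consistent: timestamps are real datetimes and sectors come from one fixed vocabulary.
    if not isinstance(record["timestamp"], datetime):
        problems.append("timestamp is not a datetime")
    if record["sector"] not in ALLOWED_SECTORS:
        problems.append(f"unknown sector {record['sector']!r}")
    # Accurate (at least plausibly): the postcode has a valid shape once normalised.
    postcode = record["postcode"].strip().upper()
    if not POSTCODE_PATTERN.match(postcode):
        problems.append(f"malformed postcode {record['postcode']!r}")
    return problems


print(validate_lookup({"timestamp": datetime(2013, 12, 2, 9, 15), "sector": "Retail", "postcode": "sw1a 1aa"}))
# []
print(validate_lookup({"timestamp": datetime(2013, 12, 2), "sector": "Retal", "postcode": "12345"}))
# ["unknown sector 'Retal'", "malformed postcode '12345'"]
```

Rejecting or flagging records at capture time, rather than cleaning them years later, is what lets data stored “just in case” still be usable when a purpose for it finally turns up.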
What does all this mean? If you put in place techniques and rules to ensure that the data you capture is accurate, then it doesn’t matter if it appears to have little or no value. As long as you have the capacity to store it, go ahead and become a hoarder: one day you may find that you are sitting on your own data goldmine.