For as long as I can remember, relational databases have ruled the data roost. Relational models first appeared with the microprocessor era in the 70s thanks largely to the research of computer scientists such as Ted Codd.
SQL databases are incredibly mature and good at what they do. They provide absolute consistency and certainty of result and are easy for developers to understand. But whilst hardware has made astonishing leaps since the 70s, the demands of web-scale businesses have outstripped what a single machine can deliver.
The big boys, like Google and Amazon, have discovered the hard way that you can only push a relational database so far and ultimately you need a different approach. The problem isn’t just the amount of data and how fast you need to access it but also how unstructured it is.
Relational databases were traditionally designed to run on a single (often very big and expensive) server. That’s fine, but as your demands grow the price of the hardware grows much faster than the capacity you get for it. A server with 10 times the capacity will cost you a lot more than 10 individual ones. It’ll probably cost closer to 100 times as much as one of them, and it’s still just one box. Older businesses could apply upgrades overnight but in a 24×7 web world there is no night.
Now we’re not trying to say that relational databases are completely rubbish and should never be used. In fact, quite the contrary. There are plenty of enterprise businesses, particularly banks, that serve thousands of branches and millions of customers and survive just fine using SQL. However, the way data is being used by enterprise customers is changing, and so the databases that serve them must change too.
So we need something new, something that is not only SQL: NoSQL. These are usually non-relational databases such as MongoDB, Cassandra and CouchDB. NoSQL databases scale out horizontally, adding more servers to deal with larger loads. SQL databases, on the other hand, usually scale up vertically, adding more and more CPU, memory and storage to a single server. Auto-sharding enables many NoSQL databases to spread data across servers automatically and, when combined with replication, provides a more robust system in the event of a server crash. Don’t forget that relational models were designed long before cheap storage and processing power became available.
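To make the sharding idea a little more concrete, here is a minimal sketch of how a record’s key could decide which server it lives on. It is an illustration only: the `shard_for` function and the `db-0` style server names are made up for this example, and real NoSQL databases tend to use more sophisticated schemes (consistent hashing or range partitioning, for instance) so that adding a server doesn’t reshuffle most of the data.

```python
import hashlib


def shard_for(key: str, servers: list[str]) -> str:
    """Pick a server for a key by hashing the key (hypothetical helper)."""
    # A stable hash means the same key always maps to the same server.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical server names, assumed for this example only.
servers = ["db-0", "db-1", "db-2"]

# Work out where a handful of records would be stored.
placement = {key: shard_for(key, servers) for key in ["user:1", "user:2", "user:3"]}
```

Each server ends up holding only a slice of the data, so reads and writes for different keys can be handled by different machines. Note that sharding on its own doesn’t protect you from a crash; that comes from also keeping copies of each slice on more than one server.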
SQL gives us total consistency but it’s usually achieved by things called locks. Ultimately these force conflicting operations to be done one at a time, which slows things down. Again the NoSQL approach is different. It takes the more pragmatic view that, most of the time, this level of consistency isn’t needed and, where it is, there are smarter ways of achieving it.
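One example of a “smarter way” is optimistic concurrency: instead of locking a record while you work on it, you remember which version you read and only save your change if nobody else has changed it in the meantime. The sketch below is a toy, in-memory illustration of that idea and not how any particular database implements it; the `VersionedStore` class and its methods are invented for this example, and a real database would perform the check-and-write as a single atomic step.

```python
class VersionedStore:
    """Toy key-value store using version checks instead of locks (hypothetical)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            return False  # someone else got there first; caller must re-read
        self._data[key] = (current_version + 1, value)
        return True


store = VersionedStore()

# Two clients read the same record at the same version.
version_a, _ = store.read("basket:42")
version_b, _ = store.read("basket:42")

first = store.write("basket:42", version_a, ["book"])   # succeeds
second = store.write("basket:42", version_b, ["lamp"])  # rejected: stale version
```

Nobody ever waits on a lock here. The second client simply finds out its copy was stale, re-reads the record and tries again, which is cheap when conflicts are rare.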
Most NoSQL databases are also open-source, meaning that they can be downloaded, implemented and scaled at little cost. They are often faster and scale better, but typically do so at the expense of consistency. No database is without its flaws, I’m afraid.
In a big data world, NoSQL is the answer to storing and serving unstructured or semi-structured data, which isn’t really suitable for relational databases.
So making the choice between SQL and NoSQL really depends on your needs. It’s not an easy transition and can be a steep learning curve but for the bold it’s pretty cool!