Building a site that scales

When it comes to website development, it’s often said that ‘if you build it, they will come’. But what happens when more of them come than you expected?

In today’s digital world, your website must be able to scale in line with your company’s growth and handle its traffic, whatever the spike. Unfortunately, most business owners don’t think about this until their website has already gone down.

Up or out?

There are two main approaches to increasing capacity: scaling up and scaling out. Scaling up means you build for a single server (which is easy) and simply buy a bigger one when it reaches its limit. The problem with this approach is that a server ten times as fast won’t cost you ten times as much… expect nearer a hundred times. You also have to size it for your peak traffic: Black Friday, the holidays, or the hours after a big ad has aired. Those are the crucial times of the year when you might do ten times your normal volumes, and for the rest of the year the server that cost as much as an Aston Martin is barely idling along.

So scaling out is the smarter way. With this approach, adding a second server nearly doubles your capacity. Keep adding servers and your capacity and, crucially, your costs scale linearly. This type of architecture is very robust, can handle a great deal of traffic and should be very reliable, since there is no single point of failure. Sounds awesome, doesn’t it? And it is, especially if you do it in the cloud with a provider like AWS or Azure, because you can rent servers by the hour for those peak times.
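To make the scale-out idea concrete, here is a minimal Python sketch of round-robin load balancing across a pool of identical servers. The `ServerPool` class and the server names are hypothetical, made up for illustration. In practice you would usually put a managed load balancer from your cloud provider in front of your servers rather than write one yourself. The sketch shows only the distribution logic: each request goes to the next server in turn, so adding a server spreads the same traffic over more machines.

```python
from itertools import count


class ServerPool:
    """Hypothetical round-robin pool: each request goes to the next server in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._counter = count()  # ever-increasing request counter

    def add(self, server):
        # Scaling out: capacity grows by adding another server to the pool.
        self.servers.append(server)

    def route(self):
        # Pick the next server in rotation (round robin).
        return self.servers[next(self._counter) % len(self.servers)]


# Usage: start with one server, scale out to two.
pool = ServerPool(["web-1"])
pool.add("web-2")
assignments = [pool.route() for _ in range(4)]
print(assignments)  # ['web-1', 'web-2', 'web-1', 'web-2']
```

Round robin is the simplest policy. Real load balancers also take server health and current load into account.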

But it’s never that easy. With many smaller servers, the odds that one of them breaks at any given moment go up, so you need to plan for failure. And I mean really plan, not just write that DR (disaster recovery) document that’s never going to be needed because the giant server has so much redundancy built in that it can’t break (until the cleaner trips the power… AGAIN!). Building software that tolerates things constantly breaking, and new servers joining and leaving, is a totally different set of engineering challenges.
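Planning for failure largely means writing every call on the assumption that the server at the other end might be gone. A common pattern is to retry against another server with exponential backoff and random jitter, so retrying clients spread out instead of hammering a recovering server in lockstep. The sketch below assumes a hypothetical `send(server)` function that raises `ConnectionError` when a server is down. The function name, attempt count and delays are illustrative, not recommendations.

```python
import random
import time


def call_with_failover(servers, send, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Try servers in turn, backing off exponentially between failed attempts.

    `send(server)` is a hypothetical function that performs the request and
    raises ConnectionError if the server is unreachable.
    """
    last_error = None
    for attempt in range(max_attempts):
        server = servers[attempt % len(servers)]  # fail over to the next server in the list
        try:
            return send(server)
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Upper bound doubles each attempt (0.1s, 0.2s, 0.4s with the defaults);
                # the actual wait is randomised below it ("full jitter").
                delay = base_delay * (2 ** attempt)
                sleep(random.uniform(0, delay))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Usage: web-1 is down, so the call fails over to web-2.
down = {"web-1"}


def fake_send(server):
    if server in down:
        raise ConnectionError(f"{server} is down")
    return f"200 OK from {server}"


result = call_with_failover(["web-1", "web-2"], fake_send, sleep=lambda s: None)
print(result)  # 200 OK from web-2
```

The `sleep` parameter is injectable so tests can skip the real waits.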

And at some point you need to bring everything together into a database, which then becomes your single point of failure. So you need to look at scalable, distributed databases, which is something very different again.
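One way distributed databases remove that single point of failure is to partition data across many nodes. Cassandra, for example, places data on a token ring using consistent hashing. Here is a minimal Python sketch of the idea. The `HashRing` class and node names are hypothetical, and the sketch leaves out replication, which real databases add so that losing one node doesn’t lose data. It illustrates one property: adding a node moves only a fraction of the keys, and only onto the new node.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash (Python's built-in hash() is randomised per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping keys to database nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server smooth out the distribution
        self._ring = []       # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        h = _hash(key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]


# Usage: add a fourth node and count how many keys change owner.
ring = HashRing(["db-1", "db-2", "db-3"])
keys = [f"customer:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("db-4")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(moved)  # only a fraction of the keys move, all of them to db-4
```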

It’s all doable with the right skills, but it’s not easy, especially for ‘traditional’ developers in ‘traditional’ businesses. That’s why big internet retailers like Amazon cope with Black Friday with ease, whereas traditional ones like Currys and Tesco struggle. They are built on big, single-server architectures that don’t scale, and unravelling that technology stack and skill set doesn’t happen overnight.

And while we’re thinking about Black Friday, the more malevolent peak caused by a denial-of-service (DoS) attack is harder to plan for and guard against. Elastic scaling may not easily get you out of that one either, so you need a way to identify rogue traffic and divert it. But if the attack is already saturating your connections, you’re in trouble, so it’s vital to put that filtering at your network borders. You need friendly ISPs, or someone who knows this area well.
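One simple way to spot rogue traffic at the application level is a per-client rate limit. The Python sketch below keeps a token bucket for each client IP address. The class names, rates and IP addresses are hypothetical. Bear in mind that this only helps while your servers can still receive requests. If an attack is saturating your connections, filtering has to happen upstream, at your borders or with your ISP.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill tokens for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Hypothetical per-IP limiter: requests beyond a client's budget are refused."""

    def __init__(self, rate=5, capacity=10, clock=time.monotonic):
        self.buckets = defaultdict(lambda: TokenBucket(rate, capacity, clock))

    def allow(self, client_ip):
        return self.buckets[client_ip].allow()


# Usage with a fake clock so the example is deterministic.
now = [0.0]
limiter = PerClientLimiter(rate=5, capacity=10, clock=lambda: now[0])
burst = [limiter.allow("203.0.113.7") for _ in range(15)]
print(burst.count(True))  # 10: the burst allowance, then requests are refused
now[0] += 1.0  # one second later, 5 tokens have been refilled
refilled = sum(limiter.allow("203.0.113.7") for _ in range(10))
print(refilled)  # 5
other = limiter.allow("198.51.100.2")
print(other)  # True: other clients are unaffected
```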

So what to do?

  • Employ people with scaling skills: pragmatic engineers who can build software that expects failure and recovers from it.
  • Select scalable infrastructure, which is rarely something that’s in house.
  • Use high-level caching.
  • Use load balancing.
  • Shift workloads to the cloud: just because a server is under your desk doesn’t make it safe or less scary!
  • Build for scale as early as you can. Use low-cost operating systems (e.g. Linux), scalable data stores (e.g. Cassandra, Elasticsearch, DynamoDB) and content delivery networks (CDNs) for as much as possible, on scalable infrastructure (e.g. AWS, Azure).
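High-level caching means keeping the finished result (a rendered page or an API response, say) so that repeat requests skip the expensive work behind it. Below is a minimal Python sketch of a time-to-live cache. It assumes a hypothetical `render_product_page` function standing in for database queries and templating. In production this job usually falls to a CDN, a reverse proxy or a dedicated cache service rather than an in-process dictionary.

```python
import time


class TTLCache:
    """Hypothetical response cache: entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive work
        value = compute()    # miss or expired: do the work once and store it
        self._store[key] = (now + self.ttl, value)
        return value


# Usage with a fake clock.
calls = []


def render_product_page():
    calls.append(1)  # stands in for database queries and templating
    return "<html>product 42</html>"


now = [0.0]
cache = TTLCache(ttl=30, clock=lambda: now[0])
for _ in range(1000):
    cache.get_or_compute("/products/42", render_product_page)
print(len(calls))  # 1: a thousand requests, one render
now[0] = 31.0
cache.get_or_compute("/products/42", render_product_page)
print(len(calls))  # 2: the entry expired and was rebuilt
```

The TTL is a trade-off: longer values absorb bigger spikes, but visitors may see stale content for longer.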

And finally, continuously monitor everything. Never fly blind. Use the next generation of monitoring tools (like the ones we’re currently building!) to make sure you’re always there for your customers and always aware of what’s going on, so you can intervene the moment users begin to experience difficulties.
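As a small illustration of the kind of signal worth watching, here is a Python sketch of a sliding-window error-rate check that raises an alert when too many recent requests fail. It is a hypothetical example, not a description of the monitoring tools mentioned above. The window, threshold and minimum sample size are illustrative values you would tune for your own traffic.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window monitor: alert when the recent error rate crosses a threshold."""

    def __init__(self, window_seconds=60, threshold=0.05, min_requests=20):
        self.window = window_seconds
        self.threshold = threshold
        self.min_requests = min_requests  # avoid alerting on tiny samples
        self.events = deque()  # (timestamp, is_error)

    def record(self, timestamp, is_error):
        self.events.append((timestamp, is_error))
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            self.events.popleft()

    def error_rate(self):
        if not self.events:
            return 0.0
        return sum(err for _, err in self.events) / len(self.events)

    def should_alert(self):
        return len(self.events) >= self.min_requests and self.error_rate() > self.threshold


# Usage: normal traffic, then a burst of failures.
monitor = ErrorRateMonitor(window_seconds=60, threshold=0.05)
for t in range(100):
    monitor.record(t * 0.5, is_error=(t % 50 == 0))  # 2 errors in 100 requests
before_burst = monitor.should_alert()
print(before_burst)  # False
for t in range(100, 130):
    monitor.record(t * 0.5, is_error=True)  # 30 consecutive failures
after_burst = monitor.should_alert()
print(after_burst)  # True
```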

What has been your experience? Has your website gone down due to an unexpected traffic spike? Let us know below.