Why you’re much less redundant in the cloud than you think
Cloud suppliers love to tell you how dependable their services are. Redundancy is the keyword here, and it’s important for reliability: you don’t want data loss or data corruption. But in the cloud only a little bit of hardware redundancy is delivered by default; the rest is up to you. The services are available, but you need to buy them and configure them properly to make sure your data is kept safe.
What are your real risks? And what measures can, no, must you take to make sure that foreseeable and unforeseeable events don’t make your data inaccessible or unreliable, or even make it vanish completely? Without your data your business grinds to a halt, and if your data doesn’t return, your business won’t get started again.
What are you not protected against by default?
The most common causes for data loss and data corruption are:
- Human error. This is the most common cause of data loss. Someone presses the wrong button, issues the wrong command, or just forgets to do something and your data is gone or unusable.
- Software errors. This is the runner-up on the list of common causes of data loss and data corruption. No matter how well you’ve tested, modern software is so complicated that you can’t get it bug free.
- Configuration errors. When systems don’t communicate properly, data can get lost. When systems don’t coordinate properly about who does what (and that coordination is more and more often fully automatic), transactions can be executed twice, thrice, or not at all. We’ve seen that happen in investment and payment systems. I’m sure your personnel will be grateful if their salary is paid thrice, but that situation is best avoided.
- Hardware problems. Hardware that fails completely is a clear situation. But hardware that only fails a little bit is life-threatening for your data. A customer of ours had a processor with a broken cache that slightly altered the data passing through it, resulting in a slowly corrupting data warehouse. By the time they realized what was going on, the data was beyond repair.
- Malware and viruses. All manner of software threats are distributed daily. Some people enjoy breaking things, yours included. Just because they can.
- Power outages. Equipment is protected against this much better than it used to be, but data that is not yet on stable storage can still be lost.
- Ransomware. When your files are encrypted by a ‘bad actor’, standard redundancy won’t help you. Every file they can access is locked down until you pay them.
- (Natural) disaster. If the area where your cloud data center is located gets flooded, you lose everything. A lightning strike, a terrorist attack or a crashing airplane has the same outcome.
Take inventory of your risks
You do have an information security policy? And you performed the risk inventory that goes with it?
Let’s be honest, not all data has the same level of importance. Some data you can afford to lose, even though it’s a nuisance and undoubtedly leads to some hassle. But some data is of vital importance to your business. You want that to be 100% safe. Fortunately the volume of that data is limited. Which leaves: all the shades in between.
It starts by taking stock of the data you have. Then you sort it into the three categories described above; in other words, you determine what the damage would be if that data were lost or corrupted. Finally, you take measures proportional to that damage.
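The inventory described above can be kept as a simple list. The following is a minimal sketch in Python, not a prescribed method: the category names, the example data sets and the mapping from category to measures are all assumptions made for illustration, and your own risk inventory should decide what goes where.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    """The three categories from the text, ordered by damage if the data is lost."""
    EXPENDABLE = 1  # losing it is a nuisance, nothing more
    IMPORTANT = 2   # the shades in between
    VITAL = 3       # the business stops without it


# Hypothetical mapping from category to measures; adjust to your own risk inventory.
MEASURES = {
    Impact.EXPENDABLE: ["back-up"],
    Impact.IMPORTANT: ["back-up", "offline back-up", "snapshots"],
    Impact.VITAL: ["back-up", "offline back-up", "snapshots",
                   "cluster", "multiple availability zones"],
}


@dataclass
class DataSet:
    name: str
    impact: Impact


def plan(inventory):
    """Return {data set name: measures}, most critical data sets first."""
    ordered = sorted(inventory, key=lambda d: d.impact.value, reverse=True)
    return {d.name: MEASURES[d.impact] for d in ordered}


# Example inventory with made-up data set names.
inventory = [
    DataSet("web server logs", Impact.EXPENDABLE),
    DataSet("customer orders", Impact.VITAL),
    DataSet("marketing archive", Impact.IMPORTANT),
]

for name, measures in plan(inventory).items():
    print(f"{name}: {', '.join(measures)}")
```

Sorting by impact puts the vital data at the top of the list, which is where the money and attention should go first.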
Make your IT truly redundant
You can’t protect yourself from everything, but you can protect yourself from most things. Whether you’re completely cloud based or not, these are the most important measures to take:
- Make back-ups. That really doesn’t change when you move to the cloud. You can use services provided by your cloud provider that are easy to integrate. This protects against the most common causes of data loss and data corruption. You do need to test your back-ups.
- Make offline back-ups. Keeping part of your back-ups offline is the best remedy against malware, viruses, ransomware and (natural) disasters. You do need to make sure a recent back-up is part of the offline set for this to be effective. It’s fine if you keep the same back-up both online and offline.
- Use snapshots. Snapshots allow you to move back to a point in time, for example an hour before ransomware encrypted your data. Snapshots allow you to get critical parts of your IT up and running again much faster than restoring a back-up. The frequency at which you make snapshots determines the maximum amount of data you could lose: with hourly snapshots you lose at most about an hour of changes.
- Use clusters. Make sure critical servers have multiple instances. This way, if one server fails you can still access your data, and the demise of that one server doesn’t directly cause data loss or data damage.
- Store critical data in more than one storage location. If different servers in a cluster keep their data in physically different storage locations, you reduce the chance of data loss even further.
- Use the entire availability zone. Availability zones are groups of data centers located in the same region. The exact layout depends on the cloud provider you use. By spreading the servers of a cluster across different data centers in the availability zone, you reduce the chance that the loss of a data center leads to data loss. You do need to make sure the chosen data centers can’t be compromised by the same event: they need to be at a sufficient distance from each other and not be located in the same river valley. Network traffic required for redundancy in the same availability zone is usually free.
- Spread data over multiple availability zones. Most cloud suppliers have several availability zones. By spreading your data across more than one availability zone, the data gets a wider geographical distribution and the chances of loss become even smaller. You do need to account for higher latency, and therefore bigger delays, caused by the greater physical distance. The network traffic required for redundancy between availability zones is not always free.
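The advice above to test your back-ups can be made concrete with a restore test: restore a back-up to a scratch location and check that every file came back intact. The following is a minimal sketch in Python using only the standard library. The function names are made up for this example, and it assumes the source files have not changed since the back-up was made; in practice you would more likely compare against checksums recorded at back-up time.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Compare every file under source_dir with its restored copy.

    Returns a sorted list of relative paths that are missing or differ.
    An empty list means the restore matches the source.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for source_file in source_dir.rglob("*"):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if (not restored_file.is_file()
                or sha256_of(source_file) != sha256_of(restored_file)):
            problems.append(str(relative))
    return sorted(problems)
```

Calling `verify_restore("/data/live", "/mnt/restore-test")` (both paths are placeholders) returns the files that did not survive the round trip. A check like this also catches the slow, silent corruption described in the hardware example, as long as it runs regularly and not only after something has already gone wrong.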
Find help
Good redundancy requires specialized knowledge, and that knowledge is not always available in your organization. Do you want to have a chat about making your IT redundant, without losing responsiveness and speed? Make an appointment with me, no strings attached, and we’ll take stock of what’s needed and what’s possible in your situation. I guarantee I’ll have at least one tip for you to improve your redundancy.