Outage outrage! Why public cloud is putting data and business at risk


Key Takeaways

Recent reports indicate that 60 outages occurred across major cloud providers, with Google Cloud experiencing 43 of them, highlighting the vulnerability of cloud services for mission-critical applications.

Organizations must rethink their cloud workload management strategies, considering hybrid environments and the need for automated data transfer to mitigate the risks associated with relying solely on public cloud services.

The shared responsibility model for data security remains unclear for many users, emphasizing the importance of organizations understanding their obligations in data backup and architecture resilience to prevent loss during outages.

In the last 30-odd days there have been 60 cloud outages across four of the major cloud providers: AWS, Microsoft Azure, Google Cloud and Oracle Fusion Cloud. According to cloud and SaaS monitoring firm IsDown, 43 of these outages were on Google Cloud alone. Google, along with the other hyperscalers, was contacted for comment for this article.

The current outage problem points to a twofold issue: the daily challenge facing cloud providers under the pressure of ever-growing user numbers, and the risk of using public cloud for anything mission-critical or sensitive.

The trouble is that outages are a fact of life. As Tom Fairbairn, distinguished engineer at middleware firm Solace, reminds us, “stuff happens”: most tech-based firms and platforms go down at some point. He points to Facebook’s major outage in 2021, and to hardware failure and human error as the usual culprits. But surely this is different: this is the cloud we’re talking about, the bedrock of modern computing.

“A lot of applications are now being built on the cloud,” says Stewart Parkin, CTO EMEA at Assured Data Protection. “Very often SaaS organizations are built on AWS or Azure, so it’s very possible that an entire organization moves into the cloud. Azure then goes down, and all of these workloads – email, Salesforce ERP files, SQL databases – are lost within that, due to their complete reliance upon SaaS on the cloud.”

If this is ringing any bells for you, then it is perhaps time to think again about how your business manages its workloads. Dr Thomas King, CTO at DE-CIX, a global operator of large carrier- and data center-neutral internet exchanges, says that in this age of increasingly hybrid cloud environments, having a mechanism to manage data automatically across environments is key. He suggests using a cloud routing service over a cloud exchange platform, so that data can be transferred between a company’s multiple cloud services without first traveling back to the private company network or infrastructure.

“Companies have a need for speed and with the low latency this provides, it can almost feel like one single cloud environment,” he says.
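To make King’s latency argument concrete, here is a minimal sketch with purely illustrative per-hop figures (none of these numbers come from DE-CIX, and the region names are just examples): traffic hairpinned through a corporate network pays two long legs, while traffic routed directly over an exchange fabric pays two short ones.

```python
# Hypothetical latency comparison: hairpinning inter-cloud traffic through a
# private corporate network vs. routing it directly over a cloud exchange.
# All per-hop figures below are illustrative assumptions, not measurements.

HOPS_VIA_PRIVATE_NET = [
    ("AWS eu-central-1 -> corporate WAN", 18.0),   # ms, assumed
    ("corporate WAN -> Azure westeurope", 16.0),   # ms, assumed
]
HOPS_VIA_CLOUD_EXCHANGE = [
    ("AWS eu-central-1 -> exchange fabric", 2.0),  # ms, assumed
    ("exchange fabric -> Azure westeurope", 2.5),  # ms, assumed
]

def path_latency(hops: list[tuple[str, float]]) -> float:
    """Total one-way latency for a path: the sum of its per-hop delays."""
    return sum(ms for _, ms in hops)

if __name__ == "__main__":
    hairpin = path_latency(HOPS_VIA_PRIVATE_NET)
    direct = path_latency(HOPS_VIA_CLOUD_EXCHANGE)
    print(f"via private network: {hairpin:.1f} ms")
    print(f"via cloud exchange:  {direct:.1f} ms")
    print(f"saving per one-way leg: {hairpin - direct:.1f} ms")
```

Shave that saving off every leg of every inter-cloud transaction and the “one single cloud environment” feeling King describes starts to make sense.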

Given the rapid rise of the co-location datacentre market, at least according to one report, this makes a lot of sense, as more organizations look for private cloud provision without the pain of running their own datacentre. Great if you can do it, but what if a business has to rely on public cloud services and applications?

Matthew Hodgson, co-founder of Element, a decentralized and secure messaging app used by the UK government and NATO, says that cloud-based platforms are actually putting data at risk. He references recent Microsoft Teams outages and suggests that too many organizations “put their eggs in one basket.” The recent outages, he says, are forcing businesses to look for additional solutions, where mission critical needs are not going to be jeopardized by outages.

And then there is the shared responsibility data issue. As Mark Molyneux, CTO EMEA at Cohesity, explains, this is far from clear for most users. Cloud providers, he says, can be chosen as custodians of an organization’s data, but the responsibility always lies with the organization to ensure it has aligned its risk to its cloud deployments.

“As a CTO I am often amazed at the lack of understanding that people have of their shared responsibility model in everything from backing up, through to maintaining a redundant resilient architecture, through to patching,” says Molyneux. “A classic example is Office 365, which explicitly states that consumers of the service should provision their own backup and restoration capability atop the service being provided by Microsoft.”
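As a rough illustration of what “provision your own backup” can mean in practice, below is a minimal sketch that exports a mailbox to a local JSON file via the Microsoft Graph messages endpoint. The bearer token handling, user address and output path are hypothetical placeholders, and a naive export like this is nowhere near a real backup product (no attachments, versioning or restore tooling), which is precisely the gap Molyneux is pointing at.

```python
"""Minimal sketch: exporting Exchange Online mail via Microsoft Graph as a
naive backup. Assumes you already hold a valid OAuth bearer token with
Mail.Read permission (e.g. acquired via MSAL); error handling and
attachment export are deliberately omitted."""
import json
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
TOKEN = "<bearer-token>"          # placeholder: obtain via MSAL in real use
USER = "alice@example.com"        # hypothetical mailbox

def export_mailbox(user: str, out_path: str) -> int:
    """Page through a user's messages and write them to a local JSON file."""
    url = f"{GRAPH}/users/{user}/messages?$top=100"
    headers = {"Authorization": f"Bearer {TOKEN}"}
    messages = []
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        page = resp.json()
        messages.extend(page.get("value", []))
        url = page.get("@odata.nextLink")   # Graph pagination cursor
    with open(out_path, "w") as f:
        json.dump(messages, f)
    return len(messages)

if __name__ == "__main__":
    count = export_mailbox(USER, "mailbox-backup.json")
    print(f"exported {count} messages")
```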

DNS SOS!

Data security has long been a criticism leveled at public cloud, and it has only deepened in recent weeks. Andy Jenkinson, group CEO at Cybersec Innovation Partners and author of Digital Blood on Their Hands: The Ukraine Cyberwar Attacks, points to increasing DNS security failings and risks in the cloud.

“The weaknesses are where the joints are,” says Jenkinson. “DNS attacks are impacting companies at ever increasing and significant rates and cloud computing increases the exposure.”
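A first pass at checking the “joints” Jenkinson describes can be automated. The sketch below, a hedged example using the dnspython library, simply flags zones that return no DNSKEY records (suggesting DNSSEC is not deployed) or no CAA record restricting certificate issuance; a genuine DNS audit would go far deeper, into validation chains, nameserver hygiene and zone transfer exposure.

```python
"""Quick DNS-posture sketch using dnspython (pip install dnspython).
Flags domains missing DNSKEY (no DNSSEC signing) or CAA records.
Illustrative only; the domain list is a placeholder."""
import dns.exception
import dns.resolver

def has_record(domain: str, rdtype: str) -> bool:
    """True if the domain answers with at least one record of this type."""
    try:
        return len(dns.resolver.resolve(domain, rdtype)) > 0
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        return False

if __name__ == "__main__":
    for domain in ["example.com"]:  # replace with your own zones
        dnskey = has_record(domain, "DNSKEY")
        caa = has_record(domain, "CAA")
        print(f"{domain}: DNSSEC keys {'present' if dnskey else 'MISSING'}, "
              f"CAA {'present' if caa else 'MISSING'}")
```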

Jenkinson cites two reports from January 2022: a White House paper on the implementation of Zero Trust Architecture (ZTA) for the US Federal Government, and a European Commission paper on DNS Abuse. The two papers, published five days apart, cited DNS over 1,000 times between them, he says.

“It is no coincidence that these papers were issued within days of Russia’s cyberwar offensive upon Ukraine,” adds Jenkinson. “The cyberattacks exploited exposed insecure vulnerabilities in over 70 Ukraine government websites including DNS and PKI issues.”

Jenkinson says that technology is often about convenience, and the cloud fits that category. As cloud usage grows, the assumption is that vulnerabilities multiply and outages become more common. But stats from one report, admittedly conducted by hybrid multi-cloud managed service provider Aptum, reveal a still-complex picture of multi-cloud and legacy environments, potentially open to DNS security failings.

The report hints at some repatriation of cloud services to on-premises or co-location datacentres, with the movement of workloads prompting an uptick in the use of on-premises and datacentre infrastructure over the next two years. It also paints a picture of uncertainty, such is the complexity of multi-cloud environments and data risk, with a lack of skills and legacy applications cited as key factors behind some of the market confusion.

This is where public cloud is soaking up customers almost by default, as one report reveals, piling more pressure on providers and opening them up to more potential outages. That is much to the annoyance of Mark Boost, CEO of cloud native service provider Civo.

“Time and time again, we’ve seen the major cloud players fail to deliver for their users,” says Boost. “The outages of recent months have shown users that hyperscalers are unreliable partners.”

Boost’s beef is that the hyperscalers are managing additional complexity simply due to their scale, which puts them at increased risk not just of outages but also of cyber threats. He suggests this is an opportunity for other, smaller providers to step in. He tells ERP Today that the cost of public cloud has jumped by 66 percent on average compared with previous years (Civo’s own figures) and that emerging providers can look to “bridge the gap.”

He has a point, but the hyperscalers are hyperscalers for a reason. You don’t get fired for buying AWS or Azure or Oracle or Google, regardless of their outages.

In other words, the hyperscalers are here to stay. So what are you going to do about it?