Another AWS outage: Should you run for other clouds?

Dec 20, 2021 8 Minute Read

Heard any good AWS uptime jokes? As you might have heard, Amazon Web Services (AWS) has had some well-publicized issues in the past couple of weeks.

First, an AWS outage on December 7 took down some important services in US-EAST-1, resulting in significant impact for many customers. Then, on December 15, another, much less dramatic outage struck the US-WEST regions. At 7:14 a.m. Pacific time on December 15, some AWS customers began to notice internet connectivity issues in US-WEST-1 and US-WEST-2. Within about 45 minutes, AWS engineers had identified the root cause and begun implementing remediation steps, and by around 8:15 a.m. Pacific time, they reported that things had returned to normal.

So, what happened? In this post, we’ll talk about the latest AWS outage, plus the rest of this week’s AWS news. Let’s go!

What happened with the second AWS outage this month?

Well, the official status update says the issue was caused by “network congestion between parts of the AWS backbone and a subset of Internet Service Providers.”

And what caused this network congestion?

Well, again, according to the official status update, it was “triggered by AWS traffic engineering, executed in response to congestion outside the AWS network. This moved more traffic onto the AWS network than expected” and subsequently affected connectivity between the AWS backbone network and a subset of internet destinations.

So, AWS was really trying to be proactive and do the right thing for customers, and, unfortunately in this case, it backfired. In fact, AWS engineers are always doing proactive stuff behind the scenes to keep services running efficiently, and we don’t even notice because things just work. This outage is very different in nature from the December 7 AWS outage, and I bet that, were it not for that earlier event, this one would have gone relatively unnoticed.

What did we learn from the second AWS outage this month?

So, what are we to do? Run for the other clouds? Bring our data centers out of retirement? Look, everything fails all the time. Doesn’t matter if it’s on the cloud, across multiple clouds or in our own data centers.

According to AWS’s reports, it appears customers might have been impacted for as much as 45 minutes. Forty-five minutes out of a year’s worth of 24×7 service (525,600 minutes) still works out to north of 99.99% uptime.
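To check that arithmetic yourself, here is a minimal Python sketch. The function names are made up for illustration, and it assumes a non-leap year and that the 45 minutes were the only downtime in that year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def uptime_percent(downtime_minutes: float,
                   period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return availability over the period as a percentage."""
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


def downtime_budget_minutes(target_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return how many minutes of downtime an availability target allows."""
    return period_minutes * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    # A single 45-minute outage over one year.
    print(f"{uptime_percent(45):.4f}%")                  # prints 99.9914%
    # Yearly downtime budget at "four nines".
    print(f"{downtime_budget_minutes(99.99):.1f} min")   # prints 52.6 min
```

In other words, a 99.99% target leaves roughly 52 and a half minutes of downtime per year, so one 45-minute event fits inside that budget, but only just.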

Of course, I can still hear the refrain “but that’s downtime, and we can’t afford downtime.”

Sure, then you should create an active-active multi-region failover architecture and pay twice what you’re paying now.

What’s that? You don’t trust AWS? Well, then create that same active-active multi-region architecture spanning multiple cloud providers, with multiple vendor relationships, multiple support contracts, and multiple support teams responsible for a solution that is now orders of magnitude more complex.

Now, how much are those 45 minutes really worth?
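One rough way to put a number on that question is a break-even calculation. The sketch below is only an illustration: every dollar figure is a made-up placeholder, the function names are hypothetical, and it makes the generous assumptions that a second region doubles your bill and eliminates the outage entirely.

```python
def downtime_cost(outage_minutes: float, loss_per_minute: float) -> float:
    """Estimated money lost during an outage of the given length."""
    return outage_minutes * loss_per_minute


def break_even_loss_per_minute(extra_cost_per_year: float,
                               outage_minutes_per_year: float) -> float:
    """Loss per minute at which the extra yearly spend pays for itself."""
    return extra_cost_per_year / outage_minutes_per_year


if __name__ == "__main__":
    # Hypothetical inputs, not taken from any real bill.
    current_yearly_bill = 600_000.0   # doubling it adds this much per year
    loss_per_minute = 1_000.0         # assumed revenue lost per minute down
    outage_minutes = 45.0             # the outage length discussed above

    lost = downtime_cost(outage_minutes, loss_per_minute)
    needed = break_even_loss_per_minute(current_yearly_bill, outage_minutes)

    print(f"Lost to the outage: ${lost:,.0f}")
    print(f"Extra spend to avoid it: ${current_yearly_bill:,.0f}")
    print(f"Break-even loss per minute: ${needed:,.2f}")
```

With these placeholder numbers, the outage costs far less than the extra region would. Plug in your own bill and your own cost of downtime before drawing any conclusion.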

New APAC Region in Jakarta

Aside from all the outage chaos, there was one announcement that was more interesting than the existing slew of “instance type X now available in region Y.”

AWS recently opened a new Region in Asia Pacific, based in Jakarta, Indonesia. The new Region is named ap-southeast-3 and is the 10th AWS Region in the Asia Pacific and mainland China part of the globe.

In addition to this Region, AWS has also committed to growing its business in Indonesia and creating more than 24,000 jobs over the next 15 years. This is a great example of AWS not just dropping a data center somewhere, but truly investing in the local community, changing lives and improving economic and social futures.

Keep up with all things AWS

Want to keep up with all things AWS? Follow ACG on Twitter and Facebook, subscribe to A Cloud Guru on YouTube for weekly AWS updates, and join the conversation on Discord.

Looking to learn more about cloud and AWS? Check out our rotating line-up of free courses, which are updated every month. (There’s no credit card required!)
