Fwd: CTRL WHOOPS



-------- Original Message --------
Subject: CTRL WHOOPS
Date: 2025-10-22 16:57
From: "The Bleeding Edge" <feedback@e.brownstoneresearch.com>
To: <kenneth@hamer-hodges.us>
Reply-To: "The Bleeding Edge Feedback" <reply-fe8e11767065037e73-37250_HTML-656554164-518001038-7009@e.brownstoneresearch.com>

 


CTRL WHOOPS

By Jeff Brown, Editor, The Bleeding Edge

-------------------------

What happened with the internet?

This question plagued many of us on Monday, as we faced the frustrating
reality that our software applications and favorite websites had just
stopped working.

The drama began to unfold at 3:11 AM ET on Monday…

And the source was unexpected.

AWS US-East-1 Event Log | _Source: Amazon Web Services [1]_

The US-East-1 Region of Amazon Web Services (AWS) began to experience
error rates and network latencies, indicating something was seriously
wrong.

It only got worse from there…

CTRL WHOOPS

The root problem was a DNS resolution failure.

DNS is the Domain Name System – think of it as the directory for the
internet.

For example, when we type www.x.com [2] into a browser, DNS translates
that name into the numerical IP address of the server to connect to.
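The "directory" idea can be sketched in a few lines of Python. This is a
minimal illustration: the example.internal hostname and its address are
made up, and real lookups are answered by actual DNS servers via the
operating system's resolver.

```python
import socket

# Toy "directory" standing in for the global DNS database.
# The hostname and address below are illustrative, not real records.
LOCAL_DIRECTORY = {"example.internal": "192.0.2.10"}

def resolve(hostname: str) -> str:
    """Return the numerical IP address for a hostname, like DNS does."""
    if hostname in LOCAL_DIRECTORY:
        return LOCAL_DIRECTORY[hostname]
    # For anything else, ask the operating system's real DNS resolver.
    return socket.gethostbyname(hostname)

print(resolve("example.internal"))  # -> 192.0.2.10
```

When that directory service fails – as it did in US-East-1 – names stop
translating into addresses, and applications can no longer find the
servers they depend on.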

DNS failures are often the work of malicious actors, so many immediately
assumed this was some kind of cyberattack. It wasn’t.

Ironically, it was caused by human error.

AWS was debugging its billing system. And to do so, it had to manually
shut down part of its storage system.

The engineer performing the work typed the wrong command and took a much
larger number of servers offline – some of which supported the Domain
Name System for AWS’s US-East-1 Region. Whoops!!!

The whole situation is almost unbelievable. Such a simple mistake.

We’d think a task like this would be automated by now, or that the
command would be checked by an AI agent before being issued. Or perhaps
it would require a strict process – even a second set of eyes – to
ensure the commands wouldn’t cause any damage.

But nope, it was a few taps of the keyboard by an unsuspecting human
engineer dutifully doing their job… and it just about felt like the
whole internet went down.

More than 1,000 companies were impacted, and the outage in AWS’s
US-East-1 Region lasted roughly 10 to 12 hours.

While it’s too early to tell specifics, I’m guessing that there were
hundreds of millions in losses as a result.

Here’s what the typical experience was like for a software company
running its services in the AWS cloud out of the US-East-1 Region.

The availability data below is for Atlassian (TEAM), a very widely used
project management software company.

Atlassian Availability on AWS During Outage | _Source: Cisco
ThousandEyes_

Within the first hour of the problems at AWS, the availability of the
Atlassian software suite dropped to less than 50%, and then it
eventually dropped to 0%.

This applied to so many companies.

It applied to websites and cloud-based software services of companies
like Adobe, Apple Music, AT&T, Boost Mobile, Canva, ChatGPT, Chime,
Coinbase, Delta Air Lines, Duolingo, FanDuel, GoDaddy, HBO Max, Hulu,
Instacart, Lyft, Microsoft (Outlook and Teams), Robinhood, Signal,
Slack, Snapchat, Square, Starbucks, T-Mobile, United Airlines, Venmo,
Verizon, Wall Street Journal, Xfinity (Comcast), Zillow, and Zoom, just
to mention a few.

And the gamers must have broken a few computer screens with Battlefield,
Fortnite, League of Legends, Rainbow Six Siege, Pokémon Go, and Xbox
– all stuck with messages like “_504. Gateway Timeout. Please wait a
few minutes and try again_.”

These are some of the most popular titles in gaming. That must have
hurt.

The AWS crash was so bad, it impacted AWS’s own support center.

Not only were the above companies unable to function…

They couldn’t even notify AWS about it.

The Weakness of Centralized Systems

Perhaps the most ironic impact of all was that Amazon’s own services,
specifically Alexa and Prime Video, also fell over during the outage.

I would argue it would have looked even worse if they hadn’t.

US-East-1 isn’t a place where something like this should happen.

The AWS data center campuses that make up this region are based in
Northern Virginia, in the greater Washington, D.C., area.

They are located in Data Center Alley, one of the largest data center
hubs in the world.

It’s also the location of AWS’s very first data center, built in
2006, and from there, it just got bigger and bigger.

I’ve been there myself and walked the streets around these data
centers. They’re massive, nondescript buildings with tight security
around them.

The US-East-1 Region is often a default for many companies to host their
websites and software services because of its importance and
reliability. Data Center Alley has the largest concentration of fiber
optic connections and data centers in the world. By some estimates,
nearly 70% of global internet traffic flows through this critical hub.

AWS is an incredible service that’s enabled the last two decades of
innovation in software development – by making computation and storage
cheap and easy to use.

It’s an incredible business as well, one that will generate about $126
billion in revenue for Amazon (AMZN) this year, by my estimate, at
around 75% gross margins.

AWS is the key to Amazon’s free cash flow generation and also why its
total gross margins are around 50%.

That compares to probably around 20-25% gross margins for its e-commerce
business, which includes related advertising revenues.

Will businesses stop using AWS cloud services as a result of this
outage? That’s very unlikely.

Will Amazon reimburse its AWS customers for the damages caused? I think
that’s highly likely.

But this whole drama raises a major question…

Why could something like this happen… and how could it have been
avoided?

It is also a striking reminder of the weakness of centralized systems.

This is one of the key tenets of blockchain technology: Decentralization
for the purpose of network resilience.

With blockchain technology, if one node goes down – the way US-East-1
did – it doesn’t matter, because all of the other nodes are still
operational. The network keeps running.

Companies became complacent in recent years, hosting their services on a
single node while thinking that everything would be fine because it had
always been fine. That works, until it stops working…

When hosting a website or software service in the cloud, it is prudent
to have redundancy.

A company’s software service and/or website can be hosted in two or
more regions of AWS, so that if one region has an outage, everything
will fall back to the regions that are up and running.
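As a rough sketch of that fallback idea – the endpoints and the probe
below are hypothetical, and a real health check would issue an HTTP
request with a short timeout against each region’s deployment:

```python
def first_healthy(endpoints, probe):
    """Return the first endpoint whose health probe succeeds.

    `probe(url)` returns True when the endpoint answers; if it fails
    (or raises), we fall through and try the next region instead.
    """
    for url in endpoints:
        try:
            if probe(url):
                return url
        except OSError:
            continue  # region unreachable -- try the next one
    raise RuntimeError("no region is reachable")

# Hypothetical endpoints for one service deployed in two AWS regions.
endpoints = ["api.us-east-1.example.com", "api.us-west-2.example.com"]

# Fake probe simulating a US-East-1 outage: only us-west-2 answers.
print(first_healthy(endpoints, probe=lambda url: "us-west-2" in url))
# -> api.us-west-2.example.com
```

In practice, this kind of failover is usually handled by DNS-level
health checks rather than application code, but the logic is the same:
keep a second region ready and route around the one that’s down.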

Even better, a multi-cloud deployment provides another layer of
redundancy.

Why not host the software offering on AWS and Google Cloud, Oracle, or
Microsoft Azure?

Yes, there is some additional cost, but it’s an insurance policy
against mistakes like the one at AWS US-East-1.

It wasn’t all negative, though…

Recommended Links
-------------------------


This “Alien Technology” Could Trigger New $100 Trillion AI Boom [3]

This crazy story [3] involves a former Apollo astronaut who walked on
the moon… An extraterrestrial substance that's 163x more expensive
than cocaine… And a new type of computer with god-like powers that,
according to Bank of America… “Could Be the Biggest Breakthrough
Since Fire.” If you have an open mind, click here to get the details
[3] because Jeff Brown believes this is about to make a lot of people in
America rich.
-------------------------


Reagan's Tech Prophet Issues November 18th Warning [4]

George Gilder, a tech visionary, handed President Reagan the first
microchip that helped create $6.5 trillion in wealth over the last 40
years. Now he's stepping forward with an even bigger prediction about
what's being built in the Arizona desert. He believes 3 little-known
companies will explode when a bombshell announcement drops just days
from now.
Smart investors are already positioning themselves. Click here to see
what's coming before the story goes mainstream. [4]
-------------------------

Some Wholesome Fun While the Internet Burns

Not surprisingly, social media had a heyday with the matter.

These got a few good chuckles out of me…

And at least one other person got a kick out of the AWS outage.

I liked how he graphically represented the debacle:

_Source: X_

That pretty much sums it up.

Just a few keystrokes caused by human error, and the whole
infrastructure fell.

Elon Musk took the opportunity to show the world that his services at X,
Grok, and Starlink were not impacted at all. That was intentional…

Time to Rethink Resiliency

Once Musk acquired Twitter, one of the first things he and his team did
was rearchitect what was then still Twitter’s information technology
infrastructure to improve both performance and resiliency.

This meant building on-premise data centers.

Massive data centers – owned and controlled by X and designed to never
have a single point of failure. Nearly everyone laughed at Musk and said
it couldn’t be done, that he would fail, especially considering that
he removed about 80% of the Twitter workforce.

X also used a multi-cloud strategy to add even more resiliency to its
services.

And Starlink is designed to become the most resilient global
communications network, delivering data and voice services entirely
outside of mobile network operators and the major cloud service
providers.

No one is laughing at him now.

Even better, X was the best place to find information about what was
going on with the AWS outage when it was happening.

Very few companies are thinking about this kind of resiliency. All of
them should be.

To be fair to AWS, this outage wasn’t “the entire internet”… and
it was nowhere near the scale of the CrowdStrike global outage that I
wrote about in _The Bleeding Edge – Grounded by the World’s Largest
IT Outage_ [5] in July of last year.

But it was still a big deal, and a stark wakeup call to the need to
rethink network architectures, centralized services, and how to build in
decentralized resiliency.

_Jeff_

Keep reading

The Quantum Internet [6]

Let’s step back to see the big picture on the ramifications of the
rapid adoption of quantum technology.


The Next Quantum Spin-Out? [7]

It might come as a surprise to learn that some well-known quantum
computing companies have academic institution origins…


Quantum Computers vs. GPU Data Centers [8]

We hope you’ve enjoyed Quantum Week here at The Bleeding Edge…


Like what you’re reading? Send your thoughts to
feedback@brownstoneresearch.com.


Brownstone Research
1125 N Charles St, Baltimore, MD 21201
www.brownstoneresearch.com [9]

To ensure our emails continue reaching your inbox, please add our email
address [10] to your address book.

This editorial email containing advertisements was sent to
kenneth@hamer-hodges.us because you subscribed to this service. To stop
receiving these emails, click here [11].

Brownstone Research welcomes your feedback and questions. But please
note: The law prohibits us from giving personalized advice.

To contact Customer Service, call toll free Domestic/International:
1-888-493-3156, Mon-Fri, 9am-5pm ET, or email us here.

© 2025 Brownstone Research. All rights reserved. Any reproduction,
copying, or redistribution of our content, in whole or in part, is
prohibited without written permission from Brownstone Research.

Privacy Policy [12] | Terms of Use [13]

 

Links:
------
[1] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f7583e32dbb1163ac6b886a9448b527bd099abd7c816e658df83939860febde41e4379d4d25d7e4c5ec94cb417c68e2467dfb16c1667da3292
[2] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f76bc4986c3f67d39c5908a428ea6e51b95f2264b4ee26f9ec516e3448bfd7acea628f0b329c3d79b7b53a419ae85fd3cd917104a6710384f5
[3] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f7b8eb0ef2825ebec52580a69d4a5d0cc4daf9963c214a4ea01cb4625cd77a58dbbd77bc3187ac010a0eaa77a379b33b51fd3782f32a0de53e
[4] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f7b25407c6bbbaabf788b6e8498c86859f187d0fb3feb1c3424730a39fac457578dc0c1b443cfd1c2a30eb9f74bf94f2c443161d2975493d79
[5] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f744271d5ee1482b12c92d63aceb287e469b82051a46b9e8ba7fddaad4dc7e8562c6e7070fa6b815440c3dcf0fcac7bf22be6e22960dbf466a
[6] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f734d0e80bc15a7583138a50b2c84eb6e9e8dfd5d00203289c08cbbafd9a6bdaa16cdf1571d8a5e9edd3ba413a584b8ebb2b0a346ffd63ab61
[7] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f788c9c99af2feaa0a967206e0320b33f96836c226e3eb25b70b6e11b01dd257321138bc8df3bdb9b877c7ca15bf8c019498d0d5698b2ec417
[8] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f73cb6adfd7b91cc6260cbdbb285b1976cbd8dac5a8fcddb8c2bcb6a21418f3d911271966d028b30401df5c1ae10b5692b349391e75f24a57e
[9] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f7756a7261cc970d21040f421201172bfb11cc4cb97cc43a63a0d863396bb42cbefd0954bb6f8622cb92d08aa282ca75761670501b3c9f6a64
[10] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f75e71cb11274bf134d7de4cbc6f6aa06d7e727db9fe5115b5e5988b8ee468f7681410e0e12fd08037e1579e5778e1b110ce93d5748bf6b9ec
[11] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f7d9bdd91a2897cac7d1f4baa8116f8eb9b2ad75ae3e0bf970cd1231f4b1478211adb4f065003d0bbcb64db12646a2b8808c8030e70303a8c1
[12] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f75862c77fc9302f8806f232d6dd0fedc4529140ba45d9bb684e7634fe39b74ff226c748122eb2f46a20e8cc10401a3028e93e020dcd2e15d6
[13] https://click.e.brownstoneresearch.com/?qs=6f8a8d675c2f44f7ef055394762c6872e63f6ee83deeeb6435a6ae5c67e90828f9960202ee1cc53db04dd10fed34eb98679778d8ae7da9ce8d97ddba0b08a065
