In the first two weeks of this month, Amazon Web Services (AWS) hit some bumps that caused two outages: a bigger, more widespread one on December 7, and a smaller, more localized one on December 15. Both caused disruptions across a range of websites and online applications, including Google, Slack, Disney Plus, Amazon, Venmo, Tinder, iRobot, Coinbase, and The Washington Post. These services all rely on AWS to provide cloud computing for them. In fact, AWS is the leading cloud computing provider, ahead of other big players like Microsoft Azure, Google, IBM, and Alibaba.
To understand why the impact was so big, and what steps companies can take to prevent similar disruptions in the future, it makes sense to step back and look at what cloud computing is, and what it's good for.
Whenever you connect to anything over the internet, your computer is essentially just talking to another computer. A server is a type of computer that can process requests and deliver data to other computers in the same network or over the internet.
But running your own server isn't cheap. You have to buy the hardware box, install it somewhere, and feed it a lot of power. In many cases, it needs internet connectivity too. Then, to ensure that data is received and sent with minimal delays, these servers need to be physically close to their users.
Additionally, you have to install software that needs to be updated regularly. And you have to build fail-safe mechanisms that will switch over operations to another server if a main server malfunctions.
"The thing that companies like Amazon noticed is that a lot of [computing infrastructure] is not really specific to the service you're running," says Justine Sherry, an assistant professor at Carnegie Mellon University.
For example, the code running Netflix does something different compared to the code running a service like Venmo. The Netflix code is serving videos to users, and the Venmo code is facilitating financial transactions. But underneath, most of the computing work is actually the same.
This is where cloud providers come in. They usually have hundreds to thousands of servers all over the country with good bandwidth. They offer to take care of the tedious tasks like security, day-to-day management of the data center operations, and scaling services when needed.
"Then you can focus on your [specialized] code. Just write the part that makes the video work, or the part that makes the financial transactions work. It's easier, it's cheaper because Amazon is doing this for lots and lots of customers," Sherry explains. "But there are also downsides, which is that everyone in the world is relying on the same couple of Costco-sized warehouses full of computers. There are dozens of them across the US. But when one of them goes down, it's catastrophic."
The AWS outages appear to have stemmed from errors in the automated systems that handle data flow behind the scenes.
AWS explained in a post that the December 7 error was due to a problem with "an automated activity to scale capacity of one of the AWS services hosted in the main AWS network," which resulted in "a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks."
This autoscaling capability allows the whole system to adjust the number of servers it's using based on the number of users on the network. "The idea there is if I have 100 users at 7 am, and then at noon, everyone is on lunch break Amazon shopping and now I have 1,000 users, I need 10 times as many computers to interact with all those clients," explains Sherry. "These frameworks automatically look at how much demand there is and can dedicate more servers to doing what's needed when it's needed."
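Sherry's arithmetic can be sketched in a few lines of Python. This is only an illustration of the idea, not how AWS's autoscaling works: the function name `servers_needed` and the figure of 100 users per server are assumptions made for the example.

```python
import math


def servers_needed(active_users: int, users_per_server: int = 100,
                   min_servers: int = 1) -> int:
    """Return how many servers to run for the current number of users.

    users_per_server is a made-up capacity figure for illustration.
    """
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    # Round up so a partly loaded server is still counted,
    # and never scale below the minimum fleet size.
    return max(min_servers, math.ceil(active_users / users_per_server))


morning = servers_needed(100)    # 7 am: 100 users
noon = servers_needed(1_000)     # noon: 1,000 users
print(morning, noon)             # prints: 1 10
```

With ten times the users, the sketch asks for ten times the servers, which is the scaling decision the quote describes.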
Later, on December 15, a status update issued by AWS said that the outage was caused by "traffic engineering" incorrectly moving "more traffic than expected to parts of the AWS Backbone that affected connectivity to a subset of Internet destinations."
Big data centers have lots of internet connections through different internet service providers. They get to choose where online traffic gets routed, whether it's over one cable through AT&T, or another cable through Sprint.
"Their automatic traffic engineering decides to reroute traffic based on a number of conditions. Most providers are going to reroute traffic mostly based on load. They want to make sure things are relatively balanced," Sherry says. "It sounds like that auto-adaptation failed on the 15th, and they wound up routing too much traffic over one connection. You can literally think of it like a pipe that has had too much water and the water is coming out the seams. That data ends up getting dropped and disappears."
Despite some prominent outages over the past few years, Sherry argues that AWS is quite good at managing its infrastructure. Inherently, it's very difficult to design perfect algorithms that can anticipate every problem, and bugs are an annoying but regular part of software development. The only thing that's unique about the cloud situation is the impact.
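The pipe analogy can be made concrete with a toy model in Python. Everything here is hypothetical: the link names, the capacities, and the proportional-split rule are assumptions chosen for illustration, and the sketch does not describe AWS's actual traffic engineering.

```python
def route_by_load(traffic: float, capacities: dict[str, float]) -> dict[str, float]:
    """Split traffic across links in proportion to each link's capacity."""
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return {link: traffic * cap / total for link, cap in capacities.items()}


def dropped(assigned: dict[str, float], capacities: dict[str, float]) -> float:
    """Traffic beyond a link's capacity is lost, like water leaking from a pipe."""
    return sum(max(0.0, load - capacities[link]) for link, load in assigned.items())


links = {"link_a": 60.0, "link_b": 40.0}     # hypothetical link capacities
balanced = route_by_load(80.0, links)         # 48.0 on link_a, 32.0 on link_b
skewed = {"link_a": 10.0, "link_b": 70.0}     # same 80 units, badly balanced

print(dropped(balanced, links))   # prints: 0.0
print(dropped(skewed, links))     # prints: 30.0
```

The total traffic is identical in both cases. Only the split changes, and the skewed split pushes 30 units past what one link can carry.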
A growing number of independent companies are turning to third-party centralized services like AWS for cloud infrastructure, storage, and more.
"If I pay Amazon to run a data center for me, store my files, and serve my clients, they're going to do a better job than I can do as a university administrator or as an administrator to a small company," says Sherry. "But from a societal perspective, when all of these small individual actors decide to outsource to the cloud, we wind up with one really big centralized dependency."
While AWS was down, Sherry could not control her television. Normally, she uses her phone as a remote control. But the phone does not directly talk to the TV. Instead, both the phone and the TV talk to a server in the cloud, and that server orchestrates the exchange between them. The cloud is essential for some functions, like downloading automatic software updates. But for scrolling through cable offerings available from an antenna or satellite, "there's no reason that needs to happen," she says. "We're in the same room, we're on the same wireless network, all I'm trying to do is change the channel." In short, the cloud can offer convenient tech solutions in some instances, but not all.
The account of marooned technology that struck her most as an unnecessarily roundabout design was a timed cat feeder that had to go through the cloud. Automated cat feeders were around long before the cloud. They're basically paired to an alarm clock. "But for some reason, someone decided that rather than building the alarm clock part into the cat feeder, they were going to put the alarm clock feeder in the cloud, and have the cat feeder go over the internet and ask the cloud, is it time to feed the cat?" Sherry says. "There's no reason that that needed to be put into the cloud."
Moving forward, she thinks that application developers should review every feature that's intended for the cloud and ask if it can work without the cloud, or at least have an offline mode that's less debilitating during an internet, data center, or even power outage.
"There are other things that are probably not going to work. You're probably not going to be able to log in to your online banking if you can't get to the bank server," says Sherry. "But so many of the things that failed are things that really should not have failed."
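For the cat feeder, an offline mode of the kind Sherry describes could look roughly like the Python sketch below. The names `should_feed`, `cloud_check`, and `LOCAL_SCHEDULE` are invented for this example, and the feeding times are made up. The device asks the cloud when it can and falls back to a schedule stored on the device when it can't.

```python
from datetime import time
from typing import Callable, Optional

# Hypothetical feeding times stored on the device itself.
LOCAL_SCHEDULE = [time(7, 0), time(18, 0)]


def should_feed(now: time,
                cloud_check: Optional[Callable[[time], bool]] = None) -> bool:
    """Ask the cloud if one is configured; otherwise use the on-device schedule."""
    if cloud_check is not None:
        try:
            return cloud_check(now)
        except (ConnectionError, TimeoutError):
            pass  # Cloud unreachable: fall through to the local schedule.
    return any(now.hour == t.hour and now.minute == t.minute
               for t in LOCAL_SCHEDULE)


def cloud_down(now: time) -> bool:
    """Stand-in for a cloud call made during an outage."""
    raise ConnectionError("cloud unreachable")


print(should_feed(time(7, 0), cloud_down))    # prints: True
print(should_feed(time(12, 0), cloud_down))   # prints: False
```

Even with the cloud call failing, the cat still gets fed at 7:00, because the schedule lives on the feeder rather than only on a remote server.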
Read the original:
AWS outages and cloud computing, explained - Popular Science