Typo caused Amazon’s big cloud-computing outage – WHSV

NEW YORK (FOX, AP) UPDATE (Mar. 2):

Amazon says an incorrectly typed command during a routine debugging of its billing system caused the five-hour outage of some Amazon Web Services servers on Tuesday.

In a summary posted online, the Seattle company says a command meant to remove a small number of servers for one of its S3 subsystems was entered incorrectly and a larger set of servers was removed. A full restart was required, which took longer than expected due to how fast Amazon Web Services has grown over the past few years.

Amazon says it is making changes to its system to make sure incorrect commands won't trigger an outage of its web services in the future.

Amazon is the world's largest provider of cloud services, which entails hosting companies' computing functions on remote servers.

_____

ORIGINAL STORY (Feb. 28):

If you experienced a sluggish web browser or problems with some of your most-used websites and apps on Tuesday, it was likely the result of an Amazon Web Services outage.

An outage hit Amazon Web Services Tuesday, reportedly impacting lots of web pages. Specifically, the cloud giant is experiencing problems with its Simple Storage Service (S3) on the East Coast. Widely used for backup and archive, S3 is harnessed by a host of companies.

"We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services," wrote Amazon Web Services on its service health dashboard. "We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue."

Users took to social media to discuss the outage.

BGR reports that when S3 goes down or experiences any type of latency or errors, it can prevent content from loading on web pages or cause requests to fail.

Sites like Imgur, Medium, Expedia, Mailchimp, Buffer and even the U.S. Securities and Exchange Commission were all impacted, as were communication services like Slack. Also impacted, ironically, was DownDetector.com, a website that tracks when other websites are down.

As of 1:49 PM PST, all service was restored.

Update at 2:08 PM PST: As of 1:49 PM PST, we are fully recovered for operations for adding new objects in S3, which was our last operation showing a high error rate. The Amazon S3 service is operating normally.

Update at 12:52 PM PST: We are seeing recovery for S3 object retrievals, listing and deletions. We continue to work on recovery for adding new objects to S3 and expect to start seeing improved error rates within the hour.

Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.
