
If you have been keeping up with the news lately, you will know that Amazon was out for the count a couple of days back. It was not just any old outage: they were out for a few hours. OK, international sites and web services were not affected, but the main site was. Now this is huge for Amazon and huge for us. Not that it cost me anything, but Amazon would have lost quite a bit. I did hear $31,000 a minute from someone. Almost what I get paid.
So what exactly happened? Well, here are the facts from Gigaom:
1. Traffic to https://www.amazon.com was getting there. So DNS was configured properly to send traffic to Amazon’s data centers. Global server load balancing (GSLB) is the first line of defense when a data center goes off the air. Either GSLB didn’t detect that the main data center was down, or there was no spare to which it could send visitors.
2. When traffic hit the data center, the load balancer wasn’t redirecting it. This is the second line of defense, designed to catch visitors who weren’t sent elsewhere by GSLB.
3. If some of the servers died, the load balancer should have taken them out of rotation. Either it didn’t detect the error, or all the servers were out. This is the third line of defense.
4. Most companies have an “apology page” that the load balancer serves when all servers are down. This is the fourth line of defense, and it didn’t work either.
5. The HTTP/1.1 message users saw shows something that “speaks” HTTP was on the other end. So this probably wasn’t a router or firewall.
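To make those lines of defense a bit more concrete, here is a rough Python sketch of how the layers are meant to stack up. This is only an illustration and certainly not how Amazon does it: the `Server` and `DataCenter` classes, the `route` function, and the apology text are all made up for this post, and the "health check" is just a boolean flag.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical apology text; a real site would serve a static HTML page.
APOLOGY_PAGE = "Sorry, we are having trouble right now. Please try again shortly."


@dataclass
class Server:
    name: str
    healthy: bool = True  # stand-in for a real health check


@dataclass
class DataCenter:
    name: str
    servers: List[Server] = field(default_factory=list)

    def in_rotation(self) -> List[Server]:
        # Third line of defense: dead servers are taken out of rotation.
        return [s for s in self.servers if s.healthy]


def route(data_centers: List[DataCenter]) -> str:
    # First and second lines of defense: skip any data center that has
    # no live servers and send the visitor to the next one in the list.
    for dc in data_centers:
        live = dc.in_rotation()
        if live:
            return f"{dc.name}/{live[0].name}"
    # Fourth line of defense: everything is down, so apologize.
    return APOLOGY_PAGE


if __name__ == "__main__":
    primary = DataCenter("primary", [Server("web1", healthy=False), Server("web2")])
    spare = DataCenter("spare", [Server("web3")])
    print(route([primary, spare]))
```

The point of the sketch is that a visitor should only ever see a raw error if every one of these layers fails at once, which is what seems to have happened here.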
So what are your thoughts on the Amazon outage? Do you think heads should roll?
Monday, June 09, 2008
Amazon Was Out For The Count




