21st
Amazon S3 - When in Doubt Reboot
I, like many others, am not thrilled with Amazon Web Services at the moment. Amazon’s storage system (S3) was down most of the day Sunday and brought many sites down with it. Amazon has stated that it will investigate the cause of the outage and report back to the community. Over 24 hours later, there is no real update. What we have learned from Amazon is that its cloud communication protocol, called gossip, broke, and that its system administrators were forced to restart and rebuild gossip to restore communications.
Amazon obviously needs to fix what’s broken, but more importantly it needs to introduce additional layers of redundancy. Simply restarting services and hoping for the best won’t solve the underlying problem. We are all lucky that S3 is still up and running, especially since the fix was restarting the server rather than making an actual repair. I know cloud computing is the future, and I would be willing to pay extra for that additional layer of redundancy.