Whose turn is it to clean up the bits?
By Samuel T. Cossette, Monday 2 October 2006 at 16:18 :: General :: #24
I've recently spent some time looking into Amazon's Elastic Compute Cloud (EC2), a beta "resizable compute capacity" service that lets you allocate new servers at the flick of a switch. Unlike S3 (another beta Amazon service, which provides a unified interface to a massively scalable data storage back-end), EC2 does not change the way you do development, but it does radically change the machine allocation process: it lets you scale your computing (as in host) power in a few minutes. For a sysadmin used to signing a one-year contract with a self-managed dedicated server provider, having a couple of GHz, gigs of RAM and 150 GB of storage accessible nearly instantly is amazing.
If you plan to give it a try (and you definitely should), don't forget to use an encrypted partition or wipe your instances' hard drives, since Amazon won't do it for you. In fact, when you terminate an instance, Amazon simply shuts the machine down. Then, if the same physical machine is allocated to somebody else, a hamster goes to that machine, powers it up, formats the hard drives and reinstalls a brand new operating system. Herein lies the problem: the hard drives are only repartitioned and formatted, not wiped. This means that all the data is still physically on the hard drives, even though it is not readily accessible!
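For the "wipe it yourself" option, here is a minimal Python sketch of a single zero-fill pass. The function name `wipe` and its parameters are my own, not anything Amazon provides, and a single pass of zeros is an assumption about what counts as "wiped" for your purposes. It is destructive, so point it at the right path.

```python
import os


def wipe(path, chunk_size=1 << 20):
    """Overwrite every byte of `path` with zeros; return the number of bytes written.

    `path` may be a regular file or, on Linux, a block device such as the
    ephemeral storage device (hypothetical usage, needs root).
    """
    zeros = bytes(chunk_size)
    with open(path, "r+b") as f:
        # Seeking to the end gives the size for both files and block devices.
        size = f.seek(0, os.SEEK_END)
        f.seek(0)
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        # Push the zeros out of the page cache before the instance goes away.
        f.flush()
        os.fsync(f.fileno())
    return size
```

You would run something like `wipe("/dev/sda2")` as the very last step before terminating the instance; the device name here is just the one I happened to see on my own instances.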
I have looked at a couple of hard drives and found some sensitive data in the form of "private" source code, OpenVPN configurations (complete with key and certificate), S3 access keys, EC2 keys and certificates, logins and passwords to domain name administration interfaces, etc. It was easy to find out who the owners were: they ranged from individuals to profitable startups (according to their pictures on Flickr).
Who's to blame? Of course, Amazon could (and should) do something about the clean-up process, but organizations storing their intellectual property in plain text is also a somewhat questionable practice.
Update (14:20 PDT): For those asking which technique I used to recover the data: I simply ran strings on /dev/sda2 (the EC2 ephemeral storage device). That gives you data that is unusable as files but perfectly readable. Tools exist to actually recover the filesystem; take a look at anyfs-tools and e2retrieve.
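If you want to see roughly what strings does, here is a small Python approximation. It is a sketch, not the real tool: the name `extract_strings`, the 4-character minimum and the "printable ASCII plus tab" rule are my assumptions about sensible defaults.

```python
def extract_strings(path, min_len=4, chunk_size=1 << 20):
    """Yield runs of printable ASCII at least `min_len` bytes long.

    Reads `path` (a file or raw device) in chunks and carries an unfinished
    run over chunk boundaries, so a string split across two reads is kept whole.
    """
    run = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for b in chunk:
                if b == 0x09 or 0x20 <= b <= 0x7E:  # tab or printable ASCII
                    run.append(b)
                else:
                    if len(run) >= min_len:
                        yield run.decode("ascii")
                    run.clear()
    # Flush a run that reaches the end of the input.
    if len(run) >= min_len:
        yield run.decode("ascii")
```

Looping over `extract_strings("/dev/sda2")` and searching the results for words like "password" or "BEGIN RSA PRIVATE KEY" is essentially all it took.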
Update (15:00 PDT): As someone suggested on the Amazon AWS EC2 developer forum, you can use an encrypted loop device to make sure that your private data is never stored in plain text on the hard drives.
Update (16:30 PDT): Cryptoloop, here is how. Someone suggested dm-crypt.
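To give an idea of what the dm-crypt route involves, here is a Python sketch that only builds the list of commands and runs nothing. It assumes the cryptsetup/LUKS front-end to dm-crypt and an ext3 filesystem, which is my choice for illustration, not necessarily what the forum posters had in mind. The function name and every path, size and device name are hypothetical placeholders.

```python
def encrypted_loop_plan(image, size, loop_dev, name, mount_point):
    """Return the commands (as argv lists) to set up an encrypted loop device.

    Nothing is executed here; the caller decides whether and how to run them
    (as root). All arguments are illustrative placeholders.
    """
    mapper = "/dev/mapper/" + name
    return [
        ["truncate", "-s", size, image],             # create the backing file
        ["losetup", loop_dev, image],                # attach it to a loop device
        ["cryptsetup", "luksFormat", loop_dev],      # write a LUKS header (prompts for a passphrase)
        ["cryptsetup", "luksOpen", loop_dev, name],  # expose the decrypted view through dm-crypt
        ["mkfs.ext3", mapper],                       # filesystem on top of the encrypted mapping
        ["mount", mapper, mount_point],              # private data goes under mount_point
    ]
```

Calling, say, `encrypted_loop_plan("/mnt/secure.img", "1G", "/dev/loop0", "secure", "/secure")` and printing each entry gives you a checklist to review before typing anything. Whatever lands on the physical drive is then ciphertext, so the next tenant's strings run finds nothing readable.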
Update (19:50 PDT): Anyone experiencing difficulties starting new instances? Mine are automatically terminated!
Update (20:25 PDT): All systems are back and operational. If you now run strings on /dev/sda2, you get a whole 150 GB of random data! The HR department was probably busy today, hiring all those new hamsters who will now have to type that random data at every instance's launch. Long live EC2!
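If you would rather check the "random data" claim with a number than by eyeballing strings output, one rough approach is to measure the byte entropy of a sample. This is only a sketch: the function names and the 1 MiB sample size are my own choices, and high entropy shows that the sample looks random, not that the whole disk was overwritten.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of `data` in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over the byte values that occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sample_entropy(path, sample_size=1 << 20):
    """Entropy of the first `sample_size` bytes of a file or raw device."""
    with open(path, "rb") as f:
        return byte_entropy(f.read(sample_size))
```

A freshly randomized device should score very close to 8 bits per byte, while leftover source code and config files sit well below that.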