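I recently got another notice from AWS saying that, because of a problem on their side, one of my instances is going to be terminated. The email looked roughly like this: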
We have noticed that one or more of your instances are running on a host degraded due to hardware failure.
i-a20a871a
The host needs to undergo maintenance and will be taken down at 12:00 GMT on 2010-06-16. Your instances will be terminated at this point.
......
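This is the more civilized kind of failure. Of course, EC2 instances can also just die outright. My impression is that this used to happen more often; I have not run into it in the past year or so. The ones that died outright were all m1.small instances. None of the larger instances has ever died on me.
In my last class, someone asked why bother switching to EC2 when EC2 instances still go down?! True enough: storms come without warning, and EC2 instances do go down. The fact is that anything can fail, even the most expensive high-end server. The mindset you need is that with cloud computing you still cannot skip backup or disaster recovery. The difference is that when an EC2 instance dies, I sit in my office, run a few commands, and it is back. With a rented server, all you can do is report the problem and fret. With colocation, sorry, you get to make a trip to the data center yourself!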
Steps of Recovery
Because I use EBS-boot instances, recovery is simple. Let's go through it step by step.
First, look up the instance's details; we will need them later:
$ ec2-describe-instances i-a20a871a
RESERVATION r-441f2d2c 107357334611 default
INSTANCE i-a20a871a ami-155b3811 ec2-67-202-19-78.compute-1.amazonaws.com domU-12-11-21-01-D1-F2.compute-1.internal running mykeypair 0 m1.small 2010-03-01T07:26:57+0000 us-east-1a monitoring-disabled 67.202.19.78 10.253.214.4 ebs
BLOCKDEVICE /dev/sda1 vol-242e0c1a 2010-03-01T08:04:09.000Z
Write down the volume ID and the AMI ID. Mine are volume ID vol-242e0c1a and AMI ID ami-155b3811.
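Copying IDs by hand is easy to get wrong, so here is a minimal sketch of pulling them out of the output instead. The function name extract_ids is made up for illustration. It assumes the record layout shown above (the AMI ID is the third field of the INSTANCE line, the volume ID is the third field of the BLOCKDEVICE line) and that the root device is /dev/sda1, as it is on my instance.

```shell
# Hypothetical helper: read ec2-describe-instances output on stdin and
# print "AMI_ID VOLUME_ID" for the root device.
# Assumption: the root device is /dev/sda1 and fields are whitespace-separated.
extract_ids() {
    awk '
        $1 == "INSTANCE"                        { ami = $3 }
        $1 == "BLOCKDEVICE" && $2 == "/dev/sda1" { vol = $3 }
        END {
            if (ami != "" && vol != "") print ami, vol
            else exit 1   # fail loudly if either ID was not found
        }
    '
}
```

It would be used as: ec2-describe-instances i-a20a871a | extract_ids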
You can also check the old volume; it should be attached to the old instance.
$ ec2-describe-volumes vol-242e0c1a
ATTACHMENT vol-242e0c1a i-a20a871a /dev/sda1 attached 2010-03-01T07:27:09+0000
Log in to the old instance. To be on the safe side, first confirm that the instance ID matches:
$ curl http://169.254.169.254/latest/meta-data/instance-id && echo
i-a20a871a
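If you would rather have the check fail loudly than compare IDs by eye, something like the following might do. This is only a sketch: confirm_instance_id is a made-up name, and it is meant to be run on the instance itself, since the metadata address is only reachable from there.

```shell
# Hypothetical helper: succeed only if this machine's instance ID equals $1.
# Returns 2 if the metadata service cannot be reached, 1 on a mismatch.
confirm_instance_id() {
    expected=$1
    actual=$(curl -s --max-time 5 http://169.254.169.254/latest/meta-data/instance-id) || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: this is $expected"
    else
        echo "MISMATCH: expected $expected but this is $actual" >&2
        return 1
    fi
}
```

For example, confirm_instance_id i-a20a871a && echo "safe to continue".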
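Shut down the services that need to be shut down. If there is any data on instance storage (ephemeral storage) that you want to keep, back it up now; otherwise it is gone once the instance is stopped or terminated. When that is done, stop the instance. Only EBS-boot instances can be stopped, and a stopped instance does not accrue EC2 instance-hour charges. The upside is that you can start it again and the data on the EBS volume is still there. The downside is that the IP address changes.
$ ec2-stop-instances i-a20a871a
INSTANCE i-a20a871a running stopping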
Wait until its state is stopped:
$ ec2-describe-instances i-a20a871a
RESERVATION r-441f2d2c 107357334611 default
INSTANCE i-a20a871a ami-155b3811 stopped mykeypair 0 m1.small 2010-03-01T07:26:57+0000 us-east-1a monitoring-disabled ebs
BLOCKDEVICE /dev/sda1 vol-242e0c1a 2010-03-01T08:04:09.000Z
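Rather than re-running the describe command by hand, the waiting can be sketched as a small polling loop. Everything here is an assumption for illustration: wait_for_state is a made-up name, and POLL_INTERVAL and MAX_TRIES are made-up shell variables, not options of the EC2 tools. It simply re-runs whatever command you give it until the wanted word shows up as a whole field in the output.

```shell
# Hypothetical helper: wait_for_state WORD COMMAND [ARGS...]
# Re-runs COMMAND until its output contains WORD as a whole
# whitespace-separated field, or gives up after MAX_TRIES attempts.
wait_for_state() {
    wanted=$1
    shift
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-60}" ]; do
        if "$@" | awk -v w="$wanted" '
            { for (i = 1; i <= NF; i++) if ($i == w) found = 1 }
            END { exit !found }   # exit status 0 only when the word was seen
        '; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    echo "gave up waiting for $wanted" >&2
    return 1
}
```

For this step that would be: wait_for_state stopped ec2-describe-instances i-a20a871a. Because it matches whole fields, "stopping" does not count as "stopped". The same loop should work for any other describe command whose output carries a state word.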
Then take a snapshot of the old volume. Remember to make the -d "description" easy to understand.
$ ec2-create-snapshot vol-242e0c1a -d "Ubuntu 8.04 20100607"
SNAPSHOT snap-1a73348a vol-242e0c1a pending 2010-06-07T07:55:59+0000 107357334611 20 Ubuntu 8.04 20100607
Wait until the snapshot's state is completed:
$ ec2-describe-snapshots snap-1a73348a
SNAPSHOT snap-1a73348a vol-242e0c1a completed 2010-06-07T07:55:59+0000 100% 107357334611 20 Ubuntu 8.04 20100607
Now register the new snapshot as an AMI. Again, make the -d "description" descriptive, so that later on you can still tell which image this is!
$ ec2-register -s snap-1a73348a -a i386 -n "ebs-ubuntu-8.04-i386-20100607" -d "EBS Ubuntu 8.04 i386 20100607" -b /dev/sda2=ephemeral0 --root-device-name /dev/sda1
IMAGE ami-c4b65e31
Launch an instance from the new AMI!
$ ec2-run-instances ami-c4b65e31 -k mykeypair -g default -z us-east-1a -t m1.small --instance-initiated-shutdown-behavior stop --disable-api-termination
RESERVATION r-4ac31021 107357334611 default
INSTANCE i-949c91ef ami-c4b65e31 pending mykeypair 0 m1.small 2010-06-07T08:01:21+0000 us-east-1a monitoring-disabled
Once the new instance's state is running, you can log in and take a look:
$ ec2-describe-instances i-949c91ef
RESERVATION r-4ac31021 107357334611 default
INSTANCE i-949c91ef ami-c4b65e31 ec2-174-129-101-39.compute-1.amazonaws.com domU-12-31-38-00-40-22.compute-1.internal running mykeypair 0 m1.small 2010-06-07T08:01:21+0000 us-east-1a monitoring-disabled 174.129.101.39 10.252.71.208 ebs
BLOCKDEVICE /dev/sda1 vol-a1179d66 2010-06-07T08:01:25.000Z
$ ssh -i mykeypair.pem root@174.129.101.39
Clean Up
Once you have confirmed that the new instance is working properly, you can clean up the old instance, volume, and snapshot. First, terminate the old instance.
$ ec2-terminate-instances i-a20a871a
Client.OperationNotPermitted: The instance 'i-a20a871a' may not be terminated. Modify its 'disableApiTermination' instance attribute and try again.
Ha! If, like me, you habitually add --disable-api-termination when you run an instance, or use
$ ec2-modify-instance-attribute --disable-api-termination true $instanceId
to disable API termination, you are protected from terminating an instance by accident, the kind of slip you regret forever! Now that we really do want to terminate it, re-enable API termination:
$ ec2-modify-instance-attribute i-a20a871a --disable-api-termination false
disableApiTermination i-a20a871a false
Now it can be terminated.
$ ec2-terminate-instances i-a20a871a
INSTANCE i-a20a871a stopped terminated
Next, look up the old instance's AMI and note its snapshot ID.
$ ec2-describe-images ami-155b3811
IMAGE ami-155b3811 107357334611/ebs-ubuntu-8.04-32b-20100301 107357334611 available private i386 machine ebs
BLOCKDEVICEMAPPING /dev/sda1 snap-e13ab246 20
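The snapshot ID can also be picked out of that output mechanically. A minimal sketch, assuming the layout shown above, where the snapshot ID is the third field of the BLOCKDEVICEMAPPING line; root_snapshot_id is a made-up name, and the device defaults to /dev/sda1 because that is my root device.

```shell
# Hypothetical helper: read ec2-describe-images output on stdin and print
# the snapshot ID mapped to the given device (default: /dev/sda1).
root_snapshot_id() {
    awk -v dev="${1:-/dev/sda1}" '
        $1 == "BLOCKDEVICEMAPPING" && $2 == dev { print $3; found = 1; exit }
        END { exit !found }   # non-zero exit status if no mapping matched
    '
}
```

It would be used as: ec2-describe-images ami-155b3811 | root_snapshot_id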
Deregister the old AMI first:
$ ec2-deregister ami-155b3811
IMAGE ami-155b3811
Then delete the old snapshot:
$ ec2-delete-snapshot snap-e13ab246
SNAPSHOT snap-e13ab246
Finally, delete the old volume:
$ ec2-delete-volume vol-242e0c1a
VOLUME vol-242e0c1a
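The order of the cleanup matters: the instance goes first, and the AMI has to be deregistered before its snapshot can be deleted. As a reminder of that order, here is a sketch that only prints the commands instead of running them, so you can read them over before doing anything irreversible. The function name cleanup_plan is made up; the commands it prints are the ones used in this post.

```shell
# Hypothetical helper: cleanup_plan INSTANCE AMI SNAPSHOT VOLUME
# Prints (does not run) the cleanup commands in the order used in this post.
cleanup_plan() {
    instance=$1 ami=$2 snapshot=$3 volume=$4
    echo "ec2-modify-instance-attribute $instance --disable-api-termination false"
    echo "ec2-terminate-instances $instance"
    echo "ec2-deregister $ami"            # must come before deleting its snapshot
    echo "ec2-delete-snapshot $snapshot"
    echo "ec2-delete-volume $volume"
}
```

For my case that would be: cleanup_plan i-a20a871a ami-155b3811 snap-e13ab246 vol-242e0c1a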
Done! And all of it without leaving my own desk!