I was doing some system maintenance today and came across the following horrific screen:
/dev/md0:
        Version : 00.90.03
  Creation Time : Sun Nov 16 14:13:20 2007
     Raid Level : raid5
     Array Size : 732587712 (698.65 GiB 750.17 GB)
  Used Dev Size : 244195904 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Dec 31 10:41:15 2008
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
...
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
       4       8        1        -      faulty spare
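Incidentally, that State line is easy to watch from a script. A minimal sketch, with the sample line hard-coded for illustration (in real use you'd capture the output of mdadm --detail /dev/md0):

```shell
# Sketch: extract the "State" field from `mdadm --detail` output so a
# cron job can warn on anything other than "clean". The sample line
# below is hard-coded for illustration.
detail='          State : clean, degraded'
state=$(echo "$detail" | sed 's/.*State : //')
echo "$state"    # clean, degraded
```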
One of the drives in my fileserver had died! Time to back up and get that sucker running again.
Please note: The following is only a guide to help you replace a failed disc. I cannot guarantee it will work for you, but it is what I do, and it has worked every time without any data loss.
As you can see, it is a 4 disc software RAID 5 array with no hot-swap spares. The following should work for most single disc failure situations in RAID 1, 5, or 6.
It appears that sda has bailed on me. First things first: back up the machine, so that if anything goes wrong you can rebuild from scratch.
You can see the faulty disc has already been removed from the array, but if yours hasn't been removed yet, the commands:

mdadm --manage /dev/md0 -f /dev/sda1
mdadm --manage /dev/md0 -r /dev/sda1

will mark it as failed (so it can be removed) and then remove the disc from the array.
Shut down the machine and swap out the hard drives. Make sure you replace only the faulty drive, and don't mix up the order of the remaining drives, because it'll be a pain to get the array back together if you do.
Boot up the machine. Your RAID array will be in the same degraded state. We need to partition the new drive exactly the same way as the drives in the existing array. Luckily, this is a one-liner with sfdisk:
sfdisk -d /dev/sdb | sfdisk /dev/sda
The above command dumps the partition table of sdb (any of the functioning drives will do) and pipes it to sfdisk to partition sda the same way. It should only take a second.
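If you're curious what's flowing through that pipe, the dump is just plain text. The sketch below uses an invented sample line (real output comes from sfdisk -d; Id=fd is the Linux raid autodetect partition type) and pulls out each partition's size in sectors, handy for eyeballing against the old disk:

```shell
# Illustrative only: the start/size values below are made up, but the
# shape matches an old-style `sfdisk -d` dump. Print each partition's
# size in sectors.
dump='# partition table of /dev/sdb
unit: sectors

/dev/sdb1 : start=       63, size=488392002, Id=fd'
echo "$dump" | awk -F'size=' '/start=/ {split($2, a, ","); print a[1]+0}'
# prints 488392002
```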
Then we can simply add the new drive to the array:
mdadm --manage /dev/md0 -a /dev/sda1
If you take a look at cat /proc/mdstat or mdadm --detail /dev/md0, you should see that the array is recovering, with a percentage done.
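If you just want that percentage (say, for a status script), it can be grepped out of the recovery line. A minimal sketch, with a sample line hard-coded since the real one lives in /proc/mdstat:

```shell
# Sketch: pull the rebuild percentage out of a /proc/mdstat recovery
# line. The sample line is invented for illustration; in real use,
# something like  grep -o '[0-9.]*%' /proc/mdstat  reads the file itself.
line='      [==>..................]  recovery = 12.6% (30786944/244195904) finish=104.3min speed=34080K/sec'
echo "$line" | grep -o '[0-9.]*%' | head -n1    # 12.6%
```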
After the recovery is done, you'll be back to a clean, fully working array!