Linux: Repairing a software raid on a Sunday morning.

2023-09-10

The trouble started when emails came in, reporting first “Fail” and later “DegradedArray” events for a software raid running on a Debian Linux server. Looking into the logs revealed that an SSD (NVMe) had died, but let us look at it step by step.

Checking things

Luckily for us the server was still operational and we were able to ssh into it. So the first thing we did was to check the raid status:

% cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3[1]
      465370432 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : inactive nvme1n1p1[1](S)
      33520640 blocks super 1.2

md1 : active (auto-read-only) raid1 nvme1n1p2[1]
      1046528 blocks super 1.2 [2/1] [_U]

We can see above that only one drive (nvme1n1) is in use and the other one (nvme0n1) is missing. Also, one of our three groups (md0) is inactive. If we had checked earlier, we might have seen something like this:

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3[1]
      465370432 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active (auto-read-only) raid1 nvme1n1p1[1]
      33520640 blocks super 1.2 [2/1] [_U]
              resync=PENDING

md1 : active raid1 nvme1n1p2[1]
      1046528 blocks super 1.2 [2/1] [_U]

This means that the md0 group is still active but has been put into read-only mode and is waiting for a re-sync.
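
As a side note: a group that is stuck in this auto-read-only state can be switched back to read-write with mdadm, which also kicks off a pending re-sync (only useful, of course, if there is a second member to sync with). We did not need this here, but for completeness:

% sudo mdadm --readwrite /dev/md0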

The command nvme list (you may need to install the nvme-cli package) should list all NVMe drives:

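% sudo nvme list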
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme1n1          /dev/ng1n1            XXXXXXXXXXXXXX       XXXXXXXXXXXXXXXXXXXXXXXXXX               1          XXXXX  GB / XXXXXX  GB    512   B +  0 B   XXXXXXXX

In our case it listed just one drive instead of two. But because disks usually do not dissolve into thin air, we check the logs:

% sudo dmesg | grep nvme
nvme nvme0: I/O 1012 (I/O Cmd) QID 5 timeout, aborting
nvme nvme0: I/O 1012 QID 5 timeout, reset controller
nvme nvme0: I/O 8 QID 0 timeout, reset controller
nvme nvme0: Device not ready; aborting reset, CSTS=0x1
nvme nvme0: Abort status: 0x371
nvme nvme0: Device not ready; aborting reset, CSTS=0x1
nvme nvme0: Removing after probe failure status: -19
nvme nvme0: Device not ready; aborting reset, CSTS=0x1
nvme0n1: detected capacity change from 1000215216 to 0
md/raid1:md2: Disk failure on nvme0n1p3, disabling device.
md/raid1:md0: Disk failure on nvme0n1p1, disabling device.
md/raid1:md1: Disk failure on nvme0n1p2, disabling device.

Well, bingo! Looks like one of our disks died. This means replacing it with a new one (or pinging our hosting provider to do so). After shutting down, replacing the hardware and booting up again, we can get to work and bring the new disk into our software raid.
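
Since the rebuild will put quite some load on the remaining drive, it can be worth checking its health first. nvme-cli can read the SMART / health log of an NVMe drive (the device name is the one from our setup):

% sudo nvme smart-log /dev/nvme1n1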

Step 1: copy partition table

The first thing we need to do is to copy the partition table from the intact drive to the new one. Our drives were partitioned with the MBR scheme by our hosting provider, so we needed to issue the following command:

% sudo sfdisk -d /dev/nvme1n1 | sudo sfdisk /dev/nvme0n1

This dumps the partition table of drive nvme1n1 and writes it onto drive nvme0n1, which is just what we wanted. Please note that you have to use other commands if your drives are partitioned with GPT!
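
For GPT, one common approach is to replicate the table with sgdisk (from the gdisk package) and then give the new drive its own random GUIDs. A sketch, assuming the same device names as above (the healthy nvme1n1 is the source, the new nvme0n1 the target):

% sudo sgdisk --replicate=/dev/nvme0n1 /dev/nvme1n1
% sudo sgdisk -G /dev/nvme0n1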

Also please be aware that in our case the “first” drive (nvme0) died and the “second” one stayed healthy. Depending on your setup, you’ll likely have different drive names, so don’t just copy and paste commands blindly from this article!
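
A quick way to double-check which drive is which (and whether the partition table arrived on the new one) is lsblk:

% lsblk -o NAME,SIZE,TYPE,MOUNTPOINT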

Step 2: repair the raid

In our case we have three groups to repair (md0, md1, md2) of which one (md0) is completely inactive.

Re-activate and repair an inactive group

First we need to activate our inactive group md0 via the following command, which will allow us to proceed with repairing it:

% sudo mdadm --manage /dev/md0 --run

After it has been activated, we can simply add the correct partition of the new drive to it:

% sudo mdadm /dev/md/0 -a /dev/nvme0n1p1

Doing a cat /proc/mdstat should now show something like this:

md0 : active raid1 nvme0n1p1[2] nvme1n1p1[1]
      33520640 blocks super 1.2 [2/1] [_U]
      [=============>.......]  recovery = 66.8% (22415360/33520640) finish=0.8min speed=206159K/sec
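
If you want more detail than /proc/mdstat offers, mdadm can show the state of every member of a group:

% sudo mdadm --detail /dev/md0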

Repair an active group

For active raid groups (in our case md1 and md2) we don’t need to go through the extra step of activating them. Simply adding the matching partitions from the new drive to each group will trigger the recovery:

% sudo mdadm /dev/md/1 -a /dev/nvme0n1p2
% sudo mdadm /dev/md/2 -a /dev/nvme0n1p3

Remember to check the progress of the recovery via cat /proc/mdstat. The output should contain information like this:

md2 : active raid1 nvme0n1p3[2] nvme1n1p3[1]
      465370432 blocks super 1.2 [2/1] [_U]
      [============>........]  recovery = 61.8% (287942272/465370432) finish=14.2min speed=206976K/sec
      bitmap: 4/4 pages [16KB], 65536KB chunk

For larger partitions this can take a while. Please proceed only after the recovery has finished completely!
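
If you do not want to re-run cat /proc/mdstat by hand, watch will refresh the output for you every few seconds:

% watch -n 5 cat /proc/mdstat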

Step 3: update the boot loader

Because we changed disks, we need to update the bootloader. It is not yet installed on the new drive, and if we ever had to replace the old one we would be stuck without a bootloader. ;-)

First we need to update the device map for the GRUB bootloader:

% sudo grub-mkdevicemap -n

Afterwards we install it onto the new drive (Please note that we use nvme0n1 and not simply nvme0!):

% sudo grub-install /dev/nvme0n1
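
On Debian an alternative is to reconfigure the grub-pc package and select both drives in the dialog, which also remembers the choice for future GRUB updates:

% sudo dpkg-reconfigure grub-pc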

And that was it! Now cross your fingers and reboot the machine.