mdadm RAID with Proxmox

I recently acquired a new server with two drives that I intended to run in RAID1, for use as a virtualisation host for various things.

My hypervisor of choice is Proxmox, for a few reasons: primarily its support for both KVM and LXC, with the fact that it’s Debian-based as a nice bonus. (I also really dislike VMware’s occasionally-braindead networking implementation, which rules out ESXi.)

This particular server does not have a RAID card, so I needed to use a software RAID implementation. Out of the box, Proxmox only supports RAID1 via ZFS; however, to keep this box similar to others I have, I wanted to use ext4 and mdadm. So we’re going to have to do a bit of manual poking to get this how we need it.

This post is mostly an aide-memoire for myself for the future.

Install Proxmox

So, the first thing to do is get a fresh Proxmox install; I’m using 5.2-1 at the time of writing.

After the install is done, we should have one drive with a Proxmox install and one unused disk.

The installer will create the default Proxmox layout, which looks something like this (I’m using 1TB drives):

Device      Start        End    Sectors   Size Type
/dev/sda1    2048       4095       2048     1M BIOS boot
/dev/sda2    4096     528383     524288   256M EFI System
/dev/sda3  528384 1953525134 1952996751 931.3G Linux LVM
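
To pull up a listing like this yourself at any point, either of the following will do:

fdisk -l /dev/sda
sfdisk -l /dev/sda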

This looks good, so now we can begin moving this to a RAID array.

Clone the partition table from the first drive to the second

In my examples, sda is the drive that we installed Proxmox to, and sdb is the drive I want to use as a mirror.

To start with, let’s clone the partition table from sda to sdb, which is really easy on Linux using sfdisk:

root@tirant:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ... OK

Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xa0492137

Old situation:

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new GPT disklabel (GUID: 7755C404-FEA5-004A-998C-F85E217AE7B7).
/dev/sdb1: Created a new partition 1 of type 'BIOS boot' and of size 1 MiB.
/dev/sdb2: Created a new partition 2 of type 'EFI System' and of size 256 MiB.
/dev/sdb3: Created a new partition 3 of type 'Linux LVM' and of size 931.3 GiB.
/dev/sdb4: Done.

New situation:

Device      Start        End    Sectors   Size Type
/dev/sdb1    2048       4095       2048     1M BIOS boot
/dev/sdb2    4096     528383     524288   256M EFI System
/dev/sdb3  528384 1953525134 1952996751 931.3G Linux LVM

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@tirant:~#

sdb now has the same partition table as sda. However, since we’re converting this to a RAID1, we’ll want to change the partition type, which we can also do easily with sfdisk:

root@tirant:~# sfdisk --part-type /dev/sdb 3 A19D880F-05FC-4D3B-A006-743F0F84911E

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@tirant:~#

(for MBR, you would use something like: sfdisk --part-type /dev/sdb 3 fd)
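
To double-check that the change took effect, sfdisk will print the current type if you omit the type argument:

sfdisk --part-type /dev/sdb 3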

Set up mdadm

So now we need to set up a RAID1. mdadm isn’t installed by default, so we’ll need to install it with apt-get install mdadm (you may need to run apt-get update first).
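
On a stock install, that boils down to:

apt-get update
apt-get install mdadm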

Once mdadm is installed, let’s create the RAID1 (we’ll create the array with a “missing” disk to start with, and add the first disk in due course):

root@tirant:~# mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/sdb3
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
Continue creating array?
Continue creating array? (y/n) y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@tirant:~#

And now check that we have a working one-disk array:

root@tirant:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb3[1]
      976367296 blocks super 1.2 [2/1] [_U]
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>
root@tirant:~#

Fantastic. The [_U] shows that only one of the two mirror slots is active, which is exactly what we expect with the missing disk.

Move Proxmox to the new array

Because Proxmox uses LVM, this next step is quite straightforward.

Firstly, let’s turn this new RAID array into an LVM physical volume (PV):

root@tirant:~# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created.
root@tirant:~#

And add it into the pve volume group (VG):

root@tirant:~# vgextend pve /dev/md0
  Volume group "pve" successfully extended
root@tirant:~#

Now we can move the Proxmox install over to the new array using pvmove:

root@tirant:~# pvmove /dev/sda3 /dev/md0
  /dev/sda3: Moved: 0.00%
  /dev/sda3: Moved: 0.19%
  ...
  /dev/sda3: Moved: 99.85%
  /dev/sda3: Moved: 99.95%
  /dev/sda3: Moved: 100.00%
root@tirant:~#

(This will take some time depending on the size of your disks)
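
As an aside: if pvmove gets interrupted partway (a reboot, a stray Ctrl-C), running it again with no arguments resumes any unfinished moves, and --abort backs one out:

pvmove          # resume any interrupted moves
pvmove --abort  # or cancel an in-progress move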

Once this is done, we can remove the non-RAID disk from the VG:

root@tirant:~# vgreduce pve /dev/sda3
  Removed "/dev/sda3" from volume group "pve"
root@tirant:~#

And remove LVM from it:

root@tirant:~# pvremove /dev/sda3
  Labels on physical volume "/dev/sda3" successfully wiped.
root@tirant:~#

Now we can add the original disk into the array.

We again change the partition type:

root@tirant:~# sfdisk --part-type /dev/sda 3 A19D880F-05FC-4D3B-A006-743F0F84911E

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@tirant:~#

and then add it into the array:

root@tirant:~# mdadm --add /dev/md0 /dev/sda3
mdadm: added /dev/sda3
root@tirant:~#

We can watch as the array is synced:

root@tirant:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda3[2] sdb3[1]
      976367296 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  0.1% (1056640/976367296) finish=123.0min speed=132080K/sec
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>
root@tirant:~#

We need to wait for this to complete before continuing.
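
Rather than re-running cat by hand, something like either of these will do the waiting for you (mdadm --wait blocks until any resync or recovery finishes):

watch -n 10 cat /proc/mdstat
mdadm --wait /dev/md0

Once the recovery is complete, both members show as [UU]: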

root@tirant:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda3[2] sdb3[1]
      976367296 blocks super 1.2 [2/2] [UU]
      bitmap: 1/8 pages [4KB], 65536KB chunk

unused devices: <none>
root@tirant:~#

Making the system bootable

Now we need to ensure we can boot this new system!

Add the required mdadm config to mdadm.conf:

root@tirant:~# mdadm --examine --scan >> /etc/mdadm/mdadm.conf
root@tirant:~#
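
It’s worth eyeballing the result; the file should now end with an ARRAY line describing md0:

grep ARRAY /etc/mdadm/mdadm.conf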

Add some required modules to grub:

echo '' >> /etc/default/grub
echo '# RAID' >> /etc/default/grub
echo 'GRUB_PRELOAD_MODULES="part_gpt mdraid09 mdraid1x lvm"' >> /etc/default/grub

And update grub and the kernel initramfs:

root@tirant:~# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.17-1-pve
Found initrd image: /boot/initrd.img-4.15.17-1-pve
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done
root@tirant:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.15.17-1-pve
root@tirant:~#

And actually install grub to the disk:

root@tirant:~# grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
root@tirant:~#

If the server is booting via EFI, the output will be slightly different. We can also force it to install for the alternative platform using --target i386-pc or --target x86_64-efi, e.g.:

root@tirant:~# grub-install --target x86_64-efi --efi-directory /mnt/efi
Installing for x86_64-efi platform.
File descriptor 4 (/dev/sda2) leaked on vgs invocation. Parent PID 29184: grub-install
File descriptor 4 (/dev/sda2) leaked on vgs invocation. Parent PID 29184: grub-install
EFI variables are not supported on this system.
EFI variables are not supported on this system.
grub-install: error: efibootmgr failed to register the boot entry: No such file or directory.
root@tirant:~#

(/mnt/efi is /dev/sda2 mounted. The errors above are expected here, since this system wasn’t booted via EFI, so there are no EFI variables for efibootmgr to write.)
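
For completeness, that mount was set up beforehand with something like:

mkdir -p /mnt/efi
mount /dev/sda2 /mnt/efi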

Now, clone the BIOS boot and EFI system partitions from the old disk to the new one:

root@tirant:~# dd if=/dev/sda1 of=/dev/sdb1
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0263653 s, 39.8 MB/s
root@tirant:~# dd if=/dev/sda2 of=/dev/sdb2
524288+0 records in
524288+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 5.48104 s, 49.0 MB/s
root@tirant:~#

Finally, reboot and test. If everything has worked, the server should boot up as normal.
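
Once it’s back up, a quick sanity check confirms the mirror is healthy and that the pve volume group is sitting on the array:

cat /proc/mdstat
pvs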