mdadm RAID with Proxmox

I recently acquired a new server with 2 drives that I intended to use as RAID1 for a virtualisation host for various things.

My hypervisor of choice is Proxmox (for a few reasons: support for KVM and LXC primarily, but the fact that it’s Debian-based is a nice bonus, and I really dislike the occasionally-braindead networking implementation from VMware, which rules out ESXi).

This particular server does not have a RAID card, so I needed to use a software RAID implementation. Out of the box, RAID1 on Proxmox requires ZFS; however, to keep this box similar to others I have, I wanted to use ext4 and mdadm. So we’re going to have to do a bit of manual poking to get this how we need it.

This post is mostly an aide-memoire for myself for the future.

Install Proxmox

So, the first thing to do is get a fresh Proxmox install - I’m using 5.2-1 at the time of writing.

After the install is done, we should have 1 drive with a proxmox install, and 1 unused disk.

The installer will create a proxmox default layout that looks something like this (I’m using 1TB Drives):

Device      Start        End    Sectors   Size Type
/dev/sda1    2048       4095       2048     1M BIOS boot
/dev/sda2    4096     528383     524288   256M EFI System
/dev/sda3  528384 1953525134 1952996751 931.3G Linux LVM

This looks good, so now we can begin moving this to a RAID array.

Clone the partition table from the first drive to the second drive

In my examples, sda is the drive that we installed proxmox to, and sdb is the drive I want to use as a mirror.

To start with, let’s clone the partition table from sda to sdb, which is really easy on Linux using sfdisk:

root@tirant:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ... OK

Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xa0492137

Old situation:

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new GPT disklabel (GUID: 7755C404-FEA5-004A-998C-F85E217AE7B7).
/dev/sdb1: Created a new partition 1 of type 'BIOS boot' and of size 1 MiB.
/dev/sdb2: Created a new partition 2 of type 'EFI System' and of size 256 MiB.
/dev/sdb3: Created a new partition 3 of type 'Linux LVM' and of size 931.3 GiB.
/dev/sdb4: Done.

New situation:

Device      Start        End    Sectors   Size Type
/dev/sdb1    2048       4095       2048     1M BIOS boot
/dev/sdb2    4096     528383     524288   256M EFI System
/dev/sdb3  528384 1953525134 1952996751 931.3G Linux LVM

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@tirant:~#

sdb now has the same partition table as sda. However, we’re converting this to a RAID1, so we’ll want to change the partition type, which we can also do easily with sfdisk:

root@tirant:~# sfdisk --part-type /dev/sdb 3 A19D880F-05FC-4D3B-A006-743F0F84911E

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@tirant:~#

(for MBR, you would use something like: sfdisk --part-type /dev/sdb 3 fd)

Set up mdadm

So now we need to set up a RAID1. mdadm isn’t installed by default, so we’ll need to install it using: apt-get install mdadm (You may need to run apt-get update first).

Once mdadm is installed, let’s create the RAID1 (we’ll create an array with a “missing” disk to start with, and add the first disk into the array in due course):

root@tirant:~# mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/sdb3
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
Continue creating array?
Continue creating array? (y/n) y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@tirant:~#

And now check that we have a working one-disk array:

root@tirant:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb3[1]
      976367296 blocks super 1.2 [2/1] [_U]
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>
root@tirant:~#

Fantastic.

Move proxmox to the new array

Because Proxmox uses LVM, this next step is quite straightforward.

Firstly, let’s turn this new RAID array into an LVM PV:

root@tirant:~# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created.
root@tirant:~#

And add it into the pve vg:

root@tirant:~# vgextend pve /dev/md0
  Volume group "pve" successfully extended
root@tirant:~#

Now we can move the proxmox install over to the new array using pvmove:

root@tirant:~# pvmove /dev/sda3 /dev/md0
  /dev/sda3: Moved: 0.00%
  /dev/sda3: Moved: 0.19%
  ...
  /dev/sda3: Moved: 99.85%
  /dev/sda3: Moved: 99.95%
  /dev/sda3: Moved: 100.00%
root@tirant:~#

(This will take some time depending on the size of your disks)

Once this is done, we can remove the non-raid disk from the vg:

root@tirant:~# vgreduce pve /dev/sda3
  Removed "/dev/sda3" from volume group "pve"
root@tirant:~#

And remove LVM from it:

root@tirant:~# pvremove /dev/sda3
  Labels on physical volume "/dev/sda3" successfully wiped.
root@tirant:~#

Now, we can add the new disk into the array.

We again change the partition type:

root@tirant:~# sfdisk --part-type /dev/sda 3 A19D880F-05FC-4D3B-A006-743F0F84911E

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@tirant:~#

and then add it into the array:

root@tirant:~# mdadm --add /dev/md0 /dev/sda3
mdadm: added /dev/sda3
root@tirant:~#

We can watch as the array is synced:

root@tirant:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda3[2] sdb3[1]
      976367296 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  0.1% (1056640/976367296) finish=123.0min speed=132080K/sec
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>
root@tirant:~#

We need to wait for this to complete before continuing.
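Rather than re-running the command by hand, something like watch makes it easy to keep an eye on the resync progress:

watch -n 10 cat /proc/mdstat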

root@tirant:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda3[2] sdb3[1]
      976367296 blocks super 1.2 [2/2] [UU]
      bitmap: 1/8 pages [4KB], 65536KB chunk

unused devices: <none>
root@tirant:~#

Making the system bootable

Now we need to ensure we can boot this new system!

Add the required mdadm config to mdadm.conf

root@tirant:~# mdadm --examine --scan >> /etc/mdadm/mdadm.conf
root@tirant:~#

Add some required modules to grub:

echo '' >> /etc/default/grub
echo '# RAID' >> /etc/default/grub
echo 'GRUB_PRELOAD_MODULES="part_gpt mdraid09 mdraid1x lvm"' >> /etc/default/grub

and update grub and the kernel initramfs

root@tirant:~# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.17-1-pve
Found initrd image: /boot/initrd.img-4.15.17-1-pve
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done
root@tirant:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.15.17-1-pve
root@tirant:~#

And actually install grub to the disk.

root@tirant:~# grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
root@tirant:~#

If the server is booting via EFI, the output will be slightly different. We can also force it to install for the alternative platform using --target i386-pc or --target x86_64-efi, e.g.:

root@tirant:~# grub-install --target x86_64-efi --efi-directory /mnt/efi
Installing for x86_64-efi platform.
File descriptor 4 (/dev/sda2) leaked on vgs invocation. Parent PID 29184: grub-install
File descriptor 4 (/dev/sda2) leaked on vgs invocation. Parent PID 29184: grub-install
EFI variables are not supported on this system.
EFI variables are not supported on this system.
grub-install: error: efibootmgr failed to register the boot entry: No such file or directory.
root@tirant:~#

(/mnt/efi is /dev/sda2 mounted)
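For completeness, that mount point was just created and mounted by hand beforehand, roughly:

mkdir -p /mnt/efi
mount /dev/sda2 /mnt/efi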

Now, clone the BIOS and EFI partitions from the old disk to the new one:

root@tirant:~# dd if=/dev/sda1 of=/dev/sdb1
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0263653 s, 39.8 MB/s
root@tirant:~# dd if=/dev/sda2 of=/dev/sdb2
524288+0 records in
524288+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 5.48104 s, 49.0 MB/s
root@tirant:~#

Finally, reboot and test. If everything has worked, the server should boot up as normal.
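Once it’s back up, it’s worth a quick sanity check to confirm the array is clean and that LVM is sitting on top of it - something like:

cat /proc/mdstat
mdadm --detail /dev/md0
pvs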

DNS Hosting - Part 3: Putting it all together

In my previous posts I discussed the history leading up to, and the eventual rewrite of, my DNS hosting solution. So this post will (finally) talk briefly about how it all runs in production on MyDNSHost.

Shortly before the whole rewrite I’d found myself playing around a bit with Docker for another project, so I decided early on that I was going to make use of Docker for the main bulk of the setup. This meant I didn’t need to worry about incompatibilities between different parts of the stack that needed different versions of things, and could update different bits at different times.

The system is split up into a number of containers (and could probably be split up into more).

To start with, I had the following containers:

  • API Container - Deals with all the backend interactions
  • WEB Container - Runs the main frontend that people see. Interacts with the API Container to actually do anything.
  • DB Container - Holds all the data used by the API
  • BIND Container - Runs an instance of bind to handle DNSSEC signing and distributing DNS Zones to the public-facing servers.
  • CRON Container - This container runs a bunch of maintenance scripts to keep things tidy and initiate DNSSEC signing etc.

The tasks in the CRON container could probably be split up more, but for now I’m ok with having them in 1 container.

This worked well; however, I found some annoyances when redeploying the API or WEB containers, which caused me to be logged out from the frontend, so another container was soon added:

  • MEMCACHED Container - Stores session data from the API and FRONTEND containers to allow for horizontal scaling and restarting of containers.

In the first instance, the API Container was also responsible for interactions with the BIND container. It would generate zone files on-demand when users made changes, and then poke BIND to load them. However this was eventually split out further, and another 3 containers were added:

  • GEARMAN Container - Runs an instance of Gearman for the API container to push jobs to.
  • REDIS Container - Holds the job data for GEARMAN.
  • WORKER Container - Runs a bunch of worker scripts to do the tasks the API Container previously did for generating/updating zone files and pushing to BIND.

Splitting these tasks out into the WORKER container made the frontend feel faster as it no longer needed to wait for things to happen and could just fire the jobs off into GEARMAN and let it worry about them. I also get some extra logging from this as the scripts can be a lot more verbose. In addition, if a worker can’t handle a job it can be rescheduled to try again and the workers can (in theory) be scaled out horizontally a bit more if needed.

There were some initial challenges with this - the main one being around how the database interaction worked, as the workers would fail after periods of inactivity and then get auto-restarted and work immediately. This turned out to be mainly due to how I’d pulled the code out of the API and into the workers. Whereas the scripts in the API run using the traditional method where the script gets called, does its thing (including setup), then dies, the WORKER scripts are long-term processes, so the DB connections were eventually timing out and the code was not designed to handle this.

Finally, more recently I added statistical information about domains and servers, which required another 2 containers:

  • INFLUXDB Container - Runs InfluxDB to store time-series data and provide a nice way to query it for graphing.
  • CHRONOGRAF Container - Runs Chronograf to allow me to easily pull out data from INFLUXDB for testing.

That’s quite a few containers to manage. To actually manage running them, I primarily make use of Docker-Compose (to set up the various networks, volumes, containers, etc.). This works well for the most part, but there are a few limitations around how it deals with restarting containers that cause fairly substantial downtime when upgrading WEB or API. To get around this I wrote a small bit of orchestration scripting that uses docker-compose to scale the WEB and API containers up to 2 (letting docker-compose do the actual creation of the new container), then manually kills off the older container and then scales them back down to 1. This seems to behave well.
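Conceptually that orchestration boils down to something like this (assuming a compose service called web; the real script handles both WEB and API and a few more details):

#!/bin/bash
SERVICE=web

# Remember the currently-running container for this service
OLD=$(docker ps -q --filter "label=com.docker.compose.service=${SERVICE}" | head -n1)

# Let docker-compose create a second copy alongside the old one
docker-compose up -d --no-recreate --scale ${SERVICE}=2 ${SERVICE}

# Kill off the old container, then settle back down to a single instance
docker rm -f "${OLD}"
docker-compose up -d --no-recreate --scale ${SERVICE}=1 ${SERVICE}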

So with all these containers hanging around, I needed a way to deal with exposing them to the web, and automating the process of ensuring they had SSL Certificates (using Let’s Encrypt). Fortunately Chris Smith has already solved this problem for the most part in a way that worked for what I needed. In a blog post he describes a set of docker containers he created that automatically runs nginx to proxy towards other internal containers and obtain appropriate SSL certificates using DNS challenges. For the most part all that was required was running this and adding some labels to my existing containers and that was that…

Except this didn’t quite work initially, as I couldn’t do the required DNS challenges unless I hosted my DNS somewhere else, so I ended up adding support for HTTP Challenges and then I was able to use this without needing to host DNS elsewhere. (And in return Chris has added support for using MyDNSHost for the DNS Challenges, so it’s a win-win). My orchestration script also handles setting up and running the automatic nginx proxy containers.

This brings me to the public-facing DNS servers. These are currently the only bit not running in Docker (though they could). They run on some standard Ubuntu 16.04 VMs with a small setup script that installs bind and an extra service to handle automatically adding/removing zones based on a special “catalog zone”, since the versions of bind currently in use don’t yet support them natively. The transferring of zones between the frontend and the public servers is done using standard DNS Notify and AXFR. DNSSEC is handled by the backend server pre-signing the zones before sending them to the public servers, which never see the signing keys.
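As a quick manual check that a zone transfers as expected, you can always ask the backend for a transfer the same way a public server would (hostnames here are purely illustrative):

dig @bind-backend.example.net example.org AXFR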

By splitting jobs up this way, in theory it should be possible in future (if needed) to move away from BIND to alternatives (such as PowerDNS or so).

As well as the public service that I’m running, all of the code involved (all the containers and all the orchestration) is available on Github under the MIT License. Documentation is a little light (read: pretty non-existent) but it’s all there for anyone else to use/improve/etc.

DNS Hosting - Part 2: The rewrite

In my previous post about DNS Hosting I discussed the history leading up to when I decided I needed a better personal DNS hosting solution. I decided to code one myself to replace what I had been using previously.

I decided there were a few things that were needed:

  • Fully-Featured API
    • I wanted full control over the zone data programmatically, everything should be possible via the API.
    • The API should be fully documented.
  • Fully-Featured default web interface.
    • There should be a web interface that fully implements the API. Just because there is an API shouldn’t mean it has to be used to get full functionality.
    • There should exist nothing that only the default web ui can do that can’t be done via the API as well.
  • Multi-User support
    • I also host some DNS for people who aren’t me, they should be able to manage their own DNS.
  • Domains should be shareable between users
    • Every user should be able to have their own account
    • User accounts should be able to be granted access to domains that they need to be able to access
      • Different users should have different access levels:
        • Some just need to see the zone data
        • Some need to be able to edit it
        • Some need to be able to grant other users access
  • Backend Agnostic
    • The authoritative data for the zones should be stored independently from the software used to serve it to allow changing it easily in future

These were the basic criteria and what I started off with when I designed MyDNSHost.

MyDNSHost Homepage

Now that I had the basic criteria, I started off by coming up with a basic database structure for storing the data that I thought would suit my plans, and a basic framework for the API backend so that I could start creating some initial API endpoints. With this in place I was able to create the database structure, and pre-seed it with some test data. This would allow me to test the API as I created it.

I use chrome, so for testing the API I use the Restlet Client extension.

Armed with a database structure, a basic API framework, and some test data - I was ready to code!

Except I wasn’t.

Before I could start properly coding the API I needed to think of what endpoints I wanted, and how the interactions would work. I wanted the API to make sense, so wanted to get this all planned first so that I knew what I was aiming for.

I decided pretty early on that I was going to version the API - that way if I messed it all up I could redo it and not need to worry about backwards compatibility, so for the time being, everything would exist under the /1.0/ directory. I came up with the following basic idea for endpoints:

MyDNSHost LoggedIn Homepage
  • Domains
    • GET /domains - List domains the current user has access to
    • GET /domains/<domain> - Get information about <domain>
    • POST /domains/<domain> - Update <domain>
    • DELETE /domains/<domain> - Delete <domain>
    • GET /domains/<domain>/records - Get records for <domain>
    • POST /domains/<domain>/records - Update records for <domain>
    • DELETE /domains/<domain>/records - Delete records for <domain>
    • GET /domains/<domain>/records/<recordid> - Get a specific record for <domain>
    • POST /domains/<domain>/records/<recordid> - Update a specific record for <domain>
    • DELETE /domains/<domain>/records/<recordid> - Delete a specific record for <domain>
  • Users
    • GET /users - Get a list of users (non-admin users should only see themselves)
    • GET /users/(<userid>|self) - Get information about a specific user (or the current user)
    • POST /users/(<userid>|self) - Update information about a specific user (or the current user)
    • DELETE /users/<userid> - Delete a specific user (or the current user)
    • GET /users/(<userid>|self)/domains - Get a list of domains for the given user (or the current user)
  • General
    • GET /ping - Check that the API is responding
    • GET /version - Get version info for the API
    • GET /userdata - Get information about the current login (user, access-level, etc)

This looked sane so I set about with the actual coding!

Rather than messing around with OAuth tokens and the like, I decided that every request to the API should be authenticated - initially using basic-auth with a username/password, but eventually also using API Keys. This made things fairly simple whilst testing, and made interacting with the API via scripts quite straightforward (no need to grab a token first and then do things).
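Testing from the command line is then just a case of throwing authenticated requests at the endpoints, something along these lines (the host and credentials here are placeholders):

curl -u username:password https://dnsapi.example.net/1.0/ping
curl -u username:password https://dnsapi.example.net/1.0/domains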

The initial implementation of the API with domain/user editing functionality and API Key support was completed within a day, and then followed a week of evenings tweaking and adding functionality that would be needed later - such as internal “hook” points for when certain actions happened (changing records etc) so that I could add code to actually push these changes to a DNS Server. As I was developing the API, I also made sure to document it using API Blueprint and Aglio - it was easier to keep it up to date as I went, than to write it all after-the-fact.
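Regenerating the docs was just a case of re-running Aglio over the blueprint whenever the API changed, something like (filenames are just examples):

aglio -i api.apib -o index.html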

Once I was happy with the basic API functionality and knew from my (manual) testing that it functioned as desired, I set about on the Web UI. I knew I was going to use Bootstrap for this because I am very much not a UI person and bootstrap helps make my stuff look less awful.

MyDNSHost Records View

Now, I should point out here that I’m not a developer for my day job; most of what I write, I write for myself to “scratch an itch” so to speak. I don’t keep up with all the latest frameworks and best practices and all that. I only recently (in the last year) switched away from hand-managing project dependencies in Java to using gradle and letting it do it for me.

So for the web UI I decided to experiment and try to do things “properly”. I decided to use composer for dependency management for the first time, a 3rd-party request router (Bramus/Router) for handling how pages are loaded, and Twig for templating. (At this point, the API code was all hand-coded with no 3rd-party dependencies. However, my experiment with the front end was successful and the API code has since changed to also make use of composer and some 3rd-party dependencies for some functionality.)
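Pulling those libraries in with composer is a one-liner (the package names as they appear on Packagist):

composer require bramus/router twig/twig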

The UI was much quicker to get to an initial usable state - as all the heavy lifting was already handled by the backend API code, the UI just had to display this nicely.

I then spent a few more evenings and weekends fleshing things out a bit more, and adding in things that I’d missed in my initial design and implementations. I also wrote some of the internal “hooks” that were needed to make the API able to interact with BIND and PowerDNS for actually serving DNS Data.

As this went on, whilst the API layout I’d planned stayed mostly static (apart from a bunch more routes being added), I did end up revisiting some of my initial decisions:

  • I moved from a level-based access system for separating users and admins to an entirely role-based system.
    • Different users can be granted access to do different things (eg manage users, impersonate users, manage all domains, create domains, etc)
  • I made domains entirely user-agnostic
    • Initially each domain had an “owner” user, but this was changed so that ownership over a domain is treated the same as any other level of access on the domain.
    • This means that domains can technically be owned by multiple people (Though in normal practice an “owner” can’t add another user as an “owner” - only users with “Manage all domains” permission can add users at the “owner” level)
    • This also allows domain-level API Keys that can be used to only make changes to a certain domain not all domains a user has access to.

Eventually I had a UI and API system that seemed to do what I needed and I could look at actually putting this live and starting to use it (which I’ll talk about in the next post).

After the system went live I also added a few more features that were requested by users that weren’t part of my initial requirements, such as:

  • TOTP 2FA Support
    • With “remember this device” option rather than needing to enter a code every time you log in.
  • DNSSEC Support
  • EMAIL Notifications when certain important actions occur
    • User API Key added/changed
    • User 2FA Key added/changed
  • WebHooks whenever zone data is changed
  • Ability to use your own servers for hosting the zone data not mine
    • The live system automatically allows AXFR for a zone from any server listed as an NS on the domain and sends appropriate notifies.
  • Domain Statistics (such as queries per server, per record type, per domain etc)
  • IDN Support
  • Ability to import and export raw BIND data.
    • This makes it easier for people to move to/from the system without needing any interaction with admin users or needing to write any code to deal with zone files.
    • Ability to import Cloudflare-style zone exports.
      • These look like BIND zone files, but are slightly invalid; this lets users just import from cloudflare without needing to manually fix up the zones.
  • Support for “Exotic” record types: CAA, SSHFP, TLSA etc.
  • Don’t allow domains to be added to accounts if they are sub-domains of an already-known-about domain.
    • As a result of this, also don’t allow people to add obviously-invalid domains or whole TLDs etc.

HUGO PPA

I run ubuntu on my servers, and since moving to Hugo, I wanted to make sure I was using the latest version available.

The ubuntu repos currently contain hugo version 0.15 in Xenial, and 0.25.1 in artful (and the next version, bionic, only contains 0.26). The latest version of hugo (as of today) is 0.32.2 - so the main repos are quite a bit out of date.

So to work around this, I’ve set up an apt repo that tracks the latest release of hugo, which can be installed and used like so:

sudo wget http://packages.dataforce.org.uk/packages.dataforce.org.uk_hugo.list -O /etc/apt/sources.list.d/packages.dataforce.org.uk_hugo.list
wget -qO- http://packages.dataforce.org.uk/pubkey.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install hugo

This repo tracks the latest hugo debs in all 4 of the architectures supported: amd64, i386, armhf and arm64 and should stay automatically up to date with the latest version.
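Once the repo is added, it’s easy to confirm that apt is picking hugo up from it and to check which version you ended up with:

apt-cache policy hugo
hugo version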

DNS Hosting - Part 1: History

For as long as I can remember I’ve hosted my own DNS. Originally this was via cpanel on a single server that I owned and then after a while I moved to a new server away from cpanel and moved to doing everything myself (web, email, dns) - hand-editing zone files and serving them via BIND.

This was great for learning, and it worked well for a while, but eventually I ended up with more servers, and a lot more domains. Manually editing DNS zone files and reloading BIND was somewhat of a chore any time I was developing things or spinning up new services or moving things between servers - I wanted something web-based.

There weren’t many free/cheap providers that did what I wanted (this was long before Cloudflare or Route 53), so around 2007 I did what any (in)sane person would do - I wrote a custom control panel for managing domains. and email. and it was multi-user. and it had billing support. and a half-baked ticket-system… ok, so I went a bit overboard with the plans for it. But mainly it controlled DNS, and throughout its lifetime that was the only bit that was “completed” and fully functional.

The DNS editing was simple, it parsed BIND Zone files, presented them in a table of text input fields and let me make changes. (This is an ever-so-slight upgrade to “hand-edit zone files”)

For security reasons the webserver couldn’t write to the bind zone file directory, so it made use of temporary files that the webserver could write to; a cron script then made these temporary files live by moving them into the bind zone file directory and reloading the zone with rndc reload example.org. Reading of zone data in the editor would look for the zone in the temporary directory first before falling back to the bind directory, so that any pending edits didn’t get lost before the cronjob ran.
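The cron script itself was nothing clever - conceptually just a move-and-reload loop over whatever was pending, roughly along these lines (paths and naming here are illustrative):

#!/bin/bash
PENDING=/var/local/dns-pending   # webserver-writable staging area
LIVE=/etc/bind/zones             # actual bind zone file directory

for zonefile in "${PENDING}"/*.db; do
    [ -e "${zonefile}" ] || continue
    zone=$(basename "${zonefile}" .db)
    mv "${zonefile}" "${LIVE}/${zone}.db"
    rndc reload "${zone}"
done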

After I had the editor working I wanted redundancy. I made use of my other servers and added secondary name servers that synced the zones from the master. There was a cronjob on the master server to build a list of valid zones, and separate cronjobs on the secondary servers that synced this list of zones every few hours. Zone data came in via AXFR from the master to the secondaries.

I even added an API of sorts for changing zone data. It wasn’t good (GET requests to crafted URLs such as GET /api/userapi-key/dns/key=domain-api-key/type=A/setrecord=somerecord.domain.com/ttl=3600/data=1.1.1.1 or so), but it let me automate DNS changes, which I used for automated website failover.

DNS Control Panel

This all kinda worked. There were a bunch of delays waiting for cronjobs to do things (creating new zones needed a cronjob to run, and this needed to run before zones could be edited (and before they existed on the secondary servers); editing zones then needed to wait for a cronjob to make the changes live, etc.) but ultimately it did what I needed, and the delays weren’t really a problem. The cronjobs on the master server ran every minute, and the secondary servers ran every 6 hours. Things worked, DNS got served, I could edit the records, job done?

A few years later (2010) I realised that the DNS editing part of the control panel was the only bit worth keeping, so I made plans to rip it out and make it a standalone separate service. The plan was to get rid of BIND and zone-file parsing, and move to PowerDNS, which has a MySQL backend that I could just edit directly. The secondary servers would then run with the main server configured as a supermaster, removing the need for cronjobs to sync the list of zones.

So I bought a generic domain mydnshost.co.uk (You can never have too many domain names!) and changed all the nameservers at my domain registrar to point to this generic name… and then did absolutely nothing else with it (except for minor tweaks to the control panel to add new record types) for a further 7 years. Over the years I’d toy again with doing something, but this ultimately never panned out.

Whilst working on another project that was using letsencrypt for SSL certificates, I found myself needing to use dns-based challenges for the domain verification rather than http-based verification. At first I did this using my old DNS system. I was using dehydrated for getting the certificates so I was able to use a custom shell script that handled updating the DNS records (and sleeping for the appropriate length of time to ensure that the cronjob had run before letting the letsencrypt server check for the challenge) - but this felt dirty. I had to add in support for removing records to my API (in 10 years I’d never needed it as I always just changed where records pointed) and it just felt wrong. The old API wasn’t really very useful for anything other than very specific use cases and the code was nasty.

So, in March 2017 I finally decided to make use of the mydnshost.co.uk domain and actually build a proper DNS system which I (appropriately) called MyDNSHost - and I’ll go into more detail about that specifically in part 2 of this series of posts.

Moving to Hugo

For a while now I’ve been thinking of moving this blog to a statically generated site rather than using wordpress.

There are a number of reasons for this:

  1. I can version-control the content rather than relying on wordpress database backups.
  2. It renders quicker
  3. I don’t actually use any of the wordpress features, so it’s just bloat, and a potential security hole.

I’ve seen Chris successfully use Hugo for his site and it seems to do exactly what I want, So I took the opportunity during the Christmas break to spend some time converting my blog from Wordpress to Hugo.

Actually doing this was a multi-stage process.

Stage 1 - Exporting the old content

This was mostly achieved using the wordpress-to-hugo-exporter.

Once the content was exported, I then went through and removed any posts that were not marked as Published (This blog has technically been running for a long time under various guises. Some of the very old content is “my-first-blog” style garbage and over the years I’ve marked these as not-published within wordpress. I’ll keep hold of these posts separately from the main site git repository.).

Stage 2 - Converting the theme

The next stage was converting the old wordpress theme I was using. It took me a long time to decide on a theme that I even somewhat liked before, so for now I figured I’d keep the same theme.

Thankfully hugo themes are pretty straightforward and the conversion process was pretty painless once I got to grips with the hugo syntax.

Stage 3 - Fixing the exported content

The wordpress-to-hugo-exporter plugin did a pretty decent job of exporting the old content, but not perfect.

When exporting, the content gets run through the wordpress the_content filter so that 3rd-party plugins get a chance to modify it. Sometimes the generated HTML confused the converter and resulted in sub-optimal (broken) markdown output.

Thankfully, I don’t have a huge amount of content, and hugo lets you run a debugging server using hugo server that automatically refreshes pages as you save them; this allowed me to fix the content and see the fixes in real time.

Stage 4 - Publishing

This was the easy stage. I created a new github repo that stores the site content.

This is then checked out on the webserver and hugo is run in the directory to generate the final content into the /public directory that is then served by my webserver.

To automate the deployment process, there is also a script in there (github-webhook-deploy.php) that I run on the server under a different domain and that gets poked by github any time I push to the repo; this script handles pulling the updated version of the site and running hugo on it.
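The hook doesn’t do anything more exotic than you’d do by hand, which boils down to (the checkout path is whatever you’ve used):

cd /path/to/site-checkout
git pull
hugo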

Final Thoughts

  • Hugo’s theming is nice and simple so at some point I may redo the theme. The content is totally agnostic to the theme, so this should be pretty seamless.

  • The new blog no longer has comments. I don’t think this is a big loss, there are plenty of ways for people to contact me. I may look into something like disqus or so in future.

  • I don’t know if this will actually make me more likely to update the site or write more, but it’s worth a try.

  • Sorry to RSS Readers, the change is likely going to spam you about all the posts again.

It’s been a while…

It’s been a while since I updated this. Not through a lack of wanting to, more a combination of things - lack of time, lack of anything worth writing, hating the old blog theme…

So I’ve replaced the theme!

Maybe that’ll help.

It probably won’t.

Limiting the effectiveness of DNS Amplification

I recently had the misfortune of having a server I am responsible for abused as an amplifier in a DNS Amplification attack, and thought I’d share how I countered this. (Whilst this was effective for me, your mileage may vary, but if this actually helps someone then it’s worth posting about.)

This particular server was the main recursor for the site that it was located at (and this was correctly limited not to allow open recursion), but was also authoritative for a small selection of domains. (Yes, I know mixing recursive and authoritative servers is bad.)

The problem only came about when I needed to relocate the server to another site. In order to ensure continuity of service whilst the nameserver IP change propagated, I added some port-forwards at the old site that redirected DNS traffic to the new site. This however meant that all DNS traffic going towards the server came from an IP that was trusted for recursion. Oops.

After adding the port-forwards, but before updating the nameservers, I got distracted and ended up forgetting about this little hack, until the other day when I suddenly noticed that both sites were suffering due to large numbers of packets. (It’s worth noting, that in this case both sites were actually on standard ADSL connections, so not a whole lot of upload bandwidth available here!)

After using tcpdump it became apparent quite quickly what was going on, and it reminded me that I hadn’t actually made the nameserver change yet. This left me in a situation where the server was being abused, but I wasn’t in a position to just remove the port forward without causing a loss of service.

I was, however, able to add a selection of iptables rules to the firewall at the first site (the one doing the forwarding) in order to limit the effectiveness of the attack. These should be self-explanatory (along with the comments):

# Create a chain to store block rules in
iptables -N BADDNS

# Match all "IN ANY" DNS Queries, and run them past the BADDNS chain.
iptables -A INPUT -p udp --dport 53 -m string --hex-string "|00 00 ff 00 01|" --to 255 --algo bm -m comment --comment "IN ANY?" -j BADDNS
iptables -A FORWARD -p udp --dport 53 -m string --hex-string "|00 00 ff 00 01|" --to 255 --algo bm -m comment --comment "IN ANY?" -j BADDNS

# Block domains that are being used for DNS Amplification...
iptables -A BADDNS -m string --hex-string "|04 72 69 70 65 03 6e 65 74 00|" --algo bm -j DROP --to 255 -m comment --comment "ripe.net"
iptables -A BADDNS -m string --hex-string "|03 69 73 63 03 6f 72 67 00|" --algo bm -j DROP --to 255 -m comment --comment "isc.org"
iptables -A BADDNS -m string --hex-string "|04 73 65 6d 61 02 63 7a 00|" --algo bm -j DROP --to 255 -m comment --comment "sema.cz"
iptables -A BADDNS -m string --hex-string "|09 68 69 7a 62 75 6c 6c 61 68 02 6d 65 00|" --algo bm -j DROP --to 255 -m comment --comment "hizbullah.me"

# Rate limit the rest.
iptables -A BADDNS -m recent --set --name DNSQF --rsource
iptables -A BADDNS -m recent --update --seconds 10 --hitcount 5 --name DNSQF --rsource -j DROP

This flat-out blocks the DNS queries that were being used for domains that I am not authoritative for, but I didn’t want to entirely block all “IN ANY” queries, so it rate-limits the rest of them instead. This was pretty effective at stopping the ongoing abuse.

It only works, of course, if the same set of IPs are repeatedly being targeted (remember, these are generally spoofed IPs that are actually the real target). Once the same target is spoofed enough times, it gets blocked and no more DNS packets will be sent to it, thus limiting the effectiveness of the attack (how much it limits it depends on how many packets would otherwise have been aimed at the unsuspecting target).
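For reference, the --hex-string values in the rules above are just the query name in DNS wire format - each label prefixed by its length, with a terminating zero byte. Something like this little bash helper (not part of the firewall config itself) will spit out the right string for any other domain you want to block:

domain_to_hex() {
    # Convert a domain name into the DNS wire-format hex string used
    # by the iptables string match (requires xxd)
    local out="" label
    local IFS='.'
    for label in $1; do
        out+=$(printf '%02x' "${#label}")
        out+=$(printf '%s' "${label}" | xxd -p)
    done
    printf '|%s00|\n' "${out}"
}

domain_to_hex isc.org    # prints |03697363036f726700|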

Here is my iptables output as of right now, considering the counters were cleared Friday morning:

root@rakku:~ # iptables -vnx --list BADDNS
Chain BADDNS (2 references)
    pkts      bytes target     prot opt in     out     source               destination
  458939 29831035 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           STRING match "|0472697065036e657400|" ALGO name bm TO 255 /* ripe.net */
 2215367 141783488 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           STRING match "|0473656d6102637a00|" ALGO name bm TO 255 /* sema.cz */
       0        0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           STRING match "|0968697a62756c6c6168026d6500|" ALGO name bm TO 255 /* hizbullah.me */
       1     2248 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           STRING match "|03697363036f726700|" ALGO name bm TO 255 /* isc.org */
    5571   385042            all  --  *      *       0.0.0.0/0            0.0.0.0/0           recent: SET name: DNSQF side: source
    5542   374343 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           recent: UPDATE seconds: 10 hit_count: 5 name: DNSQF side: source
root@rakku:~ #

Interestingly, the usual amplification target, isc.org, wasn’t really used this time.

As soon as the nameserver IP updated (it seems the attackers were using DNS to find which server to attack), the packets started arriving directly at the new site and thus no longer matched the recursion-allowed subnets, so the attack stopped being effective (and then eventually stopped altogether once I removed the port-forward, which stopped the first site responding recursively as well).

In my case I applied this where I was doing the forwarding, as the attack was only actually a problem if the query ended up at that site, and to limit the outbound packets being forwarded; however, this would work just fine if implemented directly on the server ultimately being attacked.

Website Reshuffle

Over the past few weekends I’ve been (slowly) working on moving my websites around a bit so that things are all in once place, and in the case of this blog, no longer hosted on my home ADSL connection.

At the moment all non-existent pages on http://home.dataforce.org.uk/, http://dataforce.org.uk/ and http://shanemcc.co.uk/ will now redirect to the equivalent link on http://blog.dataforce.org.uk/. Over time I will work on moving all public content from these sites over to here (there isn’t much; they’ve mostly been used as dumping grounds!)

After this http://home.dataforce.org.uk/ will be primarily for private things and http://dataforce.org.uk/ and http://shanemcc.co.uk/ will simply redirect here. Eventually I may look to transition this blog over to one of the raw domains (probably dataforce.org.uk). Ultimately I’m trying to do this without breaking any links that may exist to files/etc on these domains.

So if stuff breaks over the next few weeks, that’s why. Feel free to leave a comment if you notice anything or something goes missing.