Updated Theme

Post thumbnail

Once again, I have replaced the theme of this blog.

Unlike the previous theme, this one is actually one I mostly ended up designing myself rather than just finding one that I mostly liked and running with it, and given that, I figured I’d talk a little bit about the thoughts behind it and how it came to be and what else I’ve done behind the scenes.

Posted on February 8, 2022 General

Docker Swarm Cluster Improvements

Post thumbnail

This post is part of a series.

  1. Docker Swarm with Ceph for cross-server files
  2. Upgrading Ceph in Docker Swarm
  3. Docker Swarm Cluster Improvements (This Post)

Since my previous posts about running docker-swarm with ceph, I’ve been using this fairly extensively in production and made some changes to the setup that follows on from the previous posts.

Fun with Dell S4048 and ONIE

Post thumbnail

In $DayJob we make use of Dell S4048-ON Switches for 10G Top-of-Rack (ToR) switching and also sometimes 10G Aggregation/Core for smaller deployments. They’re fairly flexible devices with a high number of 10G ports, some 40Gs and they can do L3 ports and L2 ports. You can also run them either Stacked or in VLT mode for redundancy purposes.

In addition these things use ONIE (Open Network Install Environment) and can run different firmware images - though we almost exclusively run these with DNOS 9 which is the Force10 FTOS code that Dell acquired some time ago rather than DNOS 10.

One evening, I was tasked with an “emergency” build request. We had some kit being shipped to a remote PoP the following day and the intended routers were delayed, so we needed to get something quickly and temporarily in place to take a BGP Transit Feed and deliver VRRP to the rest of the kit. A spare S4048 we had lying around would do the job sufficiently for the time period needed. I figured it wouldn’t take too long to get the base config needed and get it ready to be shipped with the rest of the kit.

So I got the Datacenter to rack/cable/console it so that I could begin configuration then set aside some time in the evening to do the work.

As I was watching the switch boot up I noticed something odd. Turns out the last engineer who had used this device had chosen to install the OpenSwitch OPX ONIE firmware on it instead of the usual DNOS9 firmware. So much for my quick and easy config.

At this point, I could have just reloaded the device into the ONIE installer environment and installed DNOS9 and been done with it all. But, I had a fairly open evening, and I’d not yet really played about much with any of the alternative ONIE OSes, so armed with my Yak Sheers, I thought I’d have a look around.

(After all this, I then re-imaged the device onto our standard deployment image of DNOS9 and completed the required config work that I was supposed to be doing.)

Cisco XConnect L2Protocol Handling

Post thumbnail

In $DayJob we make fairly extensive use of MPLS ATOM Pseudowires (XConnects) between our various datacenter locations to enable services in different sites to talk to each other at layer2.

The way I describe this to customers is that in essence these act as a “long cable” from Point-A to Point-B. The customer gets a cable at each side to connect to their kit, but in the middle of it there is magic that routes the packets over our network rather than an actual long-cable. Packets that enter 1 side will be pushed out the other side, and vice-versa. We don’t need to know or care what these packets are, we are just transparently transporting them.

Upgrading Ceph in Docker Swarm

Post thumbnail

This post is part of a series.

  1. Docker Swarm with Ceph for cross-server files
  2. Upgrading Ceph in Docker Swarm (This Post)
  3. Docker Swarm Cluster Improvements

This post is a followup to an earlier blog bost regarding setting up a docker-swarm cluster with ceph.

I’ve been running this cluster for a while now quite happily however since setting it up, a new version of ceph has been released - nautilus - so now it’s time for some upgrades.

Note: This post is out of date now.

I would suggest looking at this post and using the docker-compose based upgrade workflow instead, up to the housekeeping part.

I’ve mostly followed https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous but adapted it for the fact we’re running everything in docker. I recommend that you have a read though this yourself first to have an idea of what we are doing and why.

(It’s worth noting at this point that this guide was mostly written after the fact based on command history so I may have missed something. It’s always a good idea to do this on a test cluster first, or in a maintenance window!)

Fun with TOTP Codes

Post thumbnail

This all started with a comment I overheard at work from a colleague talking about a 2FA implementation on a service they were using.

“It works fine on everything except Google Authenticator on iPhone.”

… What? This comment alone immediately piqued my interest, I stopped what I was doing, turned round, and asked him to explain.

He explained that a service he was using provided 2FA support using TOTP codes. As is normal, they provided a QR Code, you scanned it with your TOTP application (Google Authenticator or Authy or so), then you typed in the verification code - and it worked for both Google Authenticator and Authy on his Android phone, but only with Authy and not Google Authenticator on another colleagues iPhone.

This totally nerd sniped me, and I just had to take a look.

Posted on March 29, 2019 Code

Docker Swarm with Ceph for cross-server files

Post thumbnail

This post is part of a series.

  1. Docker Swarm with Ceph for cross-server files (This Post)
  2. Upgrading Ceph in Docker Swarm
  3. Docker Swarm Cluster Improvements

I’ve been wanting to play with Docker Swarm for a while now for hosting containers, and finally sat down this weekend to do it.

Something that has always stopped me before now was that I wanted to have some kind of cross-site storage but I don’t have any kind of SAN storage available to me just standalone hosts. I’ve been able to work around this using ceph on the nodes.

Note: I’ve never used ceph before, I don’t really know what I’m doing with ceph, so this is all a bit of guesswork. I used Funky Penguin’s Geek Cookbook as a basis for some of this, though some things have changed since then, and I’m using base-centOS not AtomicHost (I tried AtomicHost, but wanted a newer-version of docker so switched away).

All my physical servers run Proxmox, and this is no exception. On 3 of these host nodes I created a new VM (1 per node) to be part of the cluster. These all have 3 disks, 1 for the base OS, 1 for Ceph, 1 for cloud-init (The non-cloud-init disks are all SCSI with individual iothreads).

Advent of Code Benchmarking

Post thumbnail

For a few years now I’ve been enjoying Eric Wastl’s Advent of Code. For those unaware, each year since 2015 Advent of Code provides a 2-part coding challenge every day from December 1st to December 25th.

In previous years, Myself and Chris have been fairly informally trying to see who was able to produce the fastest code (Me in PHP, Chris in Python). In the final week of last year to assist with this, we both made our repos run in Docker and produce time output for each day.

This allowed us to run each other’s code locally to compare fairly without needing to install the other’s dev environment, and made the testing a bit fairer as it was no longer dependant on who had the faster CPU when running their own solution. For the rest of the year this was fine and we carried on as normal. As we got to the end I remarked it would be fun to have a web interface that automatically dealt with it and showed us the scores, but there was obviously no point in doing that once the year was over. Maybe in a future year…

Fast forward to this year. Myself and Chris (and ChrisN) coded up our Day 1 solutions as normal and then some other friends started doing it for the first time. I remembered my plans from the previous year and suggested everyone should also docker-ify their repos… and so they agreed

mdadm RAID with Proxmox

Post thumbnail

I recently acquired a new server with 2 drives that I intended to use as RAID1 for a virtualisation host for various things.

My hypervisor of choice is Proxmox (For a few reasons, Support for KVM and LXC primarily, but the fact it’s debian based is a nice bonus, and I really dislike the occasionally-braindead networking implementation from vmware which rules out ESXi)

This particular server does not have a RAID card, so I needed to use a software raid implementation. Out of the box for RAID1 on Proxmox you need to use ZFS, however To keep this box similar to others I have I wanted to use ext4 and mdadm. So we’re going have to do a bit of manual poking to get this how we need it.

This post is mostly an aide-memoire for myself for the future.

DNS Hosting - Part 3: Putting it all together

Post thumbnail

This post is part of a series.

  1. DNS Hosting - Part 1: History
  2. DNS Hosting - Part 2: The rewrite
  3. DNS Hosting - Part 3: Putting it all together (This Post)

In my previous posts I discussed the history leading up to, and the eventual rewrite of my DNS hosting solution. So this post will (finally) talk briefly about how it all runs in production on MyDNSHost.

Shortly before the whole rewrite I’d found myself playing around a bit with Docker for another project, so I decided early on that I was going to make use of Docker for the main bulk of the setup to allow me to not need to worry about incompatibilities between different parts of the stack that needed different versions of things, and to update different bits at different times.

The system is split up into a number of containers (and could probably be split up into more).