DNS Hosting - Part 3: Putting it all together


In my previous posts I discussed the history leading up to, and the eventual rewrite of, my DNS hosting solution. So this post will (finally) talk briefly about how it all runs in production on MyDNSHost.

Shortly before the whole rewrite I’d found myself playing around a bit with Docker for another project, so I decided early on that I was going to use Docker for the main bulk of the setup. That way I wouldn’t need to worry about incompatibilities between different parts of the stack that needed different versions of things, and I could update different bits at different times.

The system is split up into a number of containers (and could probably be split up into more).

To start with, I had the following containers:

  • API Container - Deals with all the backend interactions
  • WEB Container - Runs the main frontend that people see. Interacts with the API Container to actually do anything.
  • DB Container - Holds all the data used by the API
  • BIND Container - Runs an instance of BIND to handle DNSSEC signing and distribute DNS zones to the public-facing servers.
  • CRON Container - This container runs a bunch of maintenance scripts to keep things tidy and initiate DNSSEC signing etc.

The tasks in the CRON container could probably be split up more, but for now I’m OK with having them all in one container.

This worked well, but I found that redeploying the API or WEB containers would log me out of the frontend, so another container was soon added:

  • MEMCACHED Container - Stores session data for the API and WEB containers to allow for horizontal scaling and restarting of containers.

In the first instance, the API container was also responsible for interactions with the BIND container: it would generate zone files on-demand when users made changes, and then poke BIND to load them. However, this was eventually split out further, and another three containers were added:

  • GEARMAN Container - Runs an instance of Gearman for the API container to push jobs to.
  • REDIS Container - Holds the job data for GEARMAN.
  • WORKER Container - Runs a bunch of worker scripts to do the tasks the API Container previously did for generating/updating zone files and pushing to BIND.

Splitting these tasks out into the WORKER container made the frontend feel faster as it no longer needed to wait for things to happen and could just fire the jobs off into GEARMAN and let it worry about them. I also get some extra logging from this as the scripts can be a lot more verbose. In addition, if a worker can’t handle a job it can be rescheduled to try again and the workers can (in theory) be scaled out horizontally a bit more if needed.
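
As an illustration of the pattern (not the actual MyDNSHost code - the function name and script here are made up), the stock gearman command-line client can play both roles:

# A trivial worker that handles "update_zone" jobs by running a script
# with the job payload on stdin:
gearman -w -f update_zone -- ./update_zone.sh &

# The API side then just fires a background job and forgets about it:
gearman -b -f update_zone '{"domain":"example.org"}'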

There were some initial challenges with this - the main one being around how the database interaction worked, as the workers would fail after periods of inactivity, get auto-restarted, and then work again immediately. This turned out to be mainly due to how I’d pulled the code out of the API and into the workers. Whereas the scripts in the API run using the traditional method where the script gets called, does its thing (including setup), then dies, the WORKER scripts were long-running processes, so the DB connections were eventually timing out and the code was not designed to handle this.

Finally, more recently, I added statistical information about domains and servers, which required another two containers:

  • INFLUXDB Container - Runs InfluxDB to store time-series data and provide a nice way to query it for graphing.
  • CHRONOGRAF Container - Runs Chronograf to allow me to easily pull out data from INFLUXDB for testing.

That’s quite a few containers to manage. To actually run them, I primarily use docker-compose (to set up the various networks, volumes and containers). This works well for the most part, but there are a few limitations around how it deals with restarting containers that cause fairly substantial downtime when upgrading WEB or API. To get around this I wrote a small bit of orchestration scripting that uses docker-compose to scale the WEB and API containers up to 2 (letting docker-compose do the actual creation of the new container), then manually kills off the older container and scales back down to 1. This seems to behave well.
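
The gist of that scripting is something like the following (a simplified sketch of the idea rather than the real script, which also handles the API container and some sanity checking):

# Bring up a second copy of the WEB service alongside the old one:
docker-compose scale web=2

# Find the oldest container for the service and retire it:
OLD=$(docker ps --filter "label=com.docker.compose.service=web" \
      --format '{{.CreatedAt}} {{.ID}}' | sort | head -n1 | awk '{print $NF}')
docker kill "$OLD" && docker rm "$OLD"

# Settle back down to a single (new) instance:
docker-compose scale web=1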

So with all these containers hanging around, I needed a way to expose them to the web, and to automate the process of ensuring they had SSL certificates (using Let’s Encrypt). Fortunately, Chris Smith has already solved this problem for the most part, in a way that worked for what I needed. In a blog post he describes a set of docker containers he created that automatically run nginx to proxy to other internal containers, obtaining appropriate SSL certificates using DNS challenges. For the most part, all that was required was running this and adding some labels to my existing containers, and that was that…

Except this didn’t quite work initially, as I couldn’t do the required DNS challenges unless I hosted my DNS somewhere else. So I ended up adding support for HTTP challenges, after which I was able to use this without needing to host DNS elsewhere. (And in return, Chris has added support for using MyDNSHost for the DNS challenges, so it’s a win-win.) My orchestration script also handles setting up and running the automatic nginx proxy containers.

This brings me to the public-facing DNS servers. These are currently the only bit not running in Docker (though they could). They run on standard Ubuntu 16.04 VMs with a small setup script that installs BIND and an extra service to handle automatically adding/removing zones based on a special “catalog zone” (the versions of BIND currently in use don’t yet support catalog zones natively). Zones are transferred between the frontend and the public servers using standard DNS NOTIFY and AXFR. DNSSEC is handled by the backend server pre-signing the zones before sending them to the public servers, which never see the signing keys.
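
A quick way to sanity-check a public server is to confirm it serves the signed zone despite never having held the keys - something like this (server and zone names invented):

# RRSIGs in the answer mean the pre-signed zone transferred intact:
dig @ns1.example.net example.org SOA +dnssec +noall +answer

# And from a host listed as an NS for the zone, a manual transfer should work:
dig @master.example.net example.org AXFR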

By splitting jobs up this way, it should in theory be possible in future (if needed) to move away from BIND to an alternative such as PowerDNS.

As well as the public service that I’m running, all of the code involved (all the containers and all the orchestration) is available on GitHub under the MIT License. Documentation is a little light (read: pretty non-existent), but it’s all there for anyone else to use/improve/etc.

DNS Hosting - Part 2: The rewrite


In my previous post about DNS Hosting I discussed the history leading up to when I decided I needed a better personal DNS hosting solution. I decided to code one myself to replace what I had been using previously.

I decided there were a few things that were needed:

  • Fully-Featured API
    • I wanted full control over the zone data programmatically, everything should be possible via the API.
    • The API should be fully documented.
  • Fully-Featured default web interface.
    • There should be a web interface that fully implements the API. Just because there is an API shouldn’t mean it has to be used to get full functionality.
    • There should exist nothing that only the default web ui can do that can’t be done via the API as well.
  • Multi-User support
    • I also host some DNS for people who aren’t me, they should be able to manage their own DNS.
  • Domains should be shareable between users
    • Every user should be able to have their own account
    • User accounts should be able to be granted access to domains that they need to be able to access
      • Different users should have different access levels:
        • Some just need to see the zone data
        • Some need to be able to edit it
        • Some need to be able to grant other users access
  • Backend Agnostic
    • The authoritative data for the zones should be stored independently from the software used to serve it to allow changing it easily in future

These were the basic criteria and what I started off with when I designed MyDNSHost.

[Screenshot: MyDNSHost homepage]

Now that I had the basic criteria, I started off by coming up with a basic database structure for storing the data that I thought would suit my plans, and a basic framework for the API backend so that I could start creating some initial API endpoints. With this in place I was able to create the database structure, and pre-seed it with some test data. This would allow me to test the API as I created it.

I use Chrome, so for testing the API I use the Restlet Client extension.

Armed with a database structure, a basic API framework, and some test data - I was ready to code!

Except I wasn’t.

Before I could start properly coding the API I needed to think about what endpoints I wanted, and how the interactions would work. I wanted the API to make sense, so I wanted to get this all planned first so that I knew what I was aiming for.

I decided pretty early on that I was going to version the API - that way, if I messed it all up, I could redo it without needing to worry about backwards compatibility. So, for the time being, everything would exist under the /1.0/ directory. I came up with the following basic idea for endpoints:

[Screenshot: MyDNSHost logged-in homepage]
  • Domains
    • GET /domains - List domains the current user has access to
    • GET /domains/<domain> - Get information about a domain
    • POST /domains/<domain> - Update domain
    • DELETE /domains/<domain> - Delete domain
    • GET /domains/<domain>/records - Get records for a domain
    • POST /domains/<domain>/records - Update records for a domain
    • DELETE /domains/<domain>/records - Delete records for a domain
    • GET /domains/<domain>/records/<recordid> - Get a specific record for a domain
    • POST /domains/<domain>/records/<recordid> - Update a specific record for a domain
    • DELETE /domains/<domain>/records/<recordid> - Delete a specific record for a domain
  • Users
    • GET /users - Get a list of users (non-admin users should only see themselves)
    • GET /users/(<userid>|self) - Get information about a specific user (or the current user)
    • POST /users/(<userid>|self) - Update information about a specific user (or the current user)
    • DELETE /users/<userid> - Delete a specific user (or the current user)
    • GET /users/(<userid>|self)/domains - Get a list of domains for the given user (or the current user)
  • General
    • GET /ping - Check that the API is responding
    • GET /version - Get version info for the API
    • GET /userdata - Get information about the current login (user, access-level, etc)

This looked sane so I set about with the actual coding!

Rather than messing around with OAuth tokens and the like, I decided that every request to the API should be authenticated - initially using basic-auth with a username/password, but eventually also using API keys. This kept things fairly simple whilst testing, and made interacting with the API via scripts quite straightforward (no need to grab a token first and then do things).
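
As a rough illustration of that (the base URL, credentials and request body here are invented for the example, not taken from the real API docs):

BASE="https://api.mydnshost.example/1.0"

# Check the API is alive, then list the domains we have access to:
curl -u user@example.com:password "$BASE/ping"
curl -u user@example.com:password "$BASE/domains"

# Add a record to a domain we control (payload shape is a guess):
curl -u user@example.com:password -X POST "$BASE/domains/example.org/records" \
     -H 'Content-Type: application/json' \
     -d '{"records": [{"name": "www", "type": "A", "content": "192.0.2.1"}]}'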

The initial implementation of the API, with domain/user editing functionality and API key support, was completed within a day. Then followed a week of evenings tweaking and adding functionality that would be needed later - such as internal “hook” points for when certain actions happened (changing records etc) so that I could add code to actually push these changes to a DNS server. As I was developing the API, I also made sure to document it using API Blueprint and Aglio - it was easier to keep the documentation up to date as I went than to write it all after the fact.
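
For anyone unfamiliar with those tools: API Blueprint is a markdown-based API description format, and Aglio renders it to HTML - roughly like so (file names illustrative):

npm install -g aglio          # the API Blueprint renderer
aglio -i api.apib -o api.html # render the blueprint to a static HTML page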

Once I was happy with the basic API functionality and knew from my (manual) testing that it functioned as desired, I set about on the web UI. I knew I was going to use Bootstrap for this because I am very much not a UI person and Bootstrap helps make my stuff look less awful.

[Screenshot: MyDNSHost records view]

Now, I should point out here that I’m not a developer for my day job; most of what I write, I write for myself to “scratch an itch”, so to speak. I don’t keep up with all the latest frameworks and best practices. Only in the last year did I switch away from hand-managing project dependencies in Java to letting Gradle do it for me.

So for the web UI I decided to experiment and try to do things “properly”. I used Composer for dependency management for the first time, a third-party request router (Bramus/Router) for handling how pages are loaded, and Twig for templating. (At this point the API code was all hand-coded with no third-party dependencies. However, my experiment with the front end was successful, and the API code has since changed to also make use of Composer and some third-party dependencies for some functionality.)
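
Pulling those two dependencies in via Composer is a one-liner (assuming the standard Packagist package names):

composer require bramus/router twig/twig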

The UI was much quicker to get to an initial usable state - as all the heavy lifting was already handled by the backend API code, the UI just had to display this nicely.

I then spent a few more evenings and weekends fleshing things out a bit more, and adding in things that I’d missed in my initial design and implementations. I also wrote some of the internal “hooks” that were needed to make the API able to interact with BIND and PowerDNS for actually serving DNS Data.

As this went on, the API layout I’d planned stayed mostly static (apart from a bunch more routes being added), but I did end up revisiting some of my initial decisions:

  • I moved from a level-based user-access system (for separating users and admins) to an entirely role-based system.
    • Different users can be granted access to do different things (eg manage users, impersonate users, manage all domains, create domains, etc)
  • I made domains entirely user-agnostic
    • Initially each domain had an “owner” user, but this was changed so that ownership over a domain is treated the same as any other level of access on the domain.
    • This means that domains can technically be owned by multiple people (Though in normal practice an “owner” can’t add another user as an “owner” - only users with “Manage all domains” permission can add users at the “owner” level)
    • This also allows domain-level API Keys that can be used to only make changes to a certain domain not all domains a user has access to.

Eventually I had a UI and API system that seemed to do what I needed and I could look at actually putting this live and starting to use it (which I’ll talk about in the next post).

After the system went live I also added a few more features that were requested by users that weren’t part of my initial requirements, such as:

  • TOTP 2FA Support
    • With “remember this device” option rather than needing to enter a code every time you log in.
  • DNSSEC Support
  • EMAIL Notifications when certain important actions occur
    • User API Key added/changed
    • User 2FA Key added/changed
  • WebHooks when ever zone data is changed
  • Ability to use your own servers for hosting the zone data not mine
    • The live system automatically allows AXFR for a zone from any server listed as an NS on the domain and sends appropriate notifies.
  • Domain Statistics (such as queries per server, per record type, per domain etc)
  • IDN Support
  • Ability to import and export raw BIND data.
    • This makes it easier for people to move to/from the system without needing any interaction with admin users or needing to write any code to deal with zone files.
    • Ability to import Cloudflare-style zone exports.
      • These look like BIND zone files but are slightly invalid; this lets users import straight from Cloudflare without needing to manually fix up the zones.
  • Support for “Exotic” record types: CAA, SSHFP, TLSA etc.
  • Don’t allow domains to be added to accounts if they are sub-domains of an already-known-about domain.
    • As a result of this, also don’t allow people to add obviously-invalid domains or whole TLDs etc.

GMail – apply labels to email from group members – Redux

A while ago I posted a python script that allowed automatically adding labels to GMail messages based on contact groups.

Unfortunately, a side effect of this script was that Google occasionally would lock an account out for “suspicious activity”, and for this reason I stopped using the script.

However, I recently looked at Google Apps Script to see if it would allow me to recreate this using Google-approved APIs, and the good news is: yes, it does.

The following script implements the same behaviour as the old python script. It checks every thread from the past 2 dates (so today and yesterday), and then for each message in the thread it gets the list of groups the sender is in (if the sender is a contact, and in any groups), checks to see if there are labels that match the group names, and if so applies them to the message.

To get this running, create a new project on the Google Apps script page, then paste the code in.

Modify scheduledProcessInbox and processInboxAll to include a label prefix if desired (eg contacts/) and then enable the desired schedule (click on the clock icon in the toolbar). Once this has been scheduled you can run an initial pass over the inbox using processInboxAll() - however this is limited to the last 500 threads.

The code can now be found here on GitHub.

Any questions/comments/bugs please leave them here or on github.

IPv6 with Endian Community Firewall (EFW) 2.4.0

First post in over a year! Oops.

For a while now, my home ADSL provider (EntaNET) has provided me with an IPv6 allocation, but I’ve never really used it (it’s been on my to-do list for some time), primarily because it is unsupported by Endian, which I use for my home router/firewall.

However, the other day, after being asked about IPv6 at my day job, I decided I wanted to get this working, and to document it here in case it can assist anyone else in future. (I also finally got round to completing the Hurricane Electric IPv6 Certification, up to Sage level.)

There are a few things worth noting before we continue here.

  1. I use a Draytek Vigor 120 for my ADSL modem - this is a PPPoA to PPPoE bridge. This means that my Endian box uses PPPoE to get its Internet connection, and directly receives an IPv4 address via the PPP session. There are no “PPP Half-Bridge” tricks here (such as the modem doing authentication, then DHCPing the address to Endian).
  2. Due to Endian lacking support for IPv6 you will need to use SSH to configure this, and any Endian upgrades will probably reverse a fair chunk of it. (Also, some reconfigurations may also undo things) - so with this in mind the rest of this guide assumes you are familiar with SSH and have successfully logged in as root to the Endian box (SSH can be enabled under the “System” section and “SSH Access”).
  3. Due to previous requirements, my Endian server is not “pure” in that I have additional packages installed that made this easier. Notably, a complete build environment. This won’t be needed here.
  4. This was all done without writing it down, so this documentation is based on my recollection and attempts at replicating various parts on a VirtualBox VM (which can’t do PPPoE…). If I’ve missed anything, please let me know in the comments.
  5. This was done with EFW 2.4.0 and may not work in the latest 2.5.1 version.
  6. I have only had this running for a few days, so there may be some unforeseen issues with this.

With this in mind, we continue to the actual important stuff!

The way EntaNET does IPv6, with a default setup you will get an IP address allocated over PPP that is in a /64, but you also get a /56 which is routed to you. We will use a /64 from the /56 as the address range for the LAN.

For the purposes of this, we are going to assume the following:

  • 2001:DB8:4D51:AA00::/56 - /56 range allocated to us by the ISP
  • 2001:DB8:4D51:AAFF::/64 - /64 range we are going to use internally.
  • 2001:DB8:4D51:FFFF::/64 - /64 range advertised across the PPP session.

The first thing to do, is to have Endian actually ask for IPv6 from the upstream provider at PPP time. This is easy:

echo "+ipv6" >> /etc/ppp/peers/defaults/pppd-pppoe

Assuming that the ADSL provider and modem both support IPv6, and you have been assigned an allocation you will see an IPv6 address attached to ppp0 once your session is active. This is from the PPP /64 and is not part of your /56 allocation.
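
You can confirm this from the Endian box itself:

# Show any IPv6 addresses on the PPP interface (expect one from the PPP /64):
ip -6 addr show dev ppp0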

So now that we know IPv6 works, we can disconnect the PPP session.

Now, we actually want to be able to do something with our allocation, so we will want to announce it to our network.

For this, we will need radvd, which will send the required RA (Router Advertisement) packets out to the network. As Endian 2.4.0 is built on Fedora Core 3, we can use the existing Fedora package for this; the packages can currently be found here.

Unfortunately, Endian doesn’t quite provide a complete environment, so we will need to force the install to ignore dependencies (specifically, chkconfig and /sbin/service are missing).

rpm --nodeps -Uvh http://archives.fedoraproject.org/pub/archive/fedora/linux/core/3/i386/os/Fedora/RPMS/radvd-0.7.2-9.i386.rpm

We can now configure this to announce our prefix by editing /etc/radvd.conf to something like this:

interface br0
{
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 10;
        AdvHomeAgentFlag off;
        prefix 2001:DB8:4D51:AAFF::/64
        {
                AdvOnLink on;
                AdvAutonomous on;
                AdvRouterAddr off;
        };
};

To trick radvd into starting, we also need to create a dummy file that exists in real RedHat-esque distros but that Endian doesn’t provide:

echo "NETWORKING_IPV6=yes" >> /etc/sysconfig/network

We also need to enable IPv6 forwarding:

sysctl net.ipv6.conf.all.forwarding=1

and we should now be able to start radvd:

/etc/init.d/radvd start

Now, if we bring the PPP connection back up, you’ll notice that the ppp0 interface no longer gets allocated a routable IPv6 address from the PPP /64. This is because, with IPv6 forwarding turned on, this host is now acting as an IPv6 router, and IPv6 routers ignore RA packets.

This isn’t a problem.

At this point, your LAN boxes will have IPv6 addresses, but the LAN boxes won’t be able to communicate with the internet yet.

To fix this, we need to tell the Endian box how to route traffic, specifically to both our LAN, and the default route:

route --inet6 add 2001:DB8:4D51:AAFF::/64 dev br0
route --inet6 add default dev ppp0

With this, however, the Endian box itself won’t have IPv6 connectivity. If that is something you need, we can do something like this instead:

ip -6 addr add 2001:DB8:4D51:AAFF::/64 dev br0
route --inet6 add default dev ppp0

But remember, that any time Endian makes any changes to the network configuration, this will be lost.

Endian’s version of iputils is missing ping6 and traceroute6, but we can install these as follows:

cd /
curl http://archives.fedoraproject.org/pub/archive/fedora/linux/core/3/i386/os/Fedora/RPMS/iputils-20020927-16.i386.rpm > iputils.rpm
rpm2cpio iputils.rpm | cpio -ivd '*6'
rm iputils.rpm

This will give you ping6 etc to allow you to verify everything so far.

The next thing to do then is firewalling. This is done with ip6tables, which again Endian doesn’t have; however, we can install it using the iptables-ipv6 package available in the RPM repo above:

rpm -Uvh http://archives.fedoraproject.org/pub/archive/fedora/linux/core/3/i386/os/Fedora/RPMS/iptables-ipv6-1.2.11-3.1.i386.rpm

Now you’ll be able to create firewall rules for your IPv6 connectivity. It’s worth noting though that this version of ip6tables doesn’t support some modules (comment and state are the ones I’ve noticed so far). If you want these modules, you’ll need to compile a newer version of iptables. (I’ve got a follow-up post with a guide for this.)
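
If you want something in place before compiling a newer iptables, a minimal stateless ruleset along these lines would do (a sketch only - it leans on the classic ! --syn trick precisely because the state module is missing):

# Default to dropping forwarded traffic, then poke holes:
ip6tables -P FORWARD DROP
ip6tables -A FORWARD -i br0 -o ppp0 -j ACCEPT          # LAN out to the world
ip6tables -A FORWARD -i ppp0 -p tcp ! --syn -j ACCEPT  # replies to LAN-initiated TCP
ip6tables -A FORWARD -i ppp0 -p ipv6-icmp -j ACCEPT    # ICMPv6 is essential for v6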

For my actual rules, I wrote a set of scripts for parsing “formatted-english” rules files into iptables rules, so let’s install that and configure some rules.

cd /root
wget https://github.com/ShaneMcC/Firewall-Rules/zipball/master -O fwrules.zip
unzip fwrules.zip
mv ShaneMcC-Firewall-Rules-* fwrules
cp fwrules/example.rules fwrules/rules.rules
chmod a+x fwrules/run.sh

Looking at fwrules/rules.rules should give you a good guide on how the rules work, and you can edit these to your needs.

Once you are happy, the rules can be installed by running:

./fwrules/run.sh

The last thing then is to make this all work automatically.

In theory we should be able to just drop some files into the subfolders of /etc/uplinksdaemon, or into the ifup.d or ifdown.d folders inside /var/efw/uplinks/main/, but neither of these approaches works. So instead we will make a minor modification to /usr/lib/uplinks/generic/hookery.sh and have a script of our own do the work.

Firstly, the minor change and an empty file:

sed -ri 's#^(.*log_done "Notify uplinks.*)$#\1\n    /sbin/uplinkchanged.sh "$@" >/dev/null 2>&1#' /usr/lib/uplinks/generic/hookery.sh
touch /sbin/uplinkchanged.sh
chmod a+x /sbin/uplinkchanged.sh

Now we can put the following into /sbin/uplinkchanged.sh:

#!/bin/bash

OURPREFIX="2001:DB8:4D51:AAFF::/64"

UPLINK=${1}
STATUS=${2}

if [ "${STATUS}" = "" ]; then
	exit 0;
fi;

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin

if [ "${UPLINK}" = "main" ]; then
	if [ "${STATUS}" = "OK" ]; then
		sysctl net.ipv6.conf.all.forwarding=1
		/etc/init.d/radvd start
		route --inet6 add ${OURPREFIX} dev br0 2>&1
		route --inet6 add default dev ppp0
		/root/fwrules/run.sh
	elif [ "${STATUS}" = "FAILED" ]; then
		route --inet6 del default dev ppp0

		ip6tables -F
		ip6tables -X
		ip6tables -P INPUT ACCEPT
		ip6tables -P OUTPUT ACCEPT
		ip6tables -P FORWARD ACCEPT
	fi;
fi;

Now, when the state of the main uplink changes, the relevant IPv6-related commands will be run to ensure that connectivity remains.

And that’s it, you should now have native IPv6 connectivity combined with Endian. Feel free to leave any comments you have regarding this.

Ident Server

I recently encountered a problem on a server that I manage, whereby the oidentd server didn’t seem to be working.

Manual tests worked, but connecting to IRC Servers didn’t.

I tried swapping oidentd for ident2, and hit the same problem.

After switching back, and a bit of debugging later, it appeared that the problem was that the IRC servers were expecting spaces in the ident reply, whereas oidentd wasn’t including them.

I then quickly threw together an xinetd-powered ident server with support for spoofing.

First, the xinetd config:

service ident
{
	disable = no
	socket_type = stream
	protocol = tcp
	wait = no
	user = root
	server = /root/identServer.php
	nice = 10
}

Unfortunately yes, this does need to run as root, as otherwise it is unable to see which process is listening on a socket. In future I plan to change it to allow it to run without needing to be root (by using sudo for the netstat part).
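
The likely shape of that change is a dedicated unprivileged user plus a sudoers entry - an untested sketch, with the username and netstat path assumed:

# In /etc/sudoers (via visudo): let the unprivileged ident user run netstat as root
identd ALL=(root) NOPASSWD: /bin/netstat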

Now for the code itself:

Edit: The code is now available on GitHub: ShaneMcC/phpident

I welcome any comments about this, or any improvements and hope that it will be useful for someone else.

GitWeb Hacking.

Recently I set up gitweb on one of my servers to provide a web-based frontend to any git projects which the users of the server place in their ~/git/ directory.

After playing about with it, I noticed that it allows a README.html file to be placed in the git config directory so extra info can be shown on the summary view. I managed to get it to pull the README.html file from the actual repository itself rather than the config directory, thus allowing the README.html to be versioned along with everything else, and letting the user edit it locally and push it rather than having to edit it on the server.

This is a simple change in /usr/lib/cgi-bin/gitweb.cgi:

From (line 3916 or so):

	if (-s "$projectroot/$project/README.html") {
		if (open my $fd, "$projectroot/$project/README.html") {
			print "<div class=\"title\">readme</div>\n" .
			      "<div class=\"readme\">\n";
			print $_ while (<$fd>);
			print "\n</div>\n"; # class="readme"
			close $fd;
		}
	}

To:

if (my $readme_base = $hash_base || git_get_head_hash($project)) {
		if (my $readme_hash = git_get_hash_by_path($readme_base, "README.html", "blob")) {
			if (open my $fd, "-|", git_cmd(), "cat-file", "blob", $readme_hash) {
				print "<div class=\"title\">readme</div>\n";
				print "<div class=\"readme\">\n";

				print <$fd>;
				close $fd;
				print "\n</div>\n";
			}
		}
	}

I also added a second, slightly hacky, change that uses Google’s code prettifier when displaying a file, and keeps the line numbers separate from the code so that they don’t get copied along when you copy the code.

From (line 2476 or so):

print "</head>\n" .
              "<body>\n";

To:

print qq(<link href="http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css" type="text/css" rel="stylesheet" />\n);
        print qq(<script src="http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js" type="text/javascript"></script>\n);

        print "</head>\n" .
              "<body onload=\"prettyPrint()\">\n";

and

From (line 4351 or so):

while (my $line = <$fd>) {
		chomp $line;
		$nr++;
		$line = untabify($line);
		printf "<div class=\"pre\"><a id=\"l%i\" href=\"#l%i\" class=\"linenr\">%4i</a> %s</div>\n",
		       $nr, $nr, $nr, esc_html($line, -nbsp=>1);
	}

To:

print "<table><tr><td class=\"numbers\"><pre>";
	while (my $line = <$fd>) {
		chomp $line;
		$nr++;
		printf "<a id=\"l%i\" href=\"#l%i\" class=\"linenr\">%4i</a>\n", $nr, $nr, $nr;
	}
	print "</pre></td>";
	open my $fd2, "-|", git_cmd(), "cat-file", "blob", $hash;
	print "<td class=\"lines\"><pre class=\"prettyprints\">";
	while (my $line = <$fd2>) {
		chomp $line;
		$line = untabify($line);
		printf "%s\n", esc_html($line, -nbsp=>1)
	}
	print "</pre></td></tr></table>";
	close $fd2;

This could do with a quick clean-up (reusing $fd rather than opening $fd2), but it works.

New Phone – T-Mobile G1.

Recently I acquired a T-Mobile G1 to replace my old T-Mobile MDA Vario 2 (HTC Hermes).

All I can say about this phone is that it is quite awesome. I no longer need to run an Exchange server to keep my contacts/calendar synced somewhere, as the G1 syncs everything to Google Mail/Calendar.

It’s a really good phone and I recommend it to anyone who is thinking of getting a new one; the integration with Google is especially useful, and the full-HTML browser (including CSS and JavaScript support) is very nice.

JDesktopPane Replacement

As I mentioned before, I’ve recently been converting an old project to Java.

This old project was an MDI application, and when creating the UI for the conversion I found the default JDesktopPane to be rather crappy. Google revealed others thought the same; one of the results that turned up was: http://www.javaworld.com/javaworld/jw-05-2001/jw-0525-mdi.html

So, I created DFDesktopPane based on this code, with some extra changes:

  • Frames can’t end up with a negative x/y
  • Respond to resize events of the JViewport parent
  • Iconified icons move themselves to remain inside the desktop at all times.
  • Handles maximised frames correctly (desktop doesn’t scroll, option to hide/remove titlebar)

My modified JDesktopPane can be found here as part of my dflibs Google Code project.

Other useful things can be found there too; take a look and leave any feedback either here or on the project issue tracker.

GMail – apply labels to email from group members

NOTE: The information in this article has been superseded by this one.


As noted by Chris recently on IRC, Google Mail lacks a feature in its ability to automatically label/filter messages - you can’t do it based on emails from people in a contact group, short of adding a filter with all their email addresses in it.

At the time it was mentioned, this didn’t affect me. However, later, when I got round to adding loads of labels/filters in GMail (yay for a nicely coloured inbox!) to nicely separate things, I ran into this problem too, so I came up with the following python script that does it for me.

It checks messages, sees if the sender is in the contacts, then checks each group to see if there is a label with that group name that is not already set, then checks to see if the contact is in the group, and finally sets the label if everything matches up.

I ran it initially to tag my entire inbox (set checkAllIndex to True and change ga.getMessagesByFolder(folderName) to ga.getMessagesByFolder(folderName, True)), and now have it running on a 15 minute cron (not using loopMode) to tag new messages for me.

Hopefully this will be useful to someone else. I’m not sure how well it works in general; it worked fine for me with ~700 messages at first, however after a few runs (due to regrouping some contacts) I was greeted by an “Account Lockdown: Unusual Activity Detected” message when trying to do anything. This went away after about 20 minutes, but don’t say you weren’t warned if it happens to you.

#!/usr/bin/env python
"""
 This script will login to gmail, and add labels to messages for contact groups.

 By default the script will only check items from the past 2 days where email
 was received.

 Loop mode can be enabled to save logging in repeatedly from cron.
 Loop mode may fail after some time if Google kills the session, or gmail
 becomes unavailable or so. (Untested in these situations). On the other hand
 it may also just keep running indefinitely as if no problem occurred; loop mode
 is relatively untested and was added as an afterthought.

 When running in loop mode, it is best to have a crontab entry also that checks
 and restarts the script if it dies.

 Copyright 2008 Shane 'Dataforce' Mc Cormack

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
"""

# Uncomment the lines below if python can't find libgmail on its own, and edit
# the sys.path.insert to point to where libgmail.py is.

# import sys
# sys.path.insert(0, 'libgmail')
import libgmail
import time

###############################################################################
# Configuration
###############################################################################

# Email Address
email = "YOUR EMAIL HERE"
# Password
password = "YOUR PASS HERE"
# Check all on index, rather than just the first 2 dates found
checkAllIndex = False
# Use Loop (if true the script will keep looping, and sleep between checking
# for new mail to modify)
useLoop = False
# Time in seconds to sleep when looping (300 = 5 mins)
loopTime = 300
# Label Prefix - if group-based labels are prefixed, set the prefix here.
# (eg "Groups/")
labelPrefix = ""
# What folder to check? ('inbox' or 'all' are probably the most common settings)
folderName = 'inbox'

###############################################################################
# Helper classes/methods
###############################################################################

class ContactGroup:
	def __init__(self, id, name, contacts):
		self.id = id
		self.name = name
		self.contacts = contacts

	def containsContact(self, contact):
		for knownContact in self.contacts:
			if knownContact[0] == contact.id:
				return True
		return False

	def __str__(self):
		return self.name

# Get Contacts and Groups
# Modified from libgmail 0.1.10 to include groups as well
def getContacts(account):
	"""
	Returns a GmailContactList object
	that has all the contacts in it as
	GmailContacts
	"""
	contactList = []
	groupList = []
	# pnl = a is necessary to get *all* contacts
	myUrl = libgmail._buildURL(view='cl',search='contacts', pnl='a')
	myData = account._parsePage(myUrl)
	# This comes back with a dictionary
	# with entry 'cl'
	addresses = myData['cl']

	# Now loop through the addresses and get the contacts
	for entry in addresses:
		if len(entry) >= 6 and entry[0]=='ce':
			newGmailContact = libgmail.GmailContact(entry[1], entry[2], entry[4], entry[5])
			contactList.append(newGmailContact)

	contacts = libgmail.GmailContactList(contactList)

	# And now, the groups
	for entry in addresses:
		if entry[0]=='cle':
			newGroup = ContactGroup(entry[1], entry[2], entry[5])
			groupList.append(newGroup)

	return contacts, groupList

###############################################################################
# Setup
###############################################################################

print "Running.."
print "Use Loop:", useLoop
if useLoop:
	print "  Loop Time:", loopTime
print "Check all on index:", checkAllIndex
print "Label Prefix:", labelPrefix
print "Checking Folder:", folderName
print "libgmail Version:", libgmail.Version
print ""

# Login to gmail
print "Logging in as", email
ga = libgmail.GmailAccount(email, password)
ga.login()

# Loop at least once.
loop = True;

while loop:
	loop = useLoop

	print "Getting label names.."
	# Get Labels
	labels = ga.getLabelNames(refresh=True)
	# Get Messages
	print "Getting messages.."
	inbox = ga.getMessagesByFolder(folderName)
	# Get Contacts
	print "Getting contacts and groups"
	contacts, groups = getContacts(ga)

	# Check each thread in the inbox
	lastDate = '';
	secondDate = False;
	for thread in inbox:
		# Only check dates we are supposed to.
		if not checkAllIndex:
			# Get the date
			threadDate = thread.__getattr__('date');
			# Make sure a date is set
			if lastDate == '':
				lastDate = threadDate

			# If this date is different to the last one do something.
			if lastDate != threadDate:
				# If we are already on the second date, then we stop now
				if secondDate:
					break;
				# Otherwise, if the new data is a non-time date, we can change to the
				# second date.
				elif "am" not in threadDate and "pm" not in threadDate:
					lastDate = threadDate
					secondDate = True

		print "Thread:", thread.id, len(thread), thread.subject, thread.getLabels(), thread.__getattr__('date'), thread._authors, thread.__getattr__('unread')
		try:
			# Current Labels
			threadCurrentLabels = thread.getLabels();
			# We will add labels here first to prevent dupes
			threadLabels = set([])
			# Check each message in the thread.
			for msg in thread:
				print "  Message:", msg.id, msg.sender
				# Check if sender is a known  contact
				contact = contacts.getContactByEmail(msg.sender)
				if contact != False:
					# Check each group for this contact
					for group in groups:
						# If we have a label with this group name
						labelName = labelPrefix+group.name
						if (labelName in labels) and (labelName not in threadCurrentLabels):
							# And the group contains the contact we want
							if group.containsContact(contact):
								# Add it to the list
								print "    Sender Label:", labelName
								threadLabels.add(labelName)
		except Exception, detail:
			print "  Error parsing messages:", type(detail), detail

		# Now add the labels
		for label in threadLabels:
			print "  Adding Label:", label
			thread.addLabel(label)
		# If thread was unread, make it unread again.
		if thread.__getattr__('unread'):
			print "  Remarking as unread"
			ga._doThreadAction("ur", thread)

	if loop:
		print ""
		print "Sleeping"
		time.sleep(loopTime)
	else:
		print "Done"

On a related note, I’ve also recently started to use the “Better Gmail 2” addon for Firefox (the official page seems down at the moment, but there’s more info here), mostly for the grouping-of-labels feature.

Edit: Script will now preserve unread status of threads.

MD5

I was recently looking at converting an old application from VB6 to Java; it used MD5 hashes in its output files for validation.

The first thing I did was to make a Java class that read in the file and checked the hashes. I tried it on a few files and it worked fine - then I found a file that it failed on.

Now, this app wrote all the files using the exact same function, so it seemed odd that one of them wouldn’t parse when the rest would.

When I looked at the file closer, I found that this one contained some symbols in the output that the others didn’t - I eventually figured out that the symbol that was causing the problem was the pound sign (£).

Without going into too much detail, this presented a major problem, the string in question was used as part of the password validation for the app (the output files are encrypted using the password as a key), and the java code was getting different results than the old VB6 code, and was unable to decode the file as a result.

So this sparked my curiosity a bit. The VB6 MD5 code I was using wasn’t a built-in; it was code I’d gotten elsewhere, so I assumed it was faulty (not that this helped me much, as I needed to get the exact same output, but ignoring that).

I edited the initial form of my application to return the MD5 string for £ on its own, and got: d527ca074d412d9d0ffc844872c4603c

I did the same with my Java code and got: 6465dad1d31752be3f3283e8f70feef7

So now all I needed to do was see which one was right, so I made a quick PHP script, did the same, and got: d99731d14c7750048538404febb0e357 … yet another different hash!?

OK, I thought, md5sum will help me figure out which one is right. One echo '£' | md5sum - later, and I had 67160ce935d7cb5339047b12ad4611cb. Yes, that is correct: a 4th different hash.

So here I was with 4 different hashes and no idea which one was correct.

So after a bit of googling, I discovered that the MD5 RFC (1321) had the source code for a test application in it.

So I extracted the code from the appendix of http://www.ietf.org/rfc/rfc1321.txt and tried to compile it with gcc md5.c mddriver.c -o mddriver, only to discover that it failed to compile with lots of errors. Fortunately this was an easy fix: near the top of mddriver.c, change #define MD MD5 to #define MD 5 and it then compiles without problem.

So I ran ./mddriver -s£ and got the output MD5 ("£") = d99731d14c7750048538404febb0e357, which agreed with what the PHP md5() function gave.

(It’s worth noting that echo '£' | ./mddriver agreed with md5sum, which made me remember that echo appends a \n - that was why I got a different output. Running echo -n '£' | md5sum gives the correct result, and would have saved me googling and finding the test suite!)

I tested a few other things and got the following results:

        mddriver: d99731d14c7750048538404febb0e357
             PHP: d99731d14c7750048538404febb0e357
           mySQL: d99731d14c7750048538404febb0e357
          python: d99731d14c7750048538404febb0e357
      postgreSQL: d99731d14c7750048538404febb0e357
          md5sum: d99731d14c7750048538404febb0e357

      JavaScript: d527ca074d412d9d0ffc844872c4603c
     VisualBasic: d527ca074d412d9d0ffc844872c4603c
         Eggdrop: d527ca074d412d9d0ffc844872c4603c
   Java (custom): d527ca074d412d9d0ffc844872c4603c

 Java (built in): 6465dad1d31752be3f3283e8f70feef7

There is also a list of MD5 implementations at http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html


The differences are primarily due to character encoding in the different languages. (In the case of my app, there was also a flaw in the implementation for strings where (length % 64) is greater than 55.)

Example:

[07:14:55] [shane@Xion:~]$ php -r 'echo md5(utf8_encode("£"))."\n";'
2ccf59396b3c0958eec4ba721e2d083f
[07:15:01] [shane@Xion:~]$ php -r 'echo md5("£")."\n";'
d99731d14c7750048538404febb0e357
Java: System.out.println((int)'£'); => "163"
PHP: echo ord('£'); => "194"
PHP: echo ord(utf8_encode('£')); => "195"
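
Those ord() values explain the PHP results above: the source file was UTF-8, so PHP hashed the two bytes 0xC2 0xA3 for “£”, while utf8_encode() (which assumes Latin-1 input) double-encoded them into four bytes. Both hashes can be reproduced by feeding md5sum the raw bytes:

printf '\xc2\xa3' | md5sum          # UTF-8 "£"          -> d99731d14c7750048538404febb0e357
printf '\xc3\x82\xc2\xa3' | md5sum  # double-encoded "£" -> 2ccf59396b3c0958eec4ba721e2d083f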