Rebuilding Cricalix.Net – Part 4

While on holiday, I read a forum post that mentioned a “new” web server called Caddy. I took a look at it, and was intrigued by the integrated TLS certificate renewal using Let’s Encrypt. With NGINX or Apache, I have to run Certbot or similar to maintain the certificates, and I have to deal with permissions. Caddy offered a way to sidestep that management on at least one of the containers – just define the credentials in the Caddyfile directives, and Caddy takes care of writing the relevant TXT record to my DNS zone and requesting a new certificate from Let’s Encrypt.
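In Caddyfile terms, that boils down to a couple of lines per site. A minimal sketch, assuming the Cloudflare DNS module is compiled in and the API token lives in an environment variable – substitute whichever provider module the binary was actually built with:

example.com {
    # Solve the ACME DNS-01 challenge via the DNS provider module.
    # "cloudflare" and the variable name are placeholders for whatever
    # module and credential you actually use.
    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    root * /var/www/example.com
    file_server
}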

So, out with NGINX, in with Caddy.

Caddy and .deb packages

The Caddy folks very nicely provide a .deb package of the webserver – it’s very basic, and pretty much only includes the binary and a systemd unit configuration, but that’s enough for me. The problem is that the .deb package doesn’t include any of the modules, like DNS provider support. The documented way of adding modules is to use xcaddy and build a custom binary, but that feels like too much management work for me. However, they also provide a binary download service that offers up a custom binary with chosen plugins.

My initial approach was to install Caddy via the .deb package to get the systemd support, promptly shut the service down (it doesn’t like the config file that was deployed before it was installed anyway), and use curl to fetch the custom binary from the Caddy download service. Once downloaded, the file is moved so that it clobbers the package-provided binary, execution bits are set, and the service is started. This means that any updates served via auto-updates are going to clobber my custom binary. I could just turn auto-updates off on the container instances, but I’d like to keep some packages updated (like Roundcube). Arguably, because I’m using containers with mounted block devices, I could just shut the www container down, delete it, and re-provision it every month – this would get me all the updates on a monthly basis. With a bit more thought, I could probably automate a new container every week, but that’s getting far too into the weeds for my personal server.

An alternative approach is to take the systemd service units provided by the Caddy package, push them into my git repo, and fetch only the binary from the download service. Then I don’t have to worry about automatic updates breaking my webserver when I’m on holiday. So that’s what I did instead: copied the contents of the service files into the repo, added the relevant metadata files to control the “deploy from git” process, and amended my caddysetup script to grab the binary via curl, enable the service, and start it.
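The interesting part of caddysetup amounts to something like this sketch – the module in the download URL is a placeholder for whichever DNS provider plugin gets picked on the download page:

#!/bin/sh
set -eu

# Fetch a custom Caddy build with the DNS provider module baked in.
# The os/arch/p parameters mirror what the download page generates.
curl -fsSL -o /tmp/caddy \
  "https://caddyserver.com/api/download?os=linux&arch=amd64&p=github.com/caddy-dns/cloudflare"

# Install over the top of wherever the unit file expects the binary.
install -m 0755 /tmp/caddy /usr/bin/caddy

# Enable and start the unit that was deployed from the git repo.
systemctl daemon-reload
systemctl enable --now caddy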

However, this didn’t work immediately; systemd couldn’t start the binary, even though the same exec command worked from the command line. Instead, systemctl status caddy resulted in:

Aug 27 08:14:09 www systemd[1]: Starting Caddy...
Aug 27 08:14:09 www systemd[147]: caddy.service: Failed to determine user credentials: No such process
Aug 27 08:14:09 www systemd[147]: caddy.service: Failed at step USER spawning /usr/bin/caddy: No such process
Aug 27 08:14:09 www systemd[1]: caddy.service: Main process exited, code=exited, status=217/USER
Aug 27 08:14:09 www systemd[1]: caddy.service: Failed with result 'exit-code'.
Aug 27 08:14:09 www systemd[1]: Failed to start Caddy.

The root cause is that the systemd unit tries to start the process as user caddy, group caddy, and I hadn’t created those. It would be nice if the systemd error was a bit clearer about that fact, instead of requiring an internet search.
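Creating the account is a couple of commands (the home directory here is arbitrary – any directory the caddy user can write to works, and a home under /var/www would explain the /var/www/.config path mentioned below):

# Create the system group and user that caddy.service expects.
groupadd --system caddy
useradd --system --gid caddy \
  --home-dir /var/www --no-create-home \
  --shell /usr/sbin/nologin caddy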

A neat side-effect of the setup is that Caddy stores its runtime data (like certificates) in /var/www/.config, which means the data persists across instance rebuilds.

Caddy logs

Caddy logs in structured JSON. For those of us used to the Apache or NGINX worlds, this is slightly problematic, as a blob of JSON is hard to read. jq helps with this, and some jq incantations make it possible to get an analog of the old common log format that Apache and NGINX use. Courtesy of https://caddy.community/t/making-caddy-logs-more-readable/7565/5:

jq -j '.ts |= strftime("%Y-%m-%d %H:%M:%S") |  .ts, "|", .request.remote_ip,"|", .request.uri,"|", .request.method,"|", .request.proto,"|", .status,"|", .request.headers."User-Agent"[]+"\n"'

Not something I’d want to type every time, so a shell alias may well be needed (plus I think I’d use spaces around the pipes for ease of reading in a terminal).
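Something along these lines in a shell rc file would do the job – the log path passed in is a placeholder for wherever the Caddyfile’s log directive points:

# Pretty-print a Caddy JSON access log in roughly common-log-format order.
# Usage: caddylog /var/log/caddy/access.log (path is just an example)
caddylog() {
    jq -j '.ts |= strftime("%Y-%m-%d %H:%M:%S") |
        .ts, " | ", .request.remote_ip, " | ", .request.uri, " | ",
        .request.method, " | ", .request.proto, " | ", .status, " | ",
        .request.headers."User-Agent"[] + "\n"' "$@"
}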

LXD, dnsmasq, and cloud-init

While doing all of this mucking around with Caddy, I ran into a really weird case of the www instance intermittently not being able to talk to the squid instance. After a lot of muttering and reboots, I managed to narrow it down to a reproduction case with just two instances started from the default template, using the Ubuntu 22.04 cloud image. I filed an issue with the LXD folks about it, and they came back with “not our problem, it’s cloud-init”, but it’ll at least get documented.

If you’re curious, you can read the GitHub issue for LXD.

Copying files to the instances

There are some cases where I’ll need to import files from the old server to the instances running on the new one; www content and maildirs mostly.

In my test environment, I’m using dir storage, so the filesystems for the containers are exposed as directories on the host. A simple rsync + chmod is all that’s needed to get files from outside the container to inside the container. However, in the production instance, I’ll be using ZFS as the backing mechanism for the storage, so I need to find another way to do the copying.

One way would be to add private keys to the root account in the container, and do direct SSH pulls from the old server. This probably isn’t a terrible way to do it; I can just add the key manually, do the copies, and then wipe the key. With independent storage spaces, I can even just wipe the instance completely and rebuild it; the data will still be there.
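A sketch of that workflow, with placeholder host and path names:

# Inside the www container, as root: generate a throwaway key...
ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N ''
# ...authorise the matching .pub on the old server, then pull the content.
rsync -aHv oldserver.example.net:/var/www/ /var/www/
# Wipe the key once the copies are done.
rm /root/.ssh/id_ed25519 /root/.ssh/id_ed25519.pub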

DB service migration

This is the first service migration that has to happen, and I’ll need to do it one DB at a time (instead of dumping all DBs) because of some of the software upgrades that are happening as part of the server migration. Specifically, Postfixadmin and Nextcloud are getting major version changes, and the DDL will change. I could try doing incremental updates, but it’ll be nice to start from a clean slate.
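For the databases that aren’t affected by DDL changes, the per-database dump and restore is simple enough. A sketch assuming MariaDB/MySQL, with a placeholder database name:

# On the old server: one database at a time, not --all-databases.
mysqldump --single-transaction wordpress > wordpress.sql

# On the new DB instance: create the database, then load the dump.
mysql -e 'CREATE DATABASE wordpress CHARACTER SET utf8mb4'
mysql wordpress < wordpress.sql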

Mail service migration

With the container approach, I’ve got Postfix running in the mx container, and Dovecot running in the mail container. LMTP is used between the two for account verification and delivery (the VPS was Postfix + Courier + maildrop). To migrate cleanly, I’ll have to import all the virtual user accounts and convert the passwords to SHA512-CRYPT storage. Then all of the aliases have to be brought across as well, but I can do that with an export/import process built on shell scripting – things that I can do before even trying to make the new server the MX for the domains it handles.
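Dovecot’s own tooling can generate hashes in the target scheme, which helps when scripting the import; how the existing passwords get re-hashed will depend on what the old server stored, but the format looks like this:

# Generate a SHA512-CRYPT hash in the format Dovecot expects.
doveadm pw -s SHA512-CRYPT -p 'correct horse battery staple'
# => {SHA512-CRYPT}$6$...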

Web service migration

This should be one of the easier bits of the migration, once the DBs for the WordPress instances are migrated. A lot of that work can be scripted, with the script kept in the git repo for posterity/the next time I do this.
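The content side of that script is little more than a loop over the document roots; a sketch with placeholder site names:

# Pull each site's content from the old server.
for site in example.net example.org; do
    rsync -aHv "oldserver.example.net:/var/www/${site}/" "/var/www/${site}/"
done
# Ownership is a choice - anything the caddy user can read will do.
chown -R caddy:caddy /var/www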