I recently learned about systemd-nspawn, which some pages claim is similar in functionality to LXC but simpler to set up, since most of the pieces are already present in modern Linux distributions. Since using LXD without snap has become cumbersome, I decided to give systemd-nspawn a try.
Setup
- Start with a clean Ubuntu 20.04 server install. This seems to be running systemd-networkd by default.
- apt-get install systemd-container
- Do the key setup for nspawn.org, so machinectl can verify downloaded images.
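If you skip the key setup, the pull below will fail signature verification. machinectl can be told to relax that check; a hedged workaround that trades away signature checking (only do this if you accept the risk):

```shell
# --verify accepts no|checksum|signature (signature is the default for pull-tar).
# checksum still validates the download against its published hash, but skips
# the GPG signature check.
sudo machinectl pull-tar --verify=checksum \
    https://hub.nspawn.org/storage/ubuntu/focal/tar/image.tar.xz ubuntu-20.04
```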
Get an image and start a container
- Pull an image:
sudo machinectl pull-tar https://hub.nspawn.org/storage/ubuntu/focal/tar/image.tar.xz ubuntu-20.04
- Start the container:
machinectl start ubuntu-20.04
- check the status:
$ machinectl list
MACHINE      CLASS     SERVICE        OS     VERSION ADDRESSES
ubuntu-20.04 container systemd-nspawn ubuntu 20.04   192.168.8.165…
# I hate how systemd commands eat output
$ machinectl list -l
MACHINE      CLASS     SERVICE        OS     VERSION ADDRESSES
ubuntu-20.04 container systemd-nspawn ubuntu 20.04   192.168.1.11
                                                     169.254.70.231
                                                     fe80::d891:ecff:fe00:6958

1 machines listed.
Doing things in the container
The container can be accessed via its console, like a “regular” VM or machine.
# Start an interactive (root) session
machinectl shell ubuntu-20.04
Connected to machine ubuntu-20.04. Press ^] three times within 1s to exit session.
root@focal:~# exit
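A few other machinectl verbs cover most day-to-day needs; these examples assume the ubuntu-20.04 machine from above:

```shell
machinectl enable ubuntu-20.04                              # autostart at boot
machinectl copy-to ubuntu-20.04 ./notes.txt /root/notes.txt # host -> container
machinectl copy-from ubuntu-20.04 /var/log/syslog ./guest-syslog
machinectl status ubuntu-20.04                              # addresses, cgroup tree, recent logs
machinectl poweroff ubuntu-20.04                            # clean shutdown
```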
Configuring the container
Running a container via machinectl as above uses a set of default options that work well in complete isolation: you can shell into it, copy files, and use the network from inside the container. A container becomes more interesting, though, when it can interact with host resources, like sharing files or acting as a network server. Most of this can be configured on the command line by invoking systemd-nspawn directly instead of going through machinectl. But it’s also possible to configure most options via .nspawn unit files, which feel somewhat similar to LXD configuration profiles. That’s the technique I’m going to use below to customize most of the settings for my containers.
Networking
If systemd-networkd is installed and running on both the host and the guest, virtualized networking gets configured automatically. This is an interesting advantage of using Ubuntu Server 20.04, where systemd-networkd is there by default.
The machine is visible from the host:
$ ping 192.168.8.165
PING 192.168.8.165 (192.168.8.165) 56(84) bytes of data.
64 bytes from 192.168.8.165: icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from 192.168.8.165: icmp_seq=2 ttl=64 time=0.065 ms
64 bytes from 192.168.8.165: icmp_seq=3 ttl=64 time=0.095 ms
Make the machine visible to other hosts
I’m sure some port-mapping trickery would work, but since this is a container that should stand on its own, I wanted an equivalent of LXD’s bridged networking, where the container appears on the same network as the host, as a kind of “sibling”.
The thing to do is to create a bridge on the host system. This page has a good amount of detail on how to do that; in our ubuntu-server example, it means disabling cloud-init’s network configuration and creating a manual netplan config file:
# Do this as root, or use | sudo tee as appropriate
echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
rm /etc/network/interfaces.d/50-cloud-init.yaml
# ens3 is the name of the network interface in the host, change accordingly.
cat << EOF > /etc/netplan/01-bridge.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:
      dhcp4: no
  bridges:
    br0:
      dhcp4: yes
      interfaces:
        - ens3
EOF
# Apply the configuration, or reboot
netplan apply
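After applying, it’s worth checking that the bridge actually came up and took over the DHCP lease that previously went to ens3:

```shell
# br0 should show as routable/configured with an address; ens3 should have none.
networkctl status br0
ip addr show br0
```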
Once that’s done, configure the container to use the bridge (br0
as just configured):
sudo mkdir -p /etc/systemd/nspawn
cat << EOF | sudo tee /etc/systemd/nspawn/ubuntu-20.04.nspawn
[Network]
Bridge=br0
EOF
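The .nspawn file is only read when the machine starts, so restart the container; with the bridge in place it should pick up an address from the LAN’s DHCP server:

```shell
sudo machinectl poweroff ubuntu-20.04
sudo machinectl start ubuntu-20.04
machinectl list -l   # the container's address should now be on your LAN
```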
No virtual networking
In this mode the container shares the host’s network (application container mode); it looks like a Docker container where all ports are “forwarded” to the host by default.
sudo mkdir -p /etc/systemd/nspawn
cat << EOF | sudo tee /etc/systemd/nspawn/ubuntu-20.04.nspawn
[Network]
VirtualEthernet=no
EOF
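As an aside, if you keep the default virtual ethernet but only need a couple of ports reachable from outside, systemd-nspawn also supports explicit port forwarding via Port= in the [Network] section (it is only honored when private networking is in use). A minimal sketch, with hypothetical port numbers:

```ini
[Network]
# Forward host TCP port 8080 to port 80 in the container.
# Only effective with virtual ethernet, i.e. not together with VirtualEthernet=no.
Port=tcp:8080:80
```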
Disk/directory sharing
In the LXC world we had a very nice profile which enabled things such as:
- mapping your host user into the container with the same name
- adding that user to the container with sudo permissions and no password requirement
- installing your preferred shell
- mapping your home directory into the container at the same location
- (optionally mapping other host directories into the container)
This can be achieved with systemd-nspawn by using the same user IDs in the container and the host system. For this, ensure the image is pristine and has never been started with systemd-nspawn or machinectl: without the proper setup, these commands will change file ownership in strange ways (see the --private-users parameter to systemd-nspawn). With the image ready, first set PrivateUsers to false in the .nspawn file and configure the directories you want mounted:
cat << EOF | sudo tee /etc/systemd/nspawn/ubuntu-20.04.nspawn
[Exec]
PrivateUsers=false
[Files]
# A single parameter binds this host directory to the same path in the container
Bind=/src
# Two colon-separated parameters are source-in-host:destination-in-container
Bind=/home/my/sources-dir:/src
EOF
Next, as the user you typically use (or the one you want to own the bound directories), create a matching user in the container with the same name, UID, and group:
# Setup group and user in the container matching your current user's info.
C_UID=$(id -u)
C_GID=$(id -g)
C_GROUP=$(id -gn)
C_USER=$(id -un)
# --console=pipe keeps the command from opening a tty into the container;
# without it, the tty would eat the second command below.
# Note that $SHELL comes from the host; make sure the same shell exists in the
# container, or pass something like -s /bin/bash instead.
sudo systemd-nspawn --console=pipe -D /var/lib/machines/ubuntu-20.04 groupadd -g $C_GID $C_GROUP
sudo systemd-nspawn --console=pipe -D /var/lib/machines/ubuntu-20.04 useradd -g $C_GROUP -G sudo -u $C_UID -s $SHELL -m $C_USER
# TODO: Set up sudo (and install some key packages, e.g. sudo itself?), with a
# sudoers entry like:
#   %sudo ALL=(ALL) NOPASSWD:ALL
# TODO: Replicate SSH key setup
# TODO: Replicate bind mounting home?
# TODO: Replicate adding specific mounts?
# TODO: Make the config script idempotent?
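The last TODO can be sketched now: a hedged, idempotent version of the user setup above. The MACHINE and NSPAWN variables are my own parameterization (not part of the earlier steps); keeping the nspawn invocation in a variable also lets the function be exercised with a stub instead of a real container.

```shell
# Idempotent sketch of the group/user creation above.
: "${MACHINE:=ubuntu-20.04}"
: "${NSPAWN:=sudo systemd-nspawn --console=pipe -D /var/lib/machines/$MACHINE}"

# setup_container_user USER UID GROUP GID SHELL
# Creates the group and user inside the container only if they don't exist
# yet (checked with getent), so running it twice is harmless.
setup_container_user() {
    if ! $NSPAWN getent group "$3" > /dev/null 2>&1; then
        $NSPAWN groupadd -g "$4" "$3"
    fi
    if ! $NSPAWN getent passwd "$1" > /dev/null 2>&1; then
        $NSPAWN useradd -g "$3" -G sudo -u "$2" -s "$5" -m "$1"
    fi
}

# Example, matching the manual steps above:
# setup_container_user "$(id -un)" "$(id -u)" "$(id -gn)" "$(id -g)" "$SHELL"
```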
Once the user is created and binds are configured, the container can be started with machinectl start ubuntu-20.04, and the Bind-provided directories should be in place and accessible.
If things are owned by nobody/nogroup in the container, the PrivateUsers option is probably set to something other than false; that enables user ID mapping, which gets tricky if you want read-write access to files.
Keep in mind that using the same UID/GID namespace is somewhat insecure, so it’s best reserved for workloads you mostly trust, or for local development. systemd-nspawn has safer options using --private-users, but those are mostly incompatible with writable Bind directories, so I won’t discuss them here. Also keep in mind that systemd-nspawn is pretty spartan in its user mapping and volume management/permission capabilities; if you need something more elaborate, you will probably be better off using LXC instead.
Configuration
Most of the interesting config options supported by systemd-nspawn can be invoked more directly by using that command instead of machinectl. That said, most functionality is also available via machinectl; it just requires creating .nspawn unit files as seen above.
Other references
I pieced together this tutorial from resources found in the following sites:
I also had to read the man pages for systemd-nspawn and systemd.nspawn extensively. The first contains more detailed documentation on how each option works when given as a command-line parameter to systemd-nspawn itself, but I found it more comfortable to configure things in an .nspawn unit file as shown above. These files are documented in systemd.nspawn, and their options always correspond to systemd-nspawn command-line options. Doing it in the unit file also allows using machinectl for most day-to-day operations (starting, stopping, creating, removing, and shelling into the container).