Containers using glibc v2.34+ don't work on some older docker versions
After switching the base of a docker image from Ubuntu 20.04 to Ubuntu 22.04, error messages like this one started appearing when running containers:
<jemalloc>: arena 0 background thread creation failed (1)
The docker image uses the jemalloc allocator, but as will be shown, jemalloc is not the cause of this problem; it is simply where the failure surfaces. This was happening only in certain environments: not in Azure Kubernetes Service (AKS), and not on all hosts running regular docker. It seemed to affect specifically some hosts running older versions of docker. The problem turned out to be the following:
The jemalloc error message is the result of jemalloc failing to create a background thread.
jemalloc uses glibc's pthread_create function, which in turn uses a syscall called clone to create the new thread.
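The number in parentheses at the end of the jemalloc message is worth decoding. Assuming it is the error code that pthread_create returned (my reading of the message format, not something verified here), this small Python sketch maps it to its symbolic errno name. The helpers `errno_from_jemalloc_message` and `describe_errno` are hypothetical names made up for this example.

```python
import errno
import os
import re


def errno_from_jemalloc_message(line: str) -> int:
    """Extract the trailing "(N)" number from a jemalloc error line.

    Assumption: N is the error code returned by pthread_create.
    """
    match = re.search(r"\((\d+)\)\s*$", line)
    if match is None:
        raise ValueError(f"no error code found in: {line!r}")
    return int(match.group(1))


def describe_errno(code: int) -> str:
    """Return a string like 'EPERM (Operation not permitted)' for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    message = "<jemalloc>: arena 0 background thread creation failed (1)"
    print(describe_errno(errno_from_jemalloc_message(message)))
```

On Linux, error code 1 is EPERM ("Operation not permitted"), the same error code that comes up in the seccomp part of this story.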
Starting with Linux kernel version 5.3, released on September 15th 2019, a new version of the clone syscall was introduced: clone3. The new syscall provides a superset of the functionality of the older clone interface, along with a number of API improvements.
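Since clone3 depends on the kernel, a quick first check is the kernel version. This is a minimal sketch assuming a Linux host and a release string in the usual `major.minor...` form; `kernel_version` and `kernel_has_clone3` are hypothetical names. A new enough kernel is necessary but not sufficient, because a seccomp filter can still block the syscall.

```python
import platform
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Parse major/minor numbers from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_has_clone3(release: str) -> bool:
    """clone3 first appeared in Linux 5.3."""
    return kernel_version(release) >= (5, 3)


if __name__ == "__main__":
    # Inside a container this reports the host kernel, since containers share it.
    release = platform.release()
    print(release, "-> clone3 available in kernel:", kernel_has_clone3(release))
```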
glibc started using this new syscall, when available, in its implementation of pthread_create in version 2.34. Importantly, Ubuntu 22.04 ships glibc 2.35, whereas Ubuntu 20.04 shipped glibc 2.31. The problem would happen on Ubuntu 21.10 as well, since it uses glibc 2.34 (Ubuntu 21.04 uses glibc 2.33). What it boils down to is that in Ubuntu 21.10 and 22.04, glibc tries to use the clone3 syscall first when creating a thread.
glibc calls clone3 through a wrapper which, if the syscall fails with an ENOSYS (Function not implemented) error code, falls back to the older clone implementation. This is where the other part of the story comes in.
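The fallback rule is the crux of the whole issue, so here it is as a simplified Python model. This is not glibc's code (the real logic is C inside glibc); `create_thread` and `make_stub` are hypothetical stand-ins, and the "syscalls" are fakes that either succeed or raise OSError with a chosen errno.

```python
import errno
from typing import Callable, Optional


def make_stub(name: str, fail_with: Optional[int] = None) -> Callable[[], str]:
    """Build a fake syscall that succeeds, or fails with the given errno."""
    def stub() -> str:
        if fail_with is not None:
            raise OSError(fail_with, f"{name} failed")
        return f"thread created via {name}"
    return stub


def create_thread(clone3: Callable[[], str], clone: Callable[[], str]) -> str:
    """Simplified model of the glibc 2.34+ thread-creation path."""
    try:
        return clone3()
    except OSError as exc:
        # Only ENOSYS ("Function not implemented") triggers the fallback.
        if exc.errno != errno.ENOSYS:
            raise  # any other error, e.g. EPERM, goes back to the caller unchanged
    return clone()


if __name__ == "__main__":
    clone = make_stub("clone")
    print(create_thread(make_stub("clone3"), clone))                # clone3 works
    print(create_thread(make_stub("clone3", errno.ENOSYS), clone))  # falls back to clone
    try:
        create_thread(make_stub("clone3", errno.EPERM), clone)
    except OSError as exc:
        print("thread creation failed:", errno.errorcode[exc.errno])
```

The key point is the asymmetry: ENOSYS leads to a working thread via clone, while EPERM is treated as a real failure.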
docker uses seccomp to disallow certain syscalls from being executed in the container. This filtering can be configured using seccomp security profiles (or bypassed entirely by running containers in privileged mode).
However, when docker's syscall filter encountered a new syscall (clone3), it returned an EPERM (Operation not permitted) error code instead of ENOSYS. As a result, glibc did not fall back to the older implementation, and thread creation failed.
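To see which case applies inside a given container, one can call clone3 directly with deliberately invalid arguments (a NULL pointer and size 0). My understanding is that when the call reaches a 5.3+ kernel, it is rejected with EINVAL before anything is created; an unknown syscall yields ENOSYS; and the older docker filter yields EPERM. This is a Linux-only sketch using ctypes. It assumes syscall number 435 for clone3, which holds for x86_64 and for architectures using the generic syscall table. `probe_clone3` and `interpret` are hypothetical names.

```python
import ctypes
import errno

# Assumption: 435 is clone3 on x86_64 and on architectures using the generic table (e.g. arm64).
SYS_CLONE3 = 435


def probe_clone3() -> int:
    """Call clone3(NULL, 0) and return the errno it fails with.

    The empty argument struct is rejected before any process is created,
    so this never actually spawns anything.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(ctypes.c_long(SYS_CLONE3), ctypes.c_void_p(None), ctypes.c_size_t(0))
    return ctypes.get_errno() if ret == -1 else 0


def interpret(err: int) -> str:
    """Translate the probe result into what glibc 2.34+ would do."""
    if err == errno.EINVAL:
        return "clone3 reaches the kernel: glibc can use it"
    if err == errno.ENOSYS:
        return "clone3 reported as not implemented: glibc falls back to clone"
    if err == errno.EPERM:
        return "clone3 blocked with EPERM: glibc does not fall back, thread creation fails"
    return f"unexpected result: {errno.errorcode.get(err, err)}"


if __name__ == "__main__":
    print(interpret(probe_clone3()))
```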
runc added “special handling for seccomp profiles to avoid making new syscalls unusable for glibc” as early as v1.0.0-rc93 (this PR), but the change is more of a workaround than a proper fix, and it does not always seem to work.
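One plausible reason the workaround does not always help can be shown with a toy model. Assume (my reading of the change, not verified here) that it answers ENOSYS only for syscall numbers above the highest syscall the profile mentions. Then, if a profile already allows a newer syscall such as faccessat2 (439 on x86_64), clone3 (435) falls below that threshold and still gets the profile's default EPERM action. `filter_result` is a hypothetical function, and the numbers are the x86_64 ones, used only for illustration.

```python
import errno
from typing import AbstractSet

# x86_64 syscall numbers, used here purely for illustration.
SYSCALL_NUMBERS = {"clone": 56, "clone3": 435, "faccessat2": 439}


def filter_result(syscall: str, allowed: AbstractSet[str],
                  default_errno: int = errno.EPERM,
                  enosys_stub: bool = False) -> int:
    """Toy seccomp filter: return 0 if the syscall is allowed, else the errno it gets.

    enosys_stub models (as an assumption) a rule that answers ENOSYS for any
    syscall numbered above the highest syscall the profile mentions.
    """
    if syscall in allowed:
        return 0
    highest = max((SYSCALL_NUMBERS[s] for s in allowed), default=-1)
    if enosys_stub and SYSCALL_NUMBERS[syscall] > highest:
        return errno.ENOSYS
    return default_errno


if __name__ == "__main__":
    profile_a = {"clone"}                # nothing newer than clone3 listed
    profile_b = {"clone", "faccessat2"}  # lists a syscall numbered above clone3
    for label, profile in (("A", profile_a), ("B", profile_b)):
        err = filter_result("clone3", profile, enosys_stub=True)
        print("profile", label, "-> clone3 gets", errno.errorcode[err])
```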
runc is the CLI tool used for spawning containers. It is used by containerd, which handles the lifecycle, networking and other aspects of containers, and which is in turn used by docker itself. (I recommend reading “The differences between Docker, containerd, CRI-O and runc” to understand how these pieces fit together.)
runc uses the libseccomp library for the actual seccomp implementation (which is again a syscall).
Support for clone3 in the seccomp profile was added in docker-ce 20.10.10. If your docker is at least this version, this problem should not happen. If it is older, you might run into this problem, but not necessarily. This is where the difference between docker.io and docker-ce comes in.
docker.io is the older package used for distributing docker, and is maintained by Debian. It can be installed from the usual package repositories in Ubuntu/Debian.
docker-ce is the Community Edition of docker, distributed by Docker (the company). For the tradeoffs between the two, see this discussion.
The docker.io package used by Ubuntu contains a patch that fixes this issue even for docker versions older than 20.10.10 (tested with v20.10.7). This patch, together with the general mess of versioning and backporting all the components inside docker, is what makes this problem tricky to figure out.
In summary, when running images with glibc v2.34+, make sure you are using docker v20.10.10 or later if you use docker-ce, or a patched older version if you use docker.io.
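The summary can be turned into a small check. This is a sketch built only on the version numbers above (glibc 2.34 as the first release that tries clone3, docker-ce 20.10.10 as the first release whose profile handles it). It covers docker-ce only, because whether a docker.io build is affected depends on the distribution's patches. `UBUNTU_GLIBC`, `glibc_uses_clone3` and `possibly_affected` are hypothetical names.

```python
import platform
import re
from typing import Tuple

# glibc versions shipped by the Ubuntu releases mentioned above.
UBUNTU_GLIBC = {"20.04": "2.31", "21.04": "2.33", "21.10": "2.34", "22.04": "2.35"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '20.10.7' or '20.10.7-ce' into (20, 10, 7), ignoring any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def glibc_uses_clone3(glibc_version: str) -> bool:
    return parse_version(glibc_version) >= (2, 34)


def possibly_affected(glibc_version: str, docker_ce_version: str) -> bool:
    """True if the image's glibc tries clone3 and docker-ce predates 20.10.10."""
    return glibc_uses_clone3(glibc_version) and parse_version(docker_ce_version) < (20, 10, 10)


if __name__ == "__main__":
    for ubuntu, glibc in UBUNTU_GLIBC.items():
        print(f"Ubuntu {ubuntu} (glibc {glibc}) on docker-ce 20.10.7:",
              possibly_affected(glibc, "20.10.7"))
    # Run inside the image to see its own glibc; may be ('', '') on non-glibc systems.
    print("this system:", platform.libc_ver())
```

"Possibly" is deliberate: as described above, an older docker does not necessarily trigger the problem.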
Some other relevant discussions on GitHub: