Deploying software with Docker containers
Fabian Gringel
Here I will give a brief introduction to using Docker containers for software deployment. I assume that we want to deploy a Python application myapp that has a web API.
Overview
A Docker container is essentially a virtual machine intended to run a single application. Containers provide a way to isolate the runtime dependencies of the app from the host system and therefore simplify deployments. Unlike conventional virtual machines, Docker containers run directly on the host kernel, relying on Linux kernel features such as namespaces and cgroups to provide process isolation (the quick check after the following list makes this visible). This has a couple of important implications:
Containers have much less overhead compared to normal virtual machines (VMs), running essentially as fast as native processes (on Linux).
Docker can only really run on Linux, since it relies on specific kernel features. Docker for Windows and macOS exists, but there Docker runs on top of a Linux VM, with the performance penalties that entails.
Containers are less isolated from the host system than VMs, and they cannot be relied upon as a security boundary.
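Since containers share the host kernel, a containerized process shows up as an ordinary process in the host's process list. A quick check (the container name is arbitrary; Docker pulls the alpine image automatically):
sudo docker run -d --name kernel-demo alpine sleep 300
# The sleep process is visible on the host
ps aux | grep '[s]leep'
# Clean up
sudo docker rm -f kernel-demo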
To run a Docker container, we need an image as a starting point, which packages the base distribution as well as all dependencies we need.
Docker provides a declarative way to specify and build such images using Dockerfiles, which are similar to shell scripts or Makefiles, although with a few important caveats, as we will see below.
Permissions
The Docker daemon requires either root permissions or that the user is a member of the docker group, which is essentially equivalent to root and should be avoided. Non-root invocations will fail with the following, somewhat opaque, error message:
docker ps
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json": dial unix /var/run/docker.sock: connect: permission denied
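If you accept that risk, e.g. on a private development machine, you can add your user to the docker group instead of prefixing every Docker command with sudo:
sudo usermod -aG docker $USER
# Log out and back in (or run: newgrp docker) for the group change to take effect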
Writing Dockerfiles
A Dockerfile consists of a set of instructions to set up the environment, followed by the command we want Docker to run at startup.
On our local machines, we would do the following:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Start the server on localhost:8080
myapp api --host 0.0.0.0 --port 8080 --storage ./storage
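For concreteness, a minimal requirements.txt might look like the following; the entries are purely illustrative, and how myapp itself gets installed is not shown here:
# requirements.txt (illustrative)
flask
gunicorn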
We want to create a Docker image which gives us essentially the same results.
A first attempt
Translating the above shell script into Docker instructions is mostly straightforward:
FROM python:3.7
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]
The steps are almost the same:
We start from the base image python:3.7, which is provided by Docker. This is basically a Debian image with a global installation of Python 3.7.
We create and change into the /app directory. The following commands are interpreted relative to /app.
We copy our code (and everything else in our build directory, see below) into the image.
We install our dependencies as above, using the RUN command. Note that we don't need to use a virtual environment since we are already using Docker (but see below for a reason we might still want to use one).
We specify the command Docker is supposed to run when starting the container with CMD.
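We could already build this image (the tag naive is just an illustrative name):
sudo docker build -t myapp-api:naive .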
Avoiding common pitfalls
As often happens with Docker, the above straightforward approach has quite a few caveats.
The base image python:3.7 is quite large (~870 MB) and contains a lot of stuff we might not need. We can use python:3.7-slim instead, which is much smaller (~155 MB).
The above approach does not take advantage of the layer cache mechanism. Since we run COPY . . before the time-consuming RUN pip install ..., the cache gets invalidated by even minor code changes.
Our app runs as the root user inside the container, which is a security liability (although quite common).
The following Dockerfile is quite a bit closer to following best practices.
FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
RUN useradd appuser -m
USER appuser
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]
This is quite a bit messier, but addresses the above points:
We use python:3.7-slim as our base image.
We split off the requirements file into a separate COPY instruction and only copy the rest after installing the dependencies. Now we can freely change the code, and rebuilding the image will be almost instant (as long as we don't touch the requirements).
We create a non-root appuser and start the app as that user.
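As for the reason hinted at earlier why we might still want a virtual environment even inside Docker: in a multi-stage build, a venv gives us one self-contained directory of installed dependencies that we can copy into a clean final image, leaving everything else behind. A minimal sketch, assuming the same project layout (the stage name and paths are arbitrary):
# Build stage: install the dependencies into a virtual environment
FROM python:3.7-slim AS builder
COPY requirements.txt .
RUN python -m venv /venv && /venv/bin/pip install -r requirements.txt

# Final stage: copy only the venv and our code
FROM python:3.7-slim
WORKDIR /app
COPY --from=builder /venv /venv
COPY . .
RUN useradd appuser -m
USER appuser
ENV PATH="/venv/bin:$PATH"
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]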
The layer system and cache
Docker uses a union filesystem to build images in layers. Each step in the Dockerfile creates a new layer: Docker computes the result of the step, computes a hash of it, stores the result in the build cache, and applies the changes on top of the previous layers.
This allows Docker to avoid repeating computations and makes image storage more efficient. If a layer and all its predecessors are cached, then Docker will use the cache, and the corresponding step in the Dockerfile can be performed much faster.
The hashes are computed as follows:
For COPY and ADD instructions, the hashes are computed from the copied files. Therefore, any changes to these files invalidate the cache.
For all other instructions, the hashes are computed from the commands in the Dockerfile.
We will see this in action after building the image below.
Building the image
We can now build our improved image. This might take a few minutes the first time it is run if there are a lot of dependencies, but subsequent builds should only take a second (until we change the requirements again).
sudo docker build -f Dockerfile -t myapp-api:slim .
Note that the . at the end is the docker build context and should refer to the project root.
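We can watch the layer cache from the previous section in action; myapp/api.py stands in for any source file:
# Change only application code: the pip install layer is reused from the cache
echo "# minor change" >> myapp/api.py
sudo docker build -f Dockerfile -t myapp-api:slim .

# Change the requirements: every layer from COPY requirements.txt . onwards is rebuilt
echo "requests" >> requirements.txt
sudo docker build -f Dockerfile -t myapp-api:slim .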
The docker build context
The Docker build context consists of all the files which might be copied into the image with a COPY instruction. By default, these are all the files in the build directory and all its subdirectories, which can amount to quite a lot of data.
This slows down docker build, since the whole context is sent to the Docker daemon on every build, even if we only copy specific files. Additionally, using COPY . . as we did above leads to a lot of unnecessary and possibly sensitive files being included in the image.
To avoid a large build context, we can white- or blacklist certain files or directories in the .dockerignore file. This is analogous to .gitignore, although the two are implemented differently, so we cannot just use a single file for both.
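For our project, a .dockerignore might look something like this (the entries are illustrative):
.git
.dockerignore
Dockerfile
venv/
__pycache__/
*.pyc
storage/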
Running the container
We can execute the command specified in the Dockerfile as follows:
sudo docker run -p 8080:8080 --name myapp-container -d myapp-api:slim
The command line options do the following:
The -d flag starts the container in the background.
The --name option gives the container a name to make it easier to refer to it later.
The -p 8080:8080 option publishes port 8080 inside the container on port 8080 of the host.
We can check if our container is running with docker ps:
sudo docker ps
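If the container does not show up there, it probably crashed right after starting; docker logs shows its output. And once we are done with it, we can stop and remove the container:
sudo docker logs myapp-container
sudo docker stop myapp-container
sudo docker rm myapp-container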
Debugging a container
We can execute other commands inside a running container:
sudo docker exec myapp-container whoami
appuser
This is very useful for debugging. We can, for example, check whether our tests pass in the built image:
sudo docker exec myapp-container pytest /app/tests/api -p "no:cacheprovider"
We can also log in to the container and execute arbitrary commands there (by default as the appuser; see below). This needs the -i (interactive) and -t (tty) switches:
sudo docker exec -it myapp-container bash
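If we do need a root shell inside the container, e.g. to install a debugging tool, docker exec accepts a --user switch:
sudo docker exec -u root -it myapp-container bash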
Conclusion
I hope the above walkthrough helps you to get started. Of course I could only cover Docker's core functionality here.
Be aware that, depending on your app's requirements and the security standards it has to meet, adding a non-root user might not suffice.