Here I will give a brief introduction to using Docker containers for software deployment. I assume that we want to deploy a Python application `myapp` that has a web API.
A Docker container is, roughly speaking, a lightweight virtual machine intended to run a single application. Containers isolate the runtime dependencies of the app from the host system and therefore simplify deployments. Unlike conventional virtual machines, however, Docker containers run directly on the host kernel, relying on Linux kernel features such as namespaces (for process isolation) and cgroups (for resource limits). This has a few important implications:
Containers have much less overhead compared to normal virtual machines (VMs), running essentially as fast as native processes (on Linux).
Linux containers can only run natively on Linux, since they rely on specific kernel features. Docker for Windows and macOS exists, but there Docker runs inside a Linux VM, with the performance penalty that entails.
Containers are less isolated from the host system than VMs, so they should not be relied upon as a security boundary on their own.
To run a Docker container, we need an image as a starting point, which packages the base distribution as well as all dependencies we need.
Docker provides a declarative way to specify and build such images using Dockerfiles, which are similar to shell scripts or Makefiles, although with a few important caveats, as we will see below.
Talking to the Docker daemon requires either root privileges or membership in the `docker` group. Group membership is effectively equivalent to root access and should be avoided, so in this guide we run `docker` commands with `sudo`. Without either, commands fail with the following, somewhat opaque, error message:
```
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json": dial unix /var/run/docker.sock: connect: permission denied
```
A Dockerfile consists of a set of instructions to set up the environment, followed by the command we want Docker to run when the container starts.
On our local machines, we would do the following:
```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Start the server on localhost:8080
myapp api --host 0.0.0.0 --port 8080 --storage ./storage
```
We want to create a Docker image which gives us essentially the same results.
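The CLI of `myapp` is not shown in this post, so the following is only a hypothetical sketch of how the `myapp api` entry point used above might parse its options with `argparse`; `build_parser` and all defaults are assumptions. It illustrates why we pass `--host 0.0.0.0`: a server bound to `127.0.0.1` inside a container only accepts connections originating inside that container, so the port forwarding set up with `docker run -p` could not reach it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for "myapp"; only the "api" subcommand is sketched."""
    parser = argparse.ArgumentParser(prog="myapp")
    sub = parser.add_subparsers(dest="command", required=True)
    api = sub.add_parser("api", help="start the web API")
    # A loopback default is safe on a laptop, but inside a container the
    # server must listen on 0.0.0.0 to be reachable through port forwarding.
    api.add_argument("--host", default="127.0.0.1")
    api.add_argument("--port", type=int, default=8080)
    api.add_argument("--storage", default="./storage")
    return parser
```

For example, `build_parser().parse_args(["api", "--host", "0.0.0.0", "--port", "8080"])` yields the same settings as the `CMD` line in the Dockerfiles below.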
A first attempt
Translating the above shell script into Docker instructions is mostly straightforward:
```dockerfile
FROM python:3.7
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]
```
The steps are almost the same:
We start from the base image `python:3.7`, which is provided by Docker (an official image on Docker Hub). This is basically a Debian image with a global installation of Python 3.7.
We create and change into the `/app` directory with `WORKDIR`. The following instructions are interpreted relative to `/app`.
We copy our code (and everything else in our build directory, see below) into the image.
We install our dependencies as above, using the `RUN` instruction. Note that we don't need a virtual environment since we are already using Docker (but see below for a reason we might still want to use one).
We specify the command Docker is supposed to run when starting the container with `CMD`.
Avoiding common pitfalls
As often happens with Docker, the above straightforward approach has quite a few caveats.
The base image `python:3.7` is quite large (~870 MB) and contains a lot of things we might not need. We can use `python:3.7-slim` instead, which is much smaller (~155 MB).
The above approach does not take advantage of the layer cache. Since we run `COPY . .` before the time-consuming `RUN pip install ...`, the cache gets invalidated by even minor code changes.
Our app runs as the root user inside the container, which is a security liability (although quite common).
The following Dockerfile is quite a bit closer to best practices:
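```dockerfile
FROM python:3.7-slim
WORKDIR /app
# Install the dependencies first, so this layer stays cached across code changes
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Create a non-root user (-m also creates its home directory) and switch to it
RUN useradd -m appuser
USER appuser
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]
```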
This is a bit messier, but addresses the points above:
We use `python:3.7-slim` as our base image.
We split off the requirements file into a separate `COPY` instruction and only copy the rest after installing the dependencies. Now we can freely change the code, and rebuilding the image will be almost instant (as long as we don't touch the requirements).
We create a non-root user `appuser` and run the app as that user.
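If we want the app to fail loudly when someone accidentally runs the image as root anyway (for example with `docker run --user root`), a small startup check can help. The sketch below is a hypothetical addition to `myapp`, not part of the setup above; `ensure_not_root` is an illustrative name, and it relies on `os.geteuid`, which only exists on Unix-like systems such as our Linux container.

```python
import os
from typing import Optional


def ensure_not_root(euid: Optional[int] = None) -> None:
    """Raise if the process runs with root privileges (effective UID 0).

    Hypothetical helper meant to be called at the start of the "api" command.
    The optional euid argument only exists to make the check easy to test.
    """
    if euid is None:
        euid = os.geteuid()  # Unix-only; fine inside a Linux container
    if euid == 0:
        raise RuntimeError(
            "refusing to run as root; set a USER in the Dockerfile "
            "or pass --user to docker run"
        )
```

Calling it with the UID of `appuser` (for example `ensure_not_root(1000)`) does nothing, while `ensure_not_root(0)` raises.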
The layer system and cache
Docker uses a union filesystem to build the image in layers. Each instruction in the Dockerfile that changes the filesystem (`RUN`, `COPY` and `ADD`) creates a new layer; the others only change the image metadata. For each step, Docker
computes a hash (the cache key) from the previous layer and the instruction,
reuses the cached result if it already has a layer with that key, and otherwise runs the step and stores the result in the build cache,
applies the resulting changes on top of the previous layers.
This allows Docker to avoid repeating work and makes image storage more efficient. If a step and all its predecessors are cached, Docker reuses the cached layer instead of executing the corresponding step in the Dockerfile again.
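The hashes (cache keys) are computed as follows:
For `COPY` and `ADD` instructions, the hashes are computed from the copied files. Therefore, any change to these files invalidates the cache for that step and every step after it.
For all other instructions, the hashes are computed only from the instruction text in the Dockerfile. A `RUN pip install ...` step is therefore not re-executed just because a newer package version has been released in the meantime.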
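To make the effect of instruction order concrete, here is a deliberately simplified Python model of the cache that follows the two rules just described: each step's key combines the previous key with the instruction text and, for `COPY`, the contents of the copied files. Docker's real cache keys are computed differently (they also take file metadata and the base image digest into account), so `cache_keys`, `cached_steps` and the file contents below are illustrative assumptions only.

```python
import hashlib
from typing import Dict, List


def cache_keys(steps: List[str], files: Dict[str, str]) -> List[str]:
    """Return one simplified cache key per Dockerfile step.

    Each key hashes the previous key and the instruction text; for COPY,
    the names and contents of the copied files are hashed as well.
    """
    keys: List[str] = []
    parent = ""
    for step in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(step.encode())
        if step.startswith("COPY "):
            source = step.split()[1]
            # "COPY . ." copies everything; otherwise only the named file.
            names = sorted(files) if source == "." else [source]
            for name in names:
                h.update(b"\0" + name.encode() + b"\0" + files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cached_steps(steps: List[str], before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Steps whose key is unchanged after editing the files, i.e. cache hits."""
    old, new = cache_keys(steps, before), cache_keys(steps, after)
    return [step for step, o, n in zip(steps, old, new) if o == n]


NAIVE = ["FROM python:3.7-slim", "WORKDIR /app", "COPY . .",
         "RUN pip install -r requirements.txt"]
IMPROVED = ["FROM python:3.7-slim", "WORKDIR /app", "COPY requirements.txt .",
            "RUN pip install -r requirements.txt", "COPY . ."]

# Hypothetical project files before and after a code-only change.
BEFORE = {"requirements.txt": "some-dependency==1.0\n", "myapp/api.py": "v1"}
AFTER = {"requirements.txt": "some-dependency==1.0\n", "myapp/api.py": "v2"}

print(cached_steps(NAIVE, BEFORE, AFTER))
# ['FROM python:3.7-slim', 'WORKDIR /app']
print(cached_steps(IMPROVED, BEFORE, AFTER))
# ['FROM python:3.7-slim', 'WORKDIR /app', 'COPY requirements.txt .', 'RUN pip install -r requirements.txt']
```

Because every key depends on its parent, the first changed step invalidates everything after it. In the naive order, editing the code therefore forces the dependency installation to rerun; in the improved order it stays cached.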
Building the image
We can now build our improved image. This might take a few minutes the first time if there are a lot of dependencies, but subsequent builds should only take a second or so (until we change the requirements again).
```shell
sudo docker build -f Dockerfile -t myapp-api:slim .
```
Note that the `.` at the end is the build context of `docker build` and should refer to the project root.
docker build context
The build context consists of all the files that could be copied into the image with a `COPY` instruction. By default, these are all files in the build directory and its subdirectories, which can be quite a lot (think of `.git`, virtual environments or local data).
This slows down `docker build`, since (at least with the classic builder) the context is collected and sent to the Docker daemon on every build, even if we only copy specific files. Additionally, using `COPY . .` as we did above copies a lot of unnecessary and possibly sensitive files into the image.
To keep the build context small, we can exclude files or directories (and re-include exceptions with a leading `!`) in a `.dockerignore` file in the context root. This is analogous to `.gitignore`, but the pattern rules differ, so we cannot simply use a single file for both.
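As an illustration of which files end up in the context, here is a hypothetical `.dockerignore` for our project together with a rough Python approximation of the matching rules: patterns are relative to the context root, `**` matches any number of directories, a pattern that matches a directory excludes everything inside it, `!` re-includes, and the last matching line wins. Docker's actual matcher differs in edge cases, and the file list is made up, so treat `is_excluded` and `build_context` as a sketch for reasoning about patterns, not as Docker's implementation.

```python
import fnmatch
from typing import List

# Hypothetical .dockerignore for the myapp project.
DOCKERIGNORE = """
# Version control, local environment and data
.git
venv
storage
# Python bytecode at any depth
**/__pycache__
**/*.pyc
# Documentation, except the README
*.md
!README.md
"""


def parse_dockerignore(text: str) -> List[str]:
    """Return the non-empty, non-comment lines of a .dockerignore file."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def _match(path_parts: List[str], pat_parts: List[str]) -> bool:
    """Match path segments against pattern segments; '**' spans any number of directories."""
    if not pat_parts:
        return not path_parts
    head, rest = pat_parts[0], pat_parts[1:]
    if head == "**":
        return any(_match(path_parts[i:], rest) for i in range(len(path_parts) + 1))
    return bool(path_parts) and fnmatch.fnmatchcase(path_parts[0], head) and _match(path_parts[1:], rest)


def is_excluded(path: str, patterns: List[str]) -> bool:
    parts = path.split("/")
    excluded = False
    for raw in patterns:
        negate = raw.startswith("!")
        pat_parts = raw.lstrip("!").strip("/").split("/")
        # Matching a parent directory excludes everything below it as well.
        if any(_match(parts[:i], pat_parts) for i in range(1, len(parts) + 1)):
            excluded = not negate  # the last matching line wins
    return excluded


def build_context(files: List[str], patterns: List[str]) -> List[str]:
    return [f for f in files if not is_excluded(f, patterns)]


FILES = ["Dockerfile", "requirements.txt", "myapp/__init__.py",
         "myapp/__pycache__/api.cpython-37.pyc", "venv/bin/python", ".git/HEAD",
         "storage/data.db", "README.md", "NOTES.md", "tests/api/test_api.py"]
print(build_context(FILES, parse_dockerignore(DOCKERIGNORE)))
# ['Dockerfile', 'requirements.txt', 'myapp/__init__.py', 'README.md', 'tests/api/test_api.py']
```

Note that `tests` is deliberately not ignored here, since we run the tests inside the container later on.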
Running the container
We can start a container from our image, which runs the command specified with `CMD`, as follows:
```shell
sudo docker run -p 8080:8080 --name myapp-container -d myapp-api:slim
```
The command line options do the following:
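The `-d` flag starts the container in the background (detached mode).
The `--name` option gives the container a name, which makes it easier to refer to later.
The `-p 8080:8080` option publishes port 8080 of the container as port 8080 on the host (the format is `host:container`). By default the port is bound on all of the host's network interfaces, so it is reachable from our local network as well.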
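After starting the container, a quick way to check the port forwarding from the host is to poll the API until it answers. The following standard-library sketch does that; `wait_for_http` is a hypothetical helper, and we assume the API answers some HTTP request at `http://localhost:8080/` (any status code, even 404 for an unknown path, counts as reachable).

```python
import time
import urllib.error
import urllib.request

# Ignore any configured HTTP proxy: we want to talk to the local port directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until an HTTP server answers or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=5):
                return True
        except urllib.error.HTTPError as err:
            # Any HTTP status means the server answered through the forwarded port.
            err.close()
            return True
        except OSError:
            # Connection refused, reset or timed out: the server is not up (yet).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Right after `docker run`, calling `wait_for_http("http://localhost:8080/")` returns `True` once the app is listening, or `False` after 30 seconds if the container failed to start or the port is not forwarded.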
We can check whether our container is running with:
```shell
sudo docker ps
```
Debugging a container
We can execute other commands inside a running container:
```shell
sudo docker exec myapp-container whoami
```
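This is very useful for debugging. For example, we can check whether our tests pass inside the built image (the `-p "no:cacheprovider"` option disables pytest's cache plugin, which would otherwise try to write a `.pytest_cache` directory into `/app`, which `appuser` does not own):
```shell
sudo docker exec myapp-container pytest /app/tests/api -p "no:cacheprovider"
```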
We can also open a shell inside the container and run arbitrary commands there (by default as `appuser`, the user set with `USER`). This requires the `-i` (interactive) and `-t` (allocate a TTY) switches:
```shell
sudo docker exec -it myapp-container bash
```
I hope the above walkthrough helps you to get started. Of course I could only cover Docker's core functionality here.
Be aware that depending on your app's specifications and the security standard your app should meet, adding a non-root user might not suffice.