
Container Security: Just The Good Parts

Security is usually a matter of trade-offs. Questions like “Is X Secure?” rarely have direct yes or no answers. A technology can mitigate certain classes of risk even as it exacerbates others.

Containers are just such a technology, and their security impact is complex. Although some of the common risks of containers are beginning to be understood, many of their upsides have yet to be widely recognized. To emphasize the point, this post will highlight three advantages of containers that sysadmins and DevOps teams can use to make installations more secure.

Example Application

To give this discussion focus, we will consider an example application: a simple imageboard application. This application allows users to create and respond in threads of anonymous image and text content. Original posters can control their posts via “tripcodes” (which are basically per-post passwords). The application consists of the following “stack”:

  • nginx to serve static content, reverse proxy the active content, act as a cache-layer, and handle SSL
  • node.js to do the heavy lifting
  • mariadb to enable persistence

The Base Case

The base-case for comparison is the complete stack being hosted on a single machine (or virtual machine). It is true that this is a simple case, but this is not a straw man. A large portion of the web is served from just such unified instances.

The Containerized Setup

The stack naturally splits into three containers:

  • container X, hosting nginx
  • container J, hosting node.js
  • container M, hosting mariadb

Additionally, three /var locations are created on the host: (1) one for static content (a blog, theming, etc.), (2) one for the actual images, and (3) one for database persistence. The node.js container will have a mount for the image-store, the mariadb container will have a mount for the database, and the nginx container will have mounts for both the image-store and the static content.
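As a rough sketch, the wiring might look something like the following with Docker (the image names and /var/imageboard paths are purely illustrative placeholders, not part of the example application):

# mariadb: only the database store is mounted read/write
docker run -d --name m -v /var/imageboard/db:/var/lib/mysql imageboard/mariadb

# node.js: the image-store read/write, with the database reachable via a link
docker run -d --name j --link m:db -v /var/imageboard/images:/srv/images imageboard/node

# nginx: static content and image-store mounted read-only, SSL terminated here
docker run -d --name x --link j:app -p 443:443 \
  -v /var/imageboard/static:/srv/static:ro \
  -v /var/imageboard/images:/srv/images:ro \
  imageboard/nginx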

Advantage #1: Isolated Upgrades

Let’s look at an example patch Tuesday under both setups.

The Base Case

The sysadmin has prepared a second staging instance for testing the latest patches from her distribution. Among the updates is a critical one for SSL that prevents a key-leak from a specially crafted handshake. After applying all updates, she starts her automatic test suite. Everything goes well until the test for tripcodes. It turns out that the node.js code uses the SSL library to hash the tripcodes for storage and the fix either changed the signature or behavior of those methods. This puts the sysadmin in a tight spot. Does she try to disable tripcodes? Hold back the upgrade?

The Contained Case

Here the sysadmin has more work to do. Instead of updating and testing a single staging instance, she will update and test each individual container, promoting them to production on a container-by-container basis. The nginx and mariadb containers succeed and she replaces them in production. Her keys are safe. As with the base case, the tripcode tests don’t succeed. Unlike the base case, the sysadmin has the option of holding back just the node.js container’s SSL library, and because the flaw is a key exposure at handshake time, this is not an emergency that requires her to rush the developers for a fix.

The Advantage

Of course, isolated upgrades aren’t unique to containers. node.js provides them itself, in the form of npm. So—depending on code specifics—the base case sysadmin might have been able to hold back the SSL library used for tripcodes. However, containers grant isolated upgrades to all application frameworks, regardless of whether those frameworks provide them natively, and they easily extend them to bigger portions of the stack.
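For instance, if the tripcode hashing went through an npm-managed dependency, the base case sysadmin might pin just that library at a known-good version while taking the rest of the distribution update (the package name and version below are hypothetical):

# Hold back a single npm dependency at a known-good version (hypothetical name/version)
npm install some-ssl-wrapper@1.2.3 --save --save-exact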

Containers also simplify isolated upgrades. Technologies like rubygems or python virtualenvs create reliance on yet another curated collection of dependencies. It’s easy for sysadmins to be in a position where three or more such curated collections need to update before their application is safe from a given vulnerability. Container-driven isolated upgrades let sysadmins lean on single collections, such as Linux distributions. These are much more likely to have—for example—paid support or guaranteed SLAs. They also unify dependency management under the underlying distribution’s update mechanism.

Containers can also make existing isolation mechanisms easier to manage. While the above case might have been handled via node.js’s npm mechanism, containers would have allowed the developers to deal with that complexity, simply handing an updated container to the sysadmin.

Of course, isolated upgrades are not always an advantage. In large deployments, the resource savings from shared images and memory may make it worth the additional headache of moving all applications forward in lock-step.

Advantage #2: Containers Simplify Real Isolation

“Containers do not contain.” However, what containers do well is group related processes and create natural (if undefended) trust boundaries. This—it turns out—simplifies the task of providing real containment immensely. SELinux, cgroups, iptables, and kernel capabilities have a—mostly undeserved—reputation of being complicated. Complemented with containers, these technologies become much simpler to leverage.

The Base Case

A sysadmin trying to lock down their installation in the traditional case faces a daunting task. First, they must identify what processes should be allowed to do what. Does node.js as used in this application use /tmp? What kernel capabilities does mariadb need? The need to answer these questions is one of the reasons technologies such as SELinux are considered complicated. They require a deep understanding of the behavior of not just the application code, but the application runtime and the underlying OS itself. The tools available to troubleshoot these issues are often limited (e.g. strace).

Even if the sysadmin is able to nail down exactly which processes in her stack need which capabilities (kernel or otherwise), the question of how to actually bind the application to those restrictions is still a complicated one. How will the processes be transitioned to the correct SELinux context? The correct cgroup?

The Contained Case

In contrast, a sysadmin trying to secure a container has four advantages:

  1. It is trivial (and usually automatic) to transition an entire container into a particular SELinux context and/or cgroup (Docker has --security-opt, OpenShift has PID-based groups, etc.).
  2. Operating system behavior need not be locked down, only the container/host relationship.
  3. The container is—usually—placed on a virtual network and/or interface (often the container runtime environment even has supplemental lock-down capabilities).
  4. Containers naturally provide for experimentation. You can easily launch a container with a varying set of kernel capabilities.

Most frameworks for launching containers do so with sensible “base” SELinux types. For example, both Docker and systemd-nspawn (when using SELinux under RHEL or Fedora) launch all containers with variations of the svirt types based on previous work with libvirt. Additionally, many container launchers also borrow libvirt’s philosophy of giving each launched container unique Multi-Category Security (MCS) labels that can optionally be set by the admin. Combined with read-only mounting and the fact that an admin only needs to worry about container/host interactions, this MCS functionality can go a long way towards restricting an application’s behavior.

For this application, it is straightforward to do the following (a sketch of the corresponding Docker commands follows the list):

  • Label the static, image, and database stores with unique MCS labels (e.g. c1, c2, and c3).
  • Launch the nginx container with labels and binding options (i.e. :ro) appropriate for reading only the image and static stores (-v /path:/path:ro and --security-opt=label:level:s0:c1,c2 for Docker).
  • Launch the node.js container binding the image store read/write and with a label giving it access only to that store.
  • Launch the mariadb container with only the data persistence store mounted read/write and with a label giving it access only to that store.
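A sketch of those steps (the host paths, image names, and category numbers are illustrative; svirt_sandbox_file_t is typically the file type Docker's SELinux policy allows containers to access, and the exact --security-opt syntax varies slightly between Docker versions):

# Give each store a distinct MCS category (and a type the container policy can read)
chcon -R -t svirt_sandbox_file_t -l s0:c1 /var/imageboard/static
chcon -R -t svirt_sandbox_file_t -l s0:c2 /var/imageboard/images
chcon -R -t svirt_sandbox_file_t -l s0:c3 /var/imageboard/db

# nginx: read-only mounts, categories for the static and image stores only
docker run -d --name x --security-opt label:level:s0:c1,c2 \
  -v /var/imageboard/static:/srv/static:ro \
  -v /var/imageboard/images:/srv/images:ro \
  imageboard/nginx

# node.js: image-store read/write, its category only
docker run -d --name j --security-opt label:level:s0:c2 \
  -v /var/imageboard/images:/srv/images imageboard/node

# mariadb: database store read/write, its category only
docker run -d --name m --security-opt label:level:s0:c3 \
  -v /var/imageboard/db:/var/lib/mysql imageboard/mariadb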

Should you need to go beyond what MCS can offer, most container frameworks support launching containers with specific SELinux types. Even when working with derived or original SELinux types, containers make everything easier as you need only worry about the interactions between the container and host.
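For example, with Docker the launch-time SELinux type can be overridden directly (the type name below is hypothetical and would have to exist in your loaded policy):

# Run the container under a custom SELinux type instead of the default svirt type
docker run -d --security-opt label:type:svirt_imageboard_t imageboard/node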

With containers, there are many tools for restricting inter-container communication. Alternatively, for container frameworks that give each container a unique IP, iptables can be applied directly. With iptables—for example—it is easy to do the following (a rough sketch follows the list):

  • Block the nginx container from speaking anything but HTTP to the node.js container and HTTPS to the outside world.
  • Block the node.js container from doing anything but speaking HTTP to the nginx container and using the database port of the mariadb container.
  • Block the mariadb container from doing anything but receiving requests from the node.js container on its database port.
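A rough sketch of such rules, assuming default bridged networking (where container traffic traverses the FORWARD chain alongside Docker's own rules) and using made-up container addresses and a made-up node.js port:

# Illustrative addresses: nginx 172.17.0.2, node.js 172.17.0.3, mariadb 172.17.0.4
# Allow HTTPS from anywhere to nginx, and HTTP from nginx to node.js only
iptables -A FORWARD -d 172.17.0.2 -p tcp --dport 443 -j ACCEPT
iptables -A FORWARD -s 172.17.0.2 -d 172.17.0.3 -p tcp --dport 8080 -j ACCEPT

# Allow node.js to reach only mariadb's database port
iptables -A FORWARD -s 172.17.0.3 -d 172.17.0.4 -p tcp --dport 3306 -j ACCEPT

# Drop any other traffic between containers
iptables -A FORWARD -s 172.17.0.0/16 -d 172.17.0.0/16 -j DROP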

For preventing DDoS or other resource-based attacks, we can use the container launcher’s built-in tools (e.g. Docker’s ulimit options) or cgroups directly. Either way it is easy to—for example—restrict the node.js and mariadb containers to some hard resource limit (40% of RAM, 20% of CPU, and so on).
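With Docker's built-in options that might look roughly like the following (the numbers are arbitrary and the image names are the same illustrative placeholders as above):

# Roughly 20% of one CPU and a hard memory cap for node.js
docker run -d --name j --cpu-period=100000 --cpu-quota=20000 --memory=512m imageboard/node

# A memory cap plus an open-file limit for mariadb
docker run -d --name m --memory=1g --ulimit nofile=1024:2048 imageboard/mariadb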

Finally, container frameworks combined with unit tests are a great way for finding a restricted set of kernel capabilities with which to run an application layer. Whether the framework encourages starting with a minimal set and building up (systemd-nspawn) or with a larger set and letting you selectively drop (Docker), it’s easy to keep launching containers until you find a restricted—but workable—collection.
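For example (the capability sets below are only a starting point for experimentation, not a recommendation):

# Docker: drop everything, then add back only what the test suite proves necessary
docker run --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE imageboard/nginx

# systemd-nspawn: start from its minimal default set and grant capabilities explicitly
systemd-nspawn -D /srv/containers/nginx --capability=CAP_NET_BIND_SERVICE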

The configuration to isolation ratio of the above work is extremely high compared to “manual” SELinux/cgroup/iptables isolation. There is also much less to “go wrong” as it is much easier to understand the container/host relationship and its needs than it is to understand the process/OS relationship. Among other upsides, the above configuration: prevents a compromised nginx from altering any data on the host (including the image-store and database), prevents a compromised mariadb from altering anything other than the database, and—depending on what exact kernel capabilities are absolutely required—may go a long way towards prevention of privilege escalation.

The Advantage

While containers do not allow for any forms of isolation not already possible, in practice they make configuring isolation much simpler. They limit isolation to container/host instead of process/OS. By binding containers to virtual networks or interfaces, they simplify firewall rules. Container implementations often provide sensible SELinux or other protection defaults that can be easily extended.

The trade-off is that containers expose an additional container/container attack surface that is not as trivial to isolate.

Advantage #3: Containers Have More Limited and Explicit Dependencies

The Base Case

Containers are meant to eliminate “works for me” problems. A common cause of “works for me” problems in traditional installations is hidden dependencies, for example a software component depending on a common command line utility without a developer realizing it. Besides creating instability across installation types, this is a security issue. A sysadmin cannot protect against a vulnerability in a component they do not know is being used.

The flip side of unknown dependencies, and of much greater concern, is extraneous or cross-over components. A component needed by one portion of the stack is, for any other portion that was not designed with it in mind, nothing but added risk. Many privilege escalation flaws involve abusing access to suid programs that, while essential to some applications, are extraneous to others.

The Contained Case

Obviously, container isolation helps prevent component dependency cross-over, but containers also help to minimize extraneous dependencies. Containers are not virtual machines. Containers do not have to boot, they do not have to support interactive usage, they are usually single user, and they can be simpler than a full operating system in any number of ways. Thus containers can eschew service launchers, shells, sensitive configuration files, and other cruft that (from an application’s perspective) serves only as attack surface.

Truly minimal custom containers will more or less look like just the top few layers of their RPM/Deb/pkg “pyramid” without any of the bottom layers. Even “general purpose” containers are undergoing a healthy “race to the bottom” to have as minimal a starting footprint as possible. The Docker version of RHEL 7, an operating system not exactly famous for minimalism, is itself less than 155 megs uncompressed.

The Advantage

Container isolation means that when a portion of your application stack has a dependency, that dependency’s attack surface is available only to that portion of your application. This is in stark contrast to traditional installations where attack surfaces are always additive. Exploitation almost always involves chaining multiple vulnerabilities, so this advantage may be one of containers’ most powerful.

A common security complaint regarding containers is that in many ways they are comparable to statically linked binaries. The flip side is that this puts pressure on developers and maintainers to minimize the size of these blobs, which minimizes their attack surface. Shellshock is a good example of the kind of vulnerability this mitigates: it is nearly impossible for a traditional installation not to include a highly complex shell, but many containers ship without a shell of any kind.

Beyond containers themselves this pressure has resulted in the rise of the minimal host operating system (e.g. Atomic, CoreOS, RancherOS). This has brought a reduced attack surface (and in the case of Atomic a certain degree of immutability) to the host as well as the container.

Containers Is As Containers Do

Other security advantages of containers include working well with immutable and/or stateless paradigms, good content auditability (especially compared to virtual machines), and—potentially—good verifiability. A single blog post can’t cover all of the upsides of containers, much less the upsides and downsides. Ultimately, a large part of understanding the security impact of containers is coming to terms with the fact that containers are neither degenerate virtual machines nor superior jails. They are a unique technology whose impact needs to be assessed on its own.

Before you initiate a “docker pull”

In addition to the general challenges that are inherent to isolating containers, Docker brings with it an entirely new attack surface in the form of its automated fetching and installation mechanism, “docker pull”. It may be counter-intuitive, but “docker pull” both fetches and unpacks a container image in one step. There is no verification step and, surprisingly, malformed packages can compromise a system even if the container itself is never run. Many of the CVEs issued against Docker have been related to packaging that can lead to install-time compromise and/or issues with the Docker registry.

One way, now resolved, that such malicious images could compromise a system was a simple path traversal during the unpack step. By using a tarball’s capacity to unpack to paths such as “../../../”, malicious images were able to overwrite any part of the host file system they desired.

Thus, one of the most important ways you can protect yourself when using Docker images is to make sure you only use content from a source you trust and to separate the download and unpack/install steps. The easiest way to do this is simply to not use the “docker pull” command. Instead, download your Docker images over a secure channel from a trusted source and then use the “docker load” command. Most image providers also serve images directly over a secure, or at least verifiable, connection. For example, Red Hat provides SSL-accessible “Container Images”, and Fedora also provides Docker images with each release.

While Fedora does not provide SSL with all mirrors, it does provide a signed checksum of the Docker image that can be used to verify it before you use “docker load”.
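Put together, the safer workflow looks something like this (the URL and file names are placeholders for wherever your trusted source publishes its images and checksums, and it assumes the publisher's signing key is already imported):

# Download the image and its signed checksum file over HTTPS
curl -O https://example.com/images/rhel7.tar.xz
curl -O https://example.com/images/rhel7.tar.xz-CHECKSUM

# Verify the signature and the listed checksum before anything is unpacked
gpg --verify rhel7.tar.xz-CHECKSUM
sha256sum -c rhel7.tar.xz-CHECKSUM

# Only then hand the archive to Docker
docker load -i rhel7.tar.xz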

Since “docker pull” automatically unpacks images, and the unpacking process has itself been a source of compromises, it is possible for a simple typo to lead to a compromised system (e.g. a malicious “rel” image downloaded and unpacked when you intended “rhel”). This typo problem can also occur in Dockerfiles. One way to protect yourself is to prevent accidental access to index.docker.io at the firewall level or by adding the following /etc/hosts entry:

127.0.0.1 index.docker.io

This will cause such mistakes to time out instead of potentially downloading unwanted images. You can still use “docker pull” for private repositories by explicitly providing the registry:

docker pull registry.somewhere.com/image

And you can use a similar syntax in Dockerfiles:

FROM registry.somewhere.com/image

Providing a wider ecosystem of trusted images is exactly why Red Hat began its certification program for container applications. Docker is an amazing technology, but it is neither a security nor interoperability panacea. Images still need to come from sources that certify their security, level-of-support, and compatibility.

Container Security: Isolation Heaven or Dependency Hell

Docker is the public face of Linux containers and two of Linux’s unsung heroes: control groups (cgroups) and namespaces. Like virtualization, containers are appealing because they help solve two of the oldest problems to plague developers: “dependency hell” and “environmental hell.”

Closely related, dependency hell and environmental hell can best be thought of as the chief causes of “works for me” situations. Dependency hell describes the complexity inherent in a modern application’s tangled graph of the external libraries and programs it needs to function. Environmental hell is the name for the operating system portion of that same problem (i.e. the OS wrinkles, in particular which bash implementation, on which that quick script you wrote unknowingly relies).

Namespaces provide the solution in much the same way that virtual memory simplified writing code for a multi-tenant machine: by providing the illusion that an application suite has the computer all to itself. In other words, “via isolation”. When a process or process group is isolated via these namespace features, we say it is “contained.” In this way, virtualization and containers are conceptually related, but containers isolate in a completely different way, and conflating the two is just the first of a series of misconceptions that must be cleared up in order to understand how to use containers as securely as possible. Virtualization involves fully isolating programs to the point that one can use Linux, for example, while another uses BSD. Containers are not so isolated. Here are a few of the ways that “containers do not contain”:

  1. Containers all share the same kernel. If a contained application is hijacked with a privilege escalation vulnerability, all running containers *and* the host are compromised. Similarly, it isn’t possible for two containers to use different versions of the same kernel module. (A quick demonstration of this sharing follows the list.)
  2. Several resources are *not* namespaced. For example, normal ulimit mechanisms are still needed to control resources such as file handles. The kernel keyring is another example of a resource that is not namespaced. Many beginning users of containers find it counter-intuitive that socket handles can be exhausted or that kerberos credentials are shared between containers when they believe they have exclusive system access. A badly behaving process in one container could use up all the file handles on a system and starve the other containers, and diagnosing that shared resource usage is not feasible from within the containers themselves.
  3. By default, containers inherit many system-level kernel capabilities. While Docker has many useful options for restricting kernel capabilities, you need a deeper understanding of an application’s needs to run it inside containers than you would if running it in a VM. The containers and the application within them will be dependent on the capabilities of the kernel on which they reside.
  4. Containers are not “write once, run anywhere”. Since they use the host kernel, applications must be compatible with said kernel. Just because many applications don’t depend on particular kernel features doesn’t mean that no applications do.
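As a quick demonstration of point 1, the kernel a container reports is simply the host's kernel, regardless of the distribution inside the image (the image name is just an example):

# Both commands print the same kernel release
uname -r
docker run --rm fedora uname -r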

For these and other reasons, Docker images should be designed and used with consideration for the host system on which they are running. By only consuming images from trusted sources, you reduce the risk of deploying containerized applications that exhaust system resources or otherwise create a denial of service attack on shared resources. Docker images should be considered as powerful as RPMs and should only be installed from sources you trust. You wouldn’t expect your system to remain secure if you randomly installed untrusted RPMs, nor should you if you “docker pull” random Docker images.

In the future we will discuss the topic of untrusted images.