Container Platforms

Containers are lightweight, isolated execution environments that package an application together with its dependencies, libraries, and configuration into a single deployable unit. Unlike virtual machines, which virtualise hardware and run complete operating systems, containers virtualise the operating system kernel and share it among multiple isolated processes. This architectural difference produces containers that start in milliseconds rather than minutes, consume megabytes rather than gigabytes, and run identically whether deployed on a developer’s laptop, a test server, or production infrastructure spanning multiple continents.

Container
An isolated process running on a shared operating system kernel, with its own filesystem, network stack, and resource limits. The container includes the application code, runtime, libraries, and configuration required to execute.
Container image
An immutable, layered filesystem template from which containers are instantiated. Images are built from a Dockerfile or similar specification and stored in registries.
Container runtime
The software that executes containers, managing isolation, resource limits, and lifecycle. Examples include containerd, CRI-O, and Docker Engine.
Orchestration platform
Software that automates deployment, scaling, networking, and lifecycle management for containers across multiple hosts. Kubernetes is the dominant orchestration platform.
Pod
The smallest deployable unit in Kubernetes, consisting of one or more containers that share network namespace, storage volumes, and scheduling. Containers within a pod communicate via localhost.

For mission-driven organisations, containers solve persistent problems in software deployment: the application that works in development but fails in production, the dependency conflict between two systems sharing a server, the three-day provisioning process for new environments, and the inability to scale services during peak demand. Containers eliminate these problems through isolation and immutability. Each container runs in its own namespace with its own dependencies, immune to conflicts with other applications. Container images, once built, never change; the same image runs in every environment, guaranteeing consistency.

Container architecture

The container architecture stack consists of four layers: the host operating system kernel, the container runtime, the containers themselves, and the orchestration layer that manages containers across multiple hosts. Understanding how these layers interact explains both the efficiency gains containers provide and the operational requirements they impose.

+------------------------------------------------------------------+
| ORCHESTRATION LAYER |
| (Kubernetes, Docker Swarm, Nomad) |
| |
| +-------------------+ +-------------------+ +---------------+ |
| | Scheduling | | Service discovery | | Configuration | |
| | Scaling | | Load balancing | | Secrets | |
| | Self-healing | | DNS | | Storage | |
| +-------------------+ +-------------------+ +---------------+ |
+------------------------------------------------------------------+
|
+------------------------------------------------------------------+
| CONTAINER RUNTIME |
| (containerd, CRI-O, Docker Engine) |
| |
| +-------------------+ +-------------------+ +---------------+ |
| | Image management | | Container | | Resource | |
| | Layer caching | | lifecycle | | isolation | |
| | Registry pull | | Start/stop/kill | | cgroups/ns | |
| +-------------------+ +-------------------+ +---------------+ |
+------------------------------------------------------------------+
|
+------------------------------------------------------------------+
| HOST OPERATING SYSTEM |
| (Linux kernel with namespaces and cgroups) |
| |
| +-------------------+ +-------------------+ +---------------+ |
| | Namespaces | | Control groups | | Filesystem | |
| | PID, network, | | CPU, memory, | | OverlayFS | |
| | mount, user | | I/O limits | | Union mounts | |
| +-------------------+ +-------------------+ +---------------+ |
+------------------------------------------------------------------+
|
+------------------------------------------------------------------+
| HARDWARE |
| (Physical server or virtual machine) |
+------------------------------------------------------------------+

Figure 1: Container architecture stack showing orchestration, runtime, OS, and hardware layers

The host kernel provides two mechanisms that enable container isolation. Namespaces partition kernel resources so each container perceives its own isolated instance of the resource. The PID namespace gives each container its own process ID space, so process 1 inside the container (the init process) is actually process 47293 on the host. The network namespace gives each container its own network interfaces, routing tables, and port space. The mount namespace gives each container its own filesystem view. The user namespace maps container user IDs to different host user IDs, so root inside the container runs as an unprivileged user on the host.

Control groups (cgroups) limit and account for resource usage. A container configured with a 512 MB memory limit cannot allocate beyond that boundary; the kernel terminates processes that exceed the limit. CPU limits work through scheduling weights and quotas; a container with 0.5 CPU can use at most half of one CPU core’s time. I/O bandwidth limits prevent containers from monopolising disk throughput. These limits are enforced by the kernel, not by the container or orchestrator, making them difficult to circumvent.
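In Kubernetes, these kernel-enforced limits are expressed declaratively in the pod specification. A minimal sketch (the pod name and image are illustrative, not from any real deployment):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited-app                       # illustrative name
spec:
  containers:
  - name: app
    image: registry.example.org/app:1.0   # hypothetical image
    resources:
      requests:          # what the scheduler reserves on the node
        memory: "256Mi"
        cpu: "250m"      # 0.25 of one CPU core
      limits:            # kernel-enforced ceilings, applied via cgroups
        memory: "512Mi"  # allocations beyond this trigger the OOM killer
        cpu: "500m"      # throttled at half a core's time
```

The kubelet translates `limits` into cgroup settings on the node, so the enforcement described above happens in the kernel regardless of what the containerised process attempts.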

The layered filesystem enables efficient image storage and fast container startup. A container image consists of multiple read-only layers stacked using a union filesystem. The base layer contains the operating system files. Subsequent layers add application runtimes, libraries, and application code. When a container runs, a thin writable layer is added on top. Changes made inside the running container are written to this layer without modifying the underlying image. Multiple containers from the same image share the read-only layers, storing only their unique changes. A 500 MB base image running 10 containers consumes approximately 500 MB for shared layers plus the minimal writable layers, not 5 GB.
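The layering maps directly onto Dockerfile instructions: each filesystem-modifying instruction produces one image layer, and unchanged layers are reused from cache on rebuild. A sketch (base image and paths are illustrative):

```dockerfile
FROM python:3.12-slim                  # base layer: OS files and language runtime
WORKDIR /app
COPY requirements.txt .                # small layer: dependency manifest only
RUN pip install -r requirements.txt    # layer: installed libraries; cached until
                                       # requirements.txt changes
COPY . .                               # layer: application code, changes most often
CMD ["python", "app.py"]               # metadata only; adds no filesystem layer
```

Ordering instructions from stable to volatile maximises cache reuse across builds and layer sharing across running containers.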

Container versus virtual machine architecture

Containers and virtual machines both provide isolation but through fundamentally different mechanisms, with different performance characteristics, security boundaries, and operational requirements.

+--------------------------------+ +--------------------------------+
| VIRTUAL MACHINES | | CONTAINERS |
+--------------------------------+ +--------------------------------+
| | | |
| +----------------------------+ | | +--------+ +--------+ +------+ |
| | Application A | | | | App A | | App B | | App C| |
| +----------------------------+ | | +--------+ +--------+ +------+ |
| | Libraries/Bins | | | | Libs | | Libs | | Libs | |
| +----------------------------+ | | +--------+ +--------+ +------+ |
| | Guest OS (4GB+) | | | | |
| +----------------------------+ | | +----------------------------+ |
| | Hypervisor | | | | Container Runtime | |
| +----------------------------+ | | +----------------------------+ |
| | | |
| +----------------------------+ | | +----------------------------+ |
| | Application B | | | | Host Operating System | |
| +----------------------------+ | | | (single kernel) | |
| | Libraries/Bins | | | +----------------------------+ |
| +----------------------------+ | | |
| | Guest OS (4GB+) | | | |
| +----------------------------+ | | |
| | Hypervisor | | | |
| +----------------------------+ | | |
| | | |
| +----------------------------+ | | |
| | Host Operating System | | | |
| +----------------------------+ | | |
+--------------------------------+ +--------------------------------+
| Hardware | | Hardware |
+--------------------------------+ +--------------------------------+
            Virtual machines                Containers
Resources   Full guest OS (4-8 GB RAM)      Shared kernel (50-500 MB each)
Startup     30-90 seconds                   100-500 milliseconds
Isolation   Hardware-level (strongest)      Kernel-level (weaker but sufficient)
Density     10-20 VMs per host              100-1000 containers per host

Figure 2: Virtual machine versus container architecture comparison

Virtual machines virtualise hardware through a hypervisor. Each VM runs a complete operating system with its own kernel, consuming 4-8 GB of RAM before the application loads. VMs boot in 30-90 seconds as the guest kernel initialises. The isolation boundary is the hypervisor; a vulnerability in one VM’s kernel does not affect other VMs because they run separate kernels.

Containers virtualise the operating system through the shared kernel. Each container runs as a process on the host kernel, consuming only the memory required by its application. Containers start in 100-500 milliseconds because no kernel boots; the container process simply begins executing. The isolation boundary is the kernel’s namespace and cgroup implementation; all containers share the same kernel, so a kernel vulnerability potentially affects all containers on the host.

The security trade-off guides deployment decisions. Containers provide sufficient isolation for applications under common ownership running on dedicated container hosts. Untrusted code from external sources, or applications with strict compliance requirements for isolation, belong in virtual machines. Most organisations run containers inside VMs: the VM provides the hard security boundary, while containers within the VM provide the deployment density and operational benefits.

Resource efficiency shapes infrastructure costs. A server with 64 GB RAM running VMs with 4 GB minimum allocation hosts at most 16 VMs before RAM exhaustion. The same server running containers at 500 MB average allocation hosts over 100 containers. This 6-10x density improvement reduces hardware requirements, cloud compute costs, and operational complexity proportionally.

Container runtimes

The container runtime executes and manages containers, handling image retrieval, filesystem setup, namespace and cgroup configuration, and container lifecycle operations. Three runtimes dominate production deployments, each with different characteristics and appropriate use cases.

containerd is the industry-standard container runtime, extracted from Docker and donated to the Cloud Native Computing Foundation. It provides a minimal, stable runtime focused on executing containers rather than building images or managing development workflows. containerd implements the Container Runtime Interface (CRI), enabling direct integration with Kubernetes without additional translation layers. Most managed Kubernetes services and production Kubernetes deployments use containerd as their runtime.

containerd operates through a simple model: it pulls images from registries, extracts them to local storage, creates containers from images, and manages container lifecycle (start, stop, pause, kill). It delegates low-level execution to runc, the reference implementation of the Open Container Initiative (OCI) runtime specification. This separation means containerd handles the complex state management while runc performs the actual kernel interactions for namespace and cgroup setup.

CRI-O is a lightweight runtime built specifically for Kubernetes, implementing the CRI without additional features. It pulls images, runs containers, and exposes them to Kubernetes through the CRI socket. CRI-O supports the same OCI images and, like containerd, delegates low-level execution to runc. Red Hat OpenShift uses CRI-O as its default runtime.

The choice between containerd and CRI-O for Kubernetes deployments is largely organisational. containerd has broader adoption and more extensive tooling. CRI-O has a smaller codebase and attack surface. Both provide equivalent functionality for running containers in orchestrated environments.

Docker Engine includes a container runtime alongside image building, local container management, and developer tools. Docker Engine wraps containerd internally, adding the Docker CLI, Docker Compose for multi-container applications, and the Docker API for programmatic control. For development environments and single-host deployments without orchestration, Docker Engine provides the most complete experience.

Docker Engine carries overhead unnecessary in orchestrated production environments. Kubernetes manages container lifecycle, networking, and storage through its own abstractions; Docker’s equivalents are redundant. Managed Kubernetes services have migrated from Docker Engine to containerd, eliminating the additional layer and reducing the attack surface.

For organisations beginning with containers, Docker Engine provides the appropriate starting point. Developers use Docker to build images and run containers locally. When applications move to orchestrated production environments, the same images run on containerd or CRI-O without modification; the OCI specification guarantees compatibility. The runtime choice affects operations, not development.

Orchestration platforms

Container orchestration automates the deployment, scaling, networking, and lifecycle management of containers across clusters of hosts. Without orchestration, operating containers at scale requires manual placement decisions, custom scripts for failover, ad-hoc networking configuration, and constant monitoring for failed containers. Orchestration platforms encode operational knowledge into declarative configurations: specify the desired state, and the platform continuously reconciles actual state toward that target.

Kubernetes dominates container orchestration with over 90% market share among orchestrated deployments. Developed at Google, drawing on its internal Borg system, and donated to the Cloud Native Computing Foundation, Kubernetes provides a comprehensive platform for running containerised workloads at any scale. The Kubernetes API has become the de facto standard; tools, integrations, and operational practices assume Kubernetes semantics.

+------------------------------------------------------------------+
| KUBERNETES CLUSTER |
+------------------------------------------------------------------+
| |
| +------------------------------------------------------------+ |
| | CONTROL PLANE | |
| | | |
| | +-------------+ +-------------+ +--------------------+ | |
| | | API Server | | Controller | | Scheduler | | |
| | | | | Manager | | | | |
| | | - REST API | | - Node | | - Pod placement | | |
| | | - AuthN/Z | | - Replica | | - Resource fit | | |
| | | - Admission | | - Endpoint | | - Affinity rules | | |
| | +------+------+ +------+------+ +---------+----------+ | |
| | | | | | |
| | +----------------+-------------------+ | |
| | | | |
| | +-------v-------+ | |
| | | etcd | | |
| | | (state store) | | |
| | +---------------+ | |
| +------------------------------------------------------------+ |
| | |
| +--------------------+--------------------+ |
| | | | |
| +-----v------+ +------v-----+ +-------v----+ |
| | WORKER | | WORKER | | WORKER | |
| | NODE 1 | | NODE 2 | | NODE 3 | |
| | | | | | | |
| | +--------+ | | +--------+ | | +--------+ | |
| | | kubelet| | | | kubelet| | | | kubelet| | |
| | +--------+ | | +--------+ | | +--------+ | |
| | | | | | | |
| | +--------+ | | +--------+ | | +--------+ | |
| | |kube- | | | |kube- | | | |kube- | | |
| | |proxy | | | |proxy | | | |proxy | | |
| | +--------+ | | +--------+ | | +--------+ | |
| | | | | | | |
| | +--------+ | | +--------+ | | +--------+ | |
| | |container | | |container | | |container | |
| | |runtime | | | |runtime | | | |runtime | | |
| | +--------+ | | +--------+ | | +--------+ | |
| | | | | | | |
| | [Pod][Pod] | | [Pod][Pod] | | [Pod][Pod] | |
| +------------+ +------------+ +------------+ |
+------------------------------------------------------------------+

Figure 3: Kubernetes architecture showing control plane components and worker nodes

The control plane maintains cluster state and makes scheduling decisions. The API server exposes the Kubernetes API, handling authentication, authorisation, and admission control for all requests. The controller manager runs controllers that watch for state changes and reconcile toward desired state; the replication controller ensures the specified number of pod replicas run, the node controller monitors node health, the endpoint controller populates service endpoints. The scheduler assigns pods to nodes based on resource requirements, affinity rules, and constraints. etcd stores all cluster state as key-value data with strong consistency guarantees.

Worker nodes run containerised workloads. The kubelet agent on each node receives pod specifications from the API server, instructs the container runtime to start containers, and reports status back. kube-proxy maintains network rules enabling service abstraction; traffic to a service virtual IP routes to healthy pod endpoints. The container runtime (containerd or CRI-O) executes containers per kubelet instructions.

Kubernetes introduces abstractions above raw containers that simplify operations:

A Deployment declares the desired state for a set of identical pods: the container image, replica count, resource limits, and update strategy. When you modify the deployment (changing the image version, for example), Kubernetes performs a rolling update, creating new pods before terminating old ones, ensuring zero downtime.
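A Deployment of the kind described might be written as follows (the name, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # desired pod count; Kubernetes reconciles toward it
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate        # new pods start before old ones terminate
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.org/web:1.4.2   # hypothetical image; changing
                                                # this tag triggers a rolling update
        resources:
          limits:
            memory: "512Mi"
```

Applying a modified manifest (for example, a new image tag) is the entire upgrade procedure; Kubernetes handles the pod replacement sequence.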

A Service provides stable network identity for a set of pods. Pods receive dynamic IP addresses and may be rescheduled to different nodes; services provide a consistent DNS name and virtual IP that routes to healthy pods matching a label selector. Internal services enable pod-to-pod communication; LoadBalancer services integrate with cloud provider load balancers for external traffic.
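A Service selecting such pods by label could be declared as follows (names are hypothetical and match nothing outside this sketch):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP        # internal virtual IP; use LoadBalancer for external traffic
  selector:
    app: web             # routes to healthy pods carrying this label
  ports:
  - port: 80             # stable port on the service IP
    targetPort: 8080     # container port on the selected pods
```

Other pods in the same namespace then reach the application at the DNS name `web`, regardless of which pods currently back it.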

A ConfigMap externalises configuration from container images. Applications read database hostnames, feature flags, and environment-specific settings from ConfigMaps rather than baking them into images. Changing configuration creates new ConfigMaps; pods referencing them can reload or restart to pick up changes.
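A ConfigMap holding settings of the kind described might look like this (keys and values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
data:
  DATABASE_HOST: "db.internal.example.org"   # hypothetical hostname
  FEATURE_FLAGS: "new-dashboard=on"          # hypothetical flag
```

Pods consume it either as environment variables (for example via `envFrom` with a `configMapRef`) or as files mounted from a volume, keeping the image itself environment-agnostic.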

A Secret stores sensitive data (passwords, API keys, certificates) with access controls stricter than ConfigMaps. Secrets are base64-encoded by default and can be encrypted at rest in etcd. Applications mount secrets as files or environment variables.
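A minimal Secret sketch (values are placeholders, never commit real credentials to manifests in version control):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:                 # plain values here; the API server stores them base64-encoded
  username: app_user        # illustrative values only
  password: change-me
```

Mounting the Secret as files is generally preferred over environment variables, which can leak through process listings and crash dumps.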

This abstraction layer enables declarative operations. Rather than scripting imperative commands (“start container X on node Y, configure network Z”), operators declare desired state (“run 3 replicas of image X with 512MB memory each”) and Kubernetes continuously ensures reality matches intent.

Lightweight orchestration alternatives

Kubernetes solves problems that emerge at scale: automating operations across hundreds of nodes, managing thousands of pods, handling complex networking and storage requirements, integrating with cloud provider infrastructure. Organisations running 5-20 containers on 1-3 servers face different constraints; for them, Kubernetes introduces complexity disproportionate to the problem.

Docker Compose defines multi-container applications in a single YAML file, specifying services, networks, and volumes. Running docker-compose up starts all services with correct networking and dependencies. Compose suits development environments, single-host deployments, and small production workloads where container counts remain manageable and high availability is achieved through other means (database replication, upstream load balancers).
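A Compose file for a hypothetical two-service application, sketching the service, network, and volume declarations described:

```yaml
# docker-compose.yml — illustrative example, not a real deployment
services:
  web:
    image: registry.example.org/web:1.4.2   # hypothetical image
    ports:
      - "80:8080"          # host port 80 -> container port 8080
    depends_on:
      - db
    environment:
      DATABASE_HOST: db    # the service name resolves via Compose's internal DNS
  db:
    image: postgres:15
    volumes:
      - db-data:/var/lib/postgresql/data    # named volume survives container recreation
volumes:
  db-data:
```

Running docker-compose up starts both services on a shared network with the volume attached; no per-container networking or lifecycle commands are needed.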

Docker Swarm extends Compose concepts across multiple hosts. Swarm mode, built into Docker Engine, creates a cluster from multiple Docker hosts and deploys services across them. Swarm provides service discovery, load balancing, rolling updates, and secrets management with minimal configuration. A three-node Swarm cluster deploys in under an hour; a comparable self-hosted Kubernetes cluster typically takes days.

Nomad from HashiCorp provides workload orchestration for containers, VMs, and standalone executables. Nomad’s single binary architecture simplifies deployment; a production cluster requires three servers and agents on worker nodes. Nomad lacks Kubernetes’ extensive ecosystem but compensates with operational simplicity and the ability to orchestrate non-container workloads alongside containers.

+------------------------------------------------------------------+
| ORCHESTRATION PLATFORM DECISION TREE |
+------------------------------------------------------------------+
|
+----------v----------+
| Container count |
| > 50? |
+----------+----------+
|
+---------------+---------------+
| |
| Yes | No
v v
+----------+----------+ +----------+----------+
| Multi-node | | High availability |
| requirement? | | needed? |
+----------+----------+ +----------+----------+
| |
+---------+---------+ +---------+---------+
| | | |
| Yes | No | Yes | No
v v v v
+---+-------+ +-------+---+ +---+-------+ +-------+---+
| Kubernetes| | Kubernetes| | Docker | | Docker |
| (managed | | (self- | | Swarm | | Compose |
| or self) | | hosted) | | | | |
+-----------+ +-----------+ +-----------+ +-----------+

Figure 4: Orchestration platform selection based on scale and requirements

The decision between orchestration platforms depends on operational capacity, not feature comparisons. An organisation with one IT person supporting 200 staff cannot maintain a Kubernetes cluster alongside other responsibilities. Docker Compose on a single server with good backup procedures serves better than a Kubernetes cluster that receives insufficient attention. As container counts grow and availability requirements increase, the operational investment in Kubernetes becomes justified.

Container networking

Containers require network connectivity to receive requests, communicate with other containers, and access external services. Container networking provides each container with an IP address and enables communication patterns from simple single-host deployments to complex multi-host clusters with service discovery and load balancing.

The bridge network is the default for single-host Docker deployments. Docker creates a virtual bridge (docker0) on the host and attaches container virtual interfaces to it. Containers on the bridge communicate with each other using container IP addresses. The host provides NAT for outbound connectivity. Published ports map host ports to container ports for inbound traffic.

On a host running three containers on the default bridge network:

Container A: 172.17.0.2
Container B: 172.17.0.3
Container C: 172.17.0.4
Host bridge (docker0): 172.17.0.1
Host external interface: 192.168.1.100

A can reach B directly at 172.17.0.3:8080.
External clients reach A via the published port: 192.168.1.100:80 -> 172.17.0.2:8080.

Bridge networking suffices for development and simple deployments but lacks service discovery (containers must know each other’s IP addresses) and does not scale beyond single hosts.

Overlay networks span multiple hosts, enabling containers on different servers to communicate using a single logical network. The overlay encapsulates container traffic within UDP packets that traverse the underlying host network, then decapsulates at the destination host for delivery to the target container. Docker Swarm and Kubernetes both use overlay networking for multi-host communication.

Kubernetes implements a flat network model where every pod receives a routable IP address within the cluster network, and every pod can communicate with every other pod without NAT. This model simplifies application development; services specify target ports without considering port mapping or NAT traversal. The actual implementation uses a Container Network Interface (CNI) plugin.

CNI plugins implement the flat network model through various mechanisms:

Calico uses BGP routing to advertise pod IP addresses across nodes. Each node runs a Calico agent that programs routes for pods on that node. Traffic between pods on different nodes traverses the underlying network using standard IP routing. Calico supports network policies for pod-to-pod traffic control and scales to thousands of nodes.

Flannel creates an overlay network using VXLAN encapsulation. It is simpler to deploy than Calico but provides fewer features; network policies require an additional component. Flannel suits smaller clusters where operational simplicity outweighs advanced networking requirements.

Cilium uses eBPF (extended Berkeley Packet Filter) for high-performance networking and security. eBPF programs run in the kernel, processing packets without userspace transitions. Cilium provides network policies, load balancing, and observability with lower overhead than iptables-based solutions. It requires Linux kernel 4.9.17 or later.

+------------------------------------------------------------------+
| KUBERNETES NETWORKING |
+------------------------------------------------------------------+
| |
| +--------------------+ +--------------------+ |
| | NODE 1 | | NODE 2 | |
| | | | | |
| | +------+ +------+ | | +------+ +------+ | |
| | |Pod A | |Pod B | | | |Pod C | |Pod D | | |
| | |10.0. | |10.0. | | | |10.0. | |10.0. | | |
| | |1.2 | |1.3 | | | |2.2 | |2.3 | | |
| | +--+---+ +--+---+ | | +--+---+ +--+---+ | |
| | | | | | | | | |
| | +--v--------v--+ | | +--v--------v--+ | |
| | | CNI Plugin | | | | CNI Plugin | | |
| | | (pod network)| | | | (pod network)| | |
| | +------+-------+ | | +------+-------+ | |
| | | | | | | |
| +---------+----------+ +--------+-----------+ |
| | | |
| +----------------+-----------------+ |
| | |
| +---------v---------+ |
| | Underlay Network | |
| | (host networking) | |
| +-------------------+ |
| |
| Service "web" -> 10.96.0.10:80 |
| Routes to: Pod A, Pod C (selected by labels) |
| kube-proxy programs iptables/IPVS rules |
+------------------------------------------------------------------+

Figure 5: Kubernetes networking with CNI plugin and service abstraction

Services provide stable endpoints for sets of pods. When a client connects to a service IP address, kube-proxy (using iptables or IPVS rules) load balances the connection to a healthy pod endpoint. Services abstract pod IP address changes due to restarts, reschedules, or scaling; clients connect to the stable service IP, and kube-proxy handles routing to current pod instances.

Persistent storage

Containers are ephemeral by design; when a container terminates, its writable filesystem layer is destroyed. Applications requiring persistent data (databases, file uploads, configuration generated at runtime) need storage that survives container restarts and rescheduling.

Volumes in Docker and Persistent Volumes in Kubernetes decouple storage lifecycle from container lifecycle. A volume mounts a filesystem path into the container; data written to that path persists on storage outside the container’s filesystem layer.

Docker volumes store data on the host filesystem at /var/lib/docker/volumes/. Bind mounts map arbitrary host paths into containers. Both approaches tie storage to a specific host; containers rescheduled to other hosts lose access.

Kubernetes abstracts storage through Persistent Volume Claims (PVCs) and Persistent Volumes (PVs). A PVC specifies storage requirements: size, access mode (read-write single node, read-write many nodes, read-only many nodes), and storage class. Kubernetes binds the PVC to an available PV satisfying the requirements. The pod specification references the PVC, and Kubernetes mounts the storage when scheduling the pod.

Storage classes define storage provisioning. A storage class for cloud provider block storage creates disk volumes on demand when PVCs request them. A storage class for network-attached storage (NFS, Ceph, GlusterFS) provisions shares from the storage system. Storage classes enable dynamic provisioning: applications request storage through PVCs, and Kubernetes creates the underlying storage automatically.
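A StorageClass sketch for dynamically provisioned SSD block storage. The provisioner is cloud-specific; the one shown (the AWS EBS CSI driver) is an assumption for illustration, and other clouds use their own driver names:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-ssd
provisioner: ebs.csi.aws.com    # assumption: AWS EBS CSI driver; cloud-specific
parameters:
  type: gp3                     # SSD volume type on AWS
reclaimPolicy: Delete           # delete the underlying disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # provision in the zone where the pod lands
```

Any PVC naming this class triggers on-demand creation of a matching disk; no administrator pre-provisions volumes.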

For a PostgreSQL database requiring 100 GB of persistent storage:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
  volumeClaimTemplates:       # creates a 100 Gi PVC per replica automatically
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard-ssd
      resources:
        requests:
          storage: 100Gi

StatefulSets provide stable network identities and persistent storage for stateful applications. Each replica receives a predictable hostname (postgres-0, postgres-1) and its own PVC. If a pod is rescheduled, Kubernetes mounts the same PVC on the new node, preserving data. StatefulSets are appropriate for databases, message queues, and other applications requiring identity and state.

Stateless applications use Deployments, which treat replicas as interchangeable. Rescheduled pods receive new identities and fresh storage. Most web applications, API servers, and worker processes are stateless; they read state from databases or external services rather than local storage.

Container security

Containers share the host kernel, making security configuration critical. A container that escapes its isolation boundary accesses the entire host, potentially including secrets, network access, and other containers. Defence in depth applies: assume any single control might fail and layer protections accordingly.

Image security begins at build time. Base images should derive from minimal, trusted sources; Alpine Linux images at 5 MB contain fewer vulnerable packages than full distributions at 200+ MB. Dependencies should pin specific versions, not “latest” tags that introduce unpredictable changes. Build processes should scan images for known vulnerabilities using tools like Trivy, Grype, or Clair.

A vulnerability scan of a container image reveals:

postgres:15 (debian 12.4)
=========================
Total: 142 (UNKNOWN: 0, LOW: 89, MEDIUM: 42, HIGH: 9, CRITICAL: 2)

CRITICAL  CVE-2024-0567   libgnutls30  3.7.9-2  Fixed in 3.7.9-2+deb12u1
CRITICAL  CVE-2023-4911   libc6        2.36-9   Fixed in 2.36-9+deb12u2
HIGH      CVE-2023-52425  libexpat1    2.5.0-1  No fix available
...

Critical and high vulnerabilities in base images require remediation before production deployment. Rebuild images with patched base layers or switch to alternative images with current patches. Accept the risk of unfixed vulnerabilities only with documented justification and compensating controls.

Image signing verifies image provenance. Sigstore Cosign signs images with cryptographic keys; Kubernetes admission controllers reject unsigned or incorrectly signed images. Image signing prevents deployment of images from untrusted sources or images modified after signing.

Runtime security constrains container capabilities. By default, containers run as root inside their namespace, with capabilities (a subset of root privileges) enabling specific operations. Production containers should:

Run as non-root users. The Dockerfile USER 1000 directive or Kubernetes runAsNonRoot: true security context prevents processes from running as UID 0. Non-root processes cannot write to most system locations even if they escape the container.

Drop unnecessary capabilities. The CAP_NET_ADMIN capability enables network configuration; most applications do not need it. The CAP_SYS_ADMIN capability enables many privileged operations and should never be granted to untrusted containers. Kubernetes security contexts specify dropped capabilities explicitly.

Use read-only root filesystems. Containers that write only to mounted volumes can run with readOnlyRootFilesystem: true, preventing attackers from modifying binaries or dropping files into the container filesystem.
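The three practices above can be combined in a single pod specification; a minimal sketch, in which the image name, UID, and writable mount are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true     # refuse to start if the image runs as UID 0
    runAsUser: 1000        # illustrative non-root UID
  containers:
    - name: app
      image: registry.example.com/app:v1.2.3   # placeholder image
      securityContext:
        readOnlyRootFilesystem: true     # image filesystem is immutable at runtime
        allowPrivilegeEscalation: false  # block setuid/setcap escalation
        capabilities:
          drop: ["ALL"]                  # drop every capability, add back only if needed
      volumeMounts:
        - name: tmp
          mountPath: /tmp                # explicit writable scratch space
  volumes:
    - name: tmp
      emptyDir: {}
```

With the root filesystem read-only, every writable path must be an explicit mount, such as the emptyDir scratch volume here.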

Secrets management prevents sensitive data exposure. Secrets should never appear in container images, environment variables visible in process listings, or version control. Kubernetes Secrets mounted as files provide basic secrets storage. External secrets operators integrate with HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault for secrets with access controls, rotation, and audit logging.
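A sketch of the basic Kubernetes approach, with placeholder names and a placeholder password value; in practice the Secret object would be created out of band or synchronised from an external store, never committed to version control:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  password: change-me        # placeholder; supply the real value out of band
---
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: registry.example.com/api:v1.2.3   # placeholder image
      volumeMounts:
        - name: db-credentials
          mountPath: /etc/secrets    # password appears as /etc/secrets/password
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: db-credentials
```

Mounting as files rather than environment variables keeps the value out of process listings and crash dumps.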

Network policies restrict pod-to-pod communication. By default, Kubernetes allows all pods to communicate with all other pods. Network policies specify allowed ingress and egress, implementing micro-segmentation within the cluster. A network policy allowing only the web tier to reach the API tier, and only the API tier to reach the database, contains breaches; compromised web containers cannot directly access the database.
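The web-to-API rule described above might look like the following NetworkPolicy sketch; the tier labels and port number are assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-web-only
spec:
  podSelector:
    matchLabels:
      tier: api              # policy applies to API-tier pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: web      # only web-tier pods may connect
      ports:
        - protocol: TCP
          port: 8080         # illustrative API port
```

Because the policy selects the API pods and lists only the web tier as an allowed source, traffic from any other pod (including a compromised one) is dropped.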

Registry and image management

Container registries store and distribute container images. A registry is to container images what a package repository is to software packages: the central location where images are pushed after build and pulled before deployment.

Public registries host open-source and community images. Docker Hub is the default registry for Docker; images without explicit registry prefixes pull from Docker Hub. GitHub Container Registry (ghcr.io) hosts images alongside source repositories. Quay.io, operated by Red Hat, provides another public option with security scanning.

Private registries host organisation-specific images with access controls. Major cloud providers offer managed registries: Amazon Elastic Container Registry (ECR), Azure Container Registry (ACR), Google Artifact Registry. Self-hosted options include Harbor (CNCF project with scanning, signing, and replication), GitLab Container Registry (integrated with GitLab CI/CD), and the basic Docker Registry for minimal deployments.

Image tagging conventions affect deployment reliability. The latest tag is mutable; successive pushes overwrite it with new images. Production deployments should reference immutable tags: semantic versions (v1.2.3), git commit SHAs (abc123def), or timestamps (20240115-143022). Immutable tags ensure that the image deployed today deploys identically tomorrow.
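For example, a build script might derive the tag from the current commit; the registry path is a placeholder and the commands assume a git checkout with Docker available:

```shell
# Derive an immutable tag from the short commit SHA.
TAG=$(git rev-parse --short HEAD)

# Build and push under that tag (placeholder registry path).
docker build -t registry.example.com/app:${TAG} .
docker push registry.example.com/app:${TAG}
```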

Image lifecycle management removes obsolete images to conserve storage. Registries accumulate images rapidly; without pruning, storage costs grow indefinitely. Retention policies remove untagged images, images older than specified retention periods, and excess tags beyond retention counts. Harbor and cloud registry services provide policy-based cleanup; simpler registries require external tooling.

+---------------------------------------------------------------------+
|                        IMAGE LIFECYCLE FLOW                         |
+---------------------------------------------------------------------+
|                                                                     |
|  +----------+     +----------+     +----------+     +----------+    |
|  |  Source  |     |  Build   |     | Registry |     | Runtime  |    |
|  |   Code   |     |  System  |     |          |     |          |    |
|  +----+-----+     +----+-----+     +----+-----+     +----+-----+    |
|       |                |                |                |          |
|       | 1. Commit      |                |                |          |
|       +--------------->|                |                |          |
|       |                |                |                |          |
|       |                | 2. Build image |                |          |
|       |                |    (Dockerfile)|                |          |
|       |                |                |                |          |
|       |                | 3. Scan for    |                |          |
|       |                |    vulnerabilities              |          |
|       |                |                |                |          |
|       |                | 4. Sign image  |                |          |
|       |                |                |                |          |
|       |                | 5. Push        |                |          |
|       |                +--------------->|                |          |
|       |                |                |                |          |
|       |                |                | 6. Store with  |          |
|       |                |                |    immutable tag          |
|       |                |                |                |          |
|       |                |                | 7. Pull        |          |
|       |                |                |<---------------+          |
|       |                |                |                |          |
|       |                |                |                | 8. Run   |
|       |                |                |                |          |
+---------------------------------------------------------------------+

Figure 6: Container image lifecycle from source commit through registry to runtime deployment

CI/CD integration

Containers integrate naturally with continuous integration and deployment pipelines. The container image is the deployment artefact; building, testing, and deploying images forms the pipeline.

A representative pipeline for a containerised application:

  1. Source stage: Developer commits code to version control. The commit triggers the pipeline.

  2. Build stage: The pipeline builds the container image using the Dockerfile. The build runs in a clean environment, ensuring reproducibility. The image tag incorporates the git commit SHA or build number for traceability.

  3. Test stage: The pipeline runs tests against the built image. Unit tests execute inside the container. Integration tests spin up the container alongside dependencies (databases, message queues) in an ephemeral environment.

  4. Scan stage: Security scanning identifies vulnerabilities in the image. The pipeline fails if critical vulnerabilities are detected, preventing vulnerable images from reaching production.

  5. Push stage: The pipeline pushes the image to the registry, signing it to attest provenance. The image is now available for deployment.

  6. Deploy stage: The pipeline updates the deployment configuration with the new image tag and applies it to the target environment. For Kubernetes, this updates the Deployment manifest and applies it via kubectl or GitOps tooling.
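Stages 2, 4 (scan), and 5 might be sketched as a GitHub Actions workflow; the registry path, secret name, and pinned action versions are assumptions:

```yaml
name: build-scan-push
on:
  push:
    branches: [main]
jobs:
  image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image tagged with the commit SHA
        run: docker build -t registry.example.com/app:${{ github.sha }} .
      - name: Scan image; fail the pipeline on critical or high findings
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: registry.example.com/app:${{ github.sha }}
          severity: CRITICAL,HIGH
          exit-code: '1'
      - name: Push image to the registry
        run: |
          echo "${{ secrets.REGISTRY_TOKEN }}" | docker login registry.example.com -u ci --password-stdin
          docker push registry.example.com/app:${{ github.sha }}
```

The scan step sits between build and push, so a vulnerable image never reaches the registry, let alone production.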

GitOps extends this model by storing deployment configuration in version control and using automated tooling to synchronise cluster state with the repository. ArgoCD and Flux watch git repositories for configuration changes; when a deployment manifest changes (new image tag, modified replicas, updated environment variables), the GitOps tool applies the change to the cluster. This approach provides audit trails, rollback capability, and consistent deployment across environments.
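An Argo CD Application manifest sketch for this pattern; the repository URL, path, and namespace are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/deploy-config.git  # placeholder repo
    targetRevision: main
    path: apps/web           # directory containing the deployment manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: web
  syncPolicy:
    automated:
      prune: true            # remove resources deleted from the repo
      selfHeal: true         # revert manual drift back to the repo state
```

Once this Application exists, changing the image tag in the repository is the deployment; the cluster converges on the new state automatically.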

For organisations without dedicated DevOps capacity, managed CI/CD services reduce operational burden. GitHub Actions, GitLab CI/CD, and Azure DevOps provide pipeline execution without self-hosted infrastructure. These services integrate directly with container registries and Kubernetes clusters, enabling automated deployment without maintaining build servers.

Implementation considerations

For organisations with limited IT capacity

Container adoption requires investment in skills and infrastructure. Organisations with minimal IT capacity should begin with Docker Compose on single servers, deploying 3-5 containers for low-risk applications. This approach provides container benefits (isolation, reproducibility, portability) without orchestration complexity.

A reasonable starting point deploys a single application (the organisation’s website, an internal tool, a data pipeline) as containers. The team learns image building, container networking, volume management, and operational practices on a contained problem. Success builds confidence and skills for broader adoption.
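Such a single-server deployment might start from a docker-compose.yml along these lines; the image names, port, and password file are placeholders:

```yaml
services:
  web:
    image: ghcr.io/example-org/site:v1.0.0   # placeholder application image
    ports:
      - "80:8080"                            # host port 80 -> container port 8080
    depends_on:
      - db
    restart: unless-stopped
  db:
    image: postgres:15.6                     # pinned version, not "latest"
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - db-data:/var/lib/postgresql/data     # data survives container replacement
    secrets:
      - db_password
    restart: unless-stopped
volumes:
  db-data:
secrets:
  db_password:
    file: ./db_password.txt                  # kept out of the image and the repo
```

This already exercises image pinning, volumes, restart policies, and file-based secrets, the same habits that later transfer to an orchestrator.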

Skip Kubernetes until container counts exceed 20-30, availability requirements demand automated failover, or operational capacity grows to support cluster maintenance. Docker Swarm provides a gentler path to multi-host orchestration, requiring days rather than weeks to deploy and operate.

Managed Kubernetes services (EKS, AKS, GKE, DigitalOcean Kubernetes, Civo) eliminate control plane operations but still require worker node management, networking configuration, and storage integration. The Kubernetes learning curve applies regardless of managed status.

For organisations with established IT functions

Organisations with dedicated infrastructure teams can adopt Kubernetes with appropriate investment. Expect 3-6 months from decision to production-ready cluster, accounting for training, infrastructure provisioning, security integration, and operational documentation.

Platform engineering teams supporting development teams should provide standardised templates, CI/CD pipelines, and self-service capabilities. Developers specify their application requirements; the platform handles container building, scanning, deployment, and operations. This separation of concerns enables development teams to benefit from containers without becoming Kubernetes experts.

Multi-cluster strategies emerge as Kubernetes adoption matures. Separate clusters for development, staging, and production provide isolation and reduce blast radius. Clusters per region enable geographic distribution. Federation tooling (Rancher, OpenShift, Cluster API) provides consistent management across clusters.

Technology options

Component                  | Open source                       | Commercial / Managed
---------------------------+-----------------------------------+-------------------------------------------
Runtime                    | containerd, CRI-O, Docker Engine  | Included in managed services
Single-host orchestration  | Docker Compose                    | Docker Desktop (commercial licence for larger organisations)
Multi-host orchestration   | Kubernetes, Docker Swarm, Nomad   | EKS, AKS, GKE, OpenShift, Rancher
Registry                   | Harbor, Docker Registry           | ECR, ACR, Artifact Registry, Docker Hub
Security scanning          | Trivy, Grype, Clair               | Snyk Container, Prisma Cloud
GitOps                     | ArgoCD, Flux                      | Codefresh, Harness

Open source options provide full functionality without licensing costs. Commercial and managed services reduce operational burden at the cost of per-cluster, per-node, or consumption-based pricing. Managed Kubernetes services cost $70-150 per month for the control plane plus worker node compute; self-hosted Kubernetes requires equivalent compute plus operational time valued at 10-20 hours monthly for a small cluster.

See also