7  Containerisation With Docker

7.1 Introduction

Up until now, we’ve been using Nix as a powerful tool for creating reproducible development environments directly on our machines. Nix gives us fine-grained control over every package and dependency in our project, ensuring bit-for-bit reproducibility. However, when it comes to distributing a data product, another technology, Docker, is incredibly popular.

A common misconception is that Nix and Docker are alternatives, that you must choose one or the other. This is wrong. They are conceptually different tools that solve different problems. Nix is a package manager and build system: it answers “what software do I need and how do I build it reproducibly?” Docker is a containerisation platform: it answers “how do I package and run an application in isolation?” Understanding this distinction is key to using them together effectively.

While Nix manages dependencies for an application that runs on a host operating system, Docker takes a different approach: it packages an application along with a lightweight operating system and all its dependencies into a single, portable unit called a container. This container can then run on any machine that has Docker installed, regardless of its underlying OS. Being familiar with both tools makes you a more versatile data scientist: you can use Nix for precise, reproducible development environments and Docker for distributing your work to colleagues, servers, or CI pipelines that may not have Nix installed.

7.1.1 Spatial vs Temporal Reproducibility

To understand when and why to use Docker, it helps to distinguish between two types of reproducibility:

  • Spatial reproducibility: The ability to execute an analysis identically across different machines right now. Docker excels here: a container runs the same way on your laptop, a colleague’s workstation, and a cloud server.

  • Temporal reproducibility: The ability to execute an analysis identically over time. Docker is weaker here. The imperative commands in a Dockerfile (like apt-get update) are non-deterministic. Rebuilding the same file at different times can yield different images.

This distinction is crucial. Docker’s true reproducibility promise is that a specific, pre-built image will always launch an identical container. It does not promise that building the same Dockerfile twice will yield an identical image.

As empirical research has shown, achieving deterministic builds requires systems designed for this purpose, like Nix’s functional model, where identical inputs always produce identical outputs. Studies rebuilding historical Nix packages found bit-for-bit reproducibility rates exceeding 90%.

7.1.2 The Best of Both Worlds

This is why we combine Nix and Docker rather than choosing one or the other:

  • Nix provides strong temporal reproducibility: the guarantee that an environment built today will be identical when rebuilt years later.
  • Docker provides strong spatial reproducibility and universal distribution: the guarantee that a container runs the same everywhere Docker runs.

The pattern we’ll use is simple: use Nix inside a Docker container. Start with a minimal base image that has Nix installed. Then, use Nix to declaratively build the precise environment within the image. Docker becomes a portable runtime for this Nix-managed environment, excellent for deployment to systems where Nix isn’t installed.

7.1.3 Interactive Development vs Distribution

This approach also clarifies when to use each tool:

  • Use Nix directly for interactive development. It integrates seamlessly with your IDE and filesystem. No volume mounts, no port forwarding, no graphical application headaches.

  • Use Docker for distributing finished data products. Package your {rixpress} pipeline into an image that anyone can run with a single command.

If you’ve never heard of Docker before, the rest of this chapter will give you the basic knowledge required to get started with its core concepts.


7.2 Docker Essentials

7.2.1 Installing Docker

The first step is to install Docker by following the official instructions for your operating system: Ubuntu, Windows (read the system requirements section as well!) or macOS (make sure to choose the right build for your Mac’s architecture; if you have an Apple silicon Mac, pick “Mac with Apple silicon”).

After installation, it is a good idea to restart your computer, even if the installation wizard does not prompt you to. To check whether Docker was installed successfully, run the following command in a terminal:

docker run --rm hello-world

This should print the following message:

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

If you see this message, congratulations: you are ready to run Docker. If you instead see an error message about permissions, something went wrong. If you’re running Linux, make sure that your user is in the docker group by running:

groups $USER

You should see your username followed by the groups it belongs to. If a group called docker is not listed, you need to add yourself to it.
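The steps in question boil down to creating the docker group (if it does not already exist) and adding your user to it. A minimal sketch, following Docker's standard post-installation instructions for Linux:

```shell
# Create the docker group if it does not exist yet
sudo groupadd --force docker

# Add your user to the docker group
sudo usermod -aG docker "$USER"

# Apply the new group membership in the current shell
# (alternatively, log out and back in)
newgrp docker
```

After this, `docker run --rm hello-world` should work without sudo.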

7.2.2 The Rocker Project and Image Registries

When running a command like:

docker run --rm hello-world

what happens is that an image, in this case hello-world, gets pulled from a so-called registry. A registry is a storage and distribution system for Docker images. Think of it as a GitHub for Docker images, where you can push and pull images, much like you would with code repositories. The default public registry that Docker uses is called Docker Hub, but companies can also host their own private registries to store proprietary images.

Many open source projects build and distribute Docker images through Docker Hub, for example the Rocker Project.

The Rocker Project is instrumental for R users who want to use Docker. It provides a long list of images that are ready to run with a single command. As an illustration, open a terminal and paste the following line:

docker run --rm -e PASSWORD=yourpassword -p 8787:8787 rocker/rstudio

Once the image has been pulled and the container is running, go to http://localhost:8787/ and enter rstudio as the username and yourpassword as the password. You should be logged in to an RStudio Server instance: the web interface of RStudio that allows you to work with R from a browser. In this case, the server is the Docker container running the image. Yes, you’ve just pulled a Docker image containing Ubuntu with a fully working installation of RStudio Server!

Let’s open a new script and run the following lines:

data(mtcars)
summary(mtcars)

You can now stop the container (by pressing CTRL-C in the terminal). Now rerun it: you will notice that your script is gone! This is the first lesson: whatever you do inside a container disappears once the container is stopped. This also means that if you install the R packages you need while the container is running, you will have to reinstall them every time.

Thankfully, the Rocker Project provides a list of images with many packages already available. For example, to run R with the {tidyverse} collection of packages pre-installed, run:

docker run --rm -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse

7.2.3 Basic Docker Workflow

You already know how to run containers using docker run. With the commands we ran before, your terminal needs to stay open; if you close it, the container stops. Starting now, we will run containers in the background using the -d flag (d as in detach):

docker run --rm -d -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse

You can run several containers in the background simultaneously. List running containers with docker ps:

docker ps
CONTAINER ID   IMAGE              COMMAND   CREATED         STATUS         PORTS                    NAMES
c956fbeeebcb   rocker/tidyverse   "/init"   3 minutes ago   Up 3 minutes   0.0.0.0:8787->8787/tcp   elastic_morse

Stop the container using its ID:

docker stop c956fbeeebcb
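If several containers are running at once, you can stop them all in one go by passing the output of docker ps -q (which prints only the container IDs) to docker stop:

```shell
# Stop every running container
docker stop $(docker ps -q)
```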

Let’s discuss the other flags:

  • --rm: Removes the container once it’s stopped
  • -e: Provides environment variables to the container (e.g., PASSWORD)
  • -p: Sets the port mapping (host:container)
  • --name: Gives the container a custom name

Run a container with a name:

docker run -d --name my_r --rm -e PASSWORD=yourpassword -p 8787:8787 rocker/tidyverse

You can now interact with this container using its name:

docker exec -ti my_r bash

You are now inside a terminal session, inside the running container! This can be useful for debugging purposes.

Finally, let’s solve the issue of our scripts disappearing. Create a folder somewhere on your computer, then run:

docker run -d --name my_r --rm -e PASSWORD=yourpassword -p 8787:8787 \
  -v /path/to/your/local/folder:/home/rstudio/scripts:rw rocker/tidyverse

You should now be able to save scripts inside the scripts/ folder from RStudio and they will appear in the folder you created on your host machine.
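You can also check that the mount works from the host side: anything the container writes under /home/rstudio/scripts lands in your local folder, and vice versa. A quick sketch (assuming the my_r container started above is still running, and substituting your actual folder path):

```shell
# Create a script on the host...
echo 'summary(mtcars)' > /path/to/your/local/folder/test.R

# ...and list it from inside the container
docker exec my_r ls /home/rstudio/scripts
```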

7.3 Making Our Own Images

To create your own images, you can start from an image provided by an open source project like Rocker, or from the base Ubuntu image. Since we are using Nix to set up the reproducible development environment, we can simply use ubuntu:latest: the development environment will always be exactly the same, thanks to Nix.

7.3.1 A Minimal Dockerfile with Nix

FROM ubuntu:latest

RUN apt update -y
RUN apt install curl -y

# Download the default.nix that comes with {rix}
RUN curl -O https://raw.githubusercontent.com/ropensci/rix/main/inst/extdata/default.nix

# Install Nix via Determinate Systems installer
RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install linux \
  --extra-conf "sandbox = false" \
  --init none \
  --no-confirm

# Add Nix to the path
ENV PATH="${PATH}:/nix/var/nix/profiles/default/bin"
ENV user=root

# Configure rstats-on-nix cache
RUN mkdir -p /root/.config/nix && \
    echo "substituters = https://cache.nixos.org https://rstats-on-nix.cachix.org" > /root/.config/nix/nix.conf && \
    echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= rstats-on-nix.cachix.org-1:vdiiVgocg6WeJrODIqdprZRUrhi1JzhBnXv7aWI6+F0=" >> /root/.config/nix/nix.conf

# Copy script to generate environment
COPY gen-env.R .

# Generate default.nix from gen-env.R
RUN nix-shell --run "Rscript gen-env.R"

# Build the environment
RUN nix-build

# Run nix-shell when container starts
CMD nix-shell

Every Dockerfile starts with a FROM statement specifying the base image. Build-time commands are introduced with RUN, files are brought into the image with COPY, and environment variables are set with ENV. Here, we install and configure Nix, copy an R script that defines the environment, and build that environment. Finally, the CMD statement defines the command executed when a container starts: in this case, nix-shell.
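Assuming this Dockerfile and a gen-env.R script sit in the current directory, building and running the image looks like this (the image name nix-r is arbitrary):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t nix-r .

# Start an interactive container; CMD drops you into nix-shell
docker run -it --rm nix-r
```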

7.3.2 Splitting Into a Reusable Base Image

Because the Nix installation steps are generic, we can split this Dockerfile in two: a reusable base image with just Nix installed, and per-project images built on top of it. First, the base image:

# nix-base/Dockerfile
FROM ubuntu:latest AS nix-base

RUN apt update -y && apt install -y curl

# Install Nix via Determinate Systems installer
RUN curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install linux \
  --extra-conf "sandbox = false" \
  --init none \
  --no-confirm

ENV PATH="/nix/var/nix/profiles/default/bin:${PATH}"
ENV user=root

# Configure Nix binary cache
RUN mkdir -p /root/.config/nix && \
    echo "substituters = https://cache.nixos.org https://rstats-on-nix.cachix.org" > /root/.config/nix/nix.conf && \
    echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= rstats-on-nix.cachix.org-1:vdiiVgocg6WeJrODIqdprZRUrhi1JzhBnXv7aWI6+F0=" >> /root/.config/nix/nix.conf

Build and tag this image:

docker build -t nix-base:latest .

Now, for any project, simply reuse it:

FROM nix-base:latest

COPY gen-env.R .

RUN curl -O https://raw.githubusercontent.com/ropensci/rix/main/inst/extdata/default.nix
RUN nix-shell --run "Rscript gen-env.R"
RUN nix-build

CMD ["nix-shell"]

With gen-env.R:

library(rix)

rix(
  date = "2025-08-04",
  r_pkgs = c("dplyr", "ggplot2"),
  py_conf = list(
    py_version = "3.13",
    py_pkgs = c("polars", "great-tables")
  ),
  ide = "none",
  project_path = ".",
  overwrite = TRUE
)

Build and run:

docker build -t my-project .
docker run -it --rm --name my-project-container my-project

This drops you into an interactive Nix shell running inside Docker!
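You can also run a command in the environment non-interactively by overriding CMD, which is handy for quick smoke tests. A sketch (assuming default.nix ends up in the container's working directory, as in the Dockerfile above):

```shell
docker run --rm my-project \
  nix-shell --run "Rscript -e 'library(dplyr); sessionInfo()'"
```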

7.4 Publishing Images on Docker Hub

If you want to share Docker images through Docker Hub, you first need to create a free account. A free account gives you unlimited public repositories.

List all images on your computer:

docker images

Log in to Docker Hub:

docker login

Tag the image with your username:

docker tag IMAGE_ID your_username/nix-base:latest

Push the image:

docker push your_username/nix-base:latest

This image can now be used as a stable base for other projects:

FROM your_username/nix-base:latest

RUN mkdir ...

7.4.1 Sharing Without Docker Hub

If you can’t upload to Docker Hub, you can save the image to a file:

docker save nix-base | gzip > nix-base.tgz

Load it on another machine:

gzip -d nix-base.tgz
docker load < nix-base.tar
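In fact, docker load understands gzip-compressed archives directly, so the decompression step can be skipped:

```shell
docker load < nix-base.tgz
```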

7.5 What If You Don’t Use Nix?

Using Nix inside of Docker makes it very easy to set up an environment, but what if you can’t use Nix for some reason? In this case, you would need to use other tools to install the right R or Python packages and it is likely going to be more difficult. The main issue you will face is missing development libraries.

For example, to install and use the R {stringr} package, you first need the libicu-dev system library, and a realistic image ends up listing many such development libraries:

FROM rocker/r-ver:4.5.1

RUN apt-get update && apt-get install -y \
    libicu-dev \
    libglpk-dev \
    libxml2-dev \
    libcairo2-dev \
    libgit2-dev \
    libcurl4-openssl-dev
    # ... and many more

Another issue is that building the image is not a reproducible process; only running containers is. To mitigate this, use tagged images or, better yet, a digest:

FROM rocker/r-ver@sha256:1dbe7a6718b7bd8630addc45a32731624fb7b7ffa08c0b5b91959b0dbf7ba88e

This will always pull exactly the same layers. However, at some point, that pinned base image will become outdated and stop receiving updates. Using Nix, you can stay on ubuntu:latest, since reproducibility comes from the Nix-defined environment rather than from the base image.
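To find the digest of an image you have already pulled, you can query Docker directly:

```shell
# Prints the repository digest, e.g. rocker/r-ver@sha256:...
docker inspect --format='{{index .RepoDigests 0}}' rocker/r-ver:4.5.1
```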

7.6 Building Docker Images Directly with Nix

So far, we’ve used Nix inside Docker containers. But there’s an even more powerful approach: using Nix to build Docker images directly, bypassing Dockerfiles entirely.

Nix provides dockerTools, a set of functions that can create OCI-compliant container images. Because these images are built through Nix’s deterministic build system, they have stronger reproducibility guarantees than images built with docker build.

7.6.1 Why Skip the Dockerfile?

A Dockerfile is fundamentally imperative: it’s a script of commands executed in order. Commands like apt-get update are non-deterministic by nature. Even with careful pinning, you’re fighting the tool’s design.

Nix’s dockerTools.buildImage is declarative. You describe what should be in the image, and Nix figures out how to build it reproducibly. The resulting image is a pure function of its inputs.

7.6.2 A Basic Example

Here’s a Nix expression that builds a Docker image containing R and some packages:

# docker-image.nix
let
  pkgs = import (fetchTarball
    "https://github.com/rstats-on-nix/nixpkgs/archive/2025-10-14.tar.gz"
  ) {};
  
  # Define the R environment
  myR = pkgs.rWrapper.override {
    packages = with pkgs.rPackages; [
      dplyr
      ggplot2
      quarto
    ];
  };
  
in pkgs.dockerTools.buildImage {
  name = "my-r-env";
  tag = "latest";
  
  copyToRoot = pkgs.buildEnv {
    name = "image-root";
    paths = [ myR pkgs.quarto pkgs.coreutils pkgs.bash ];
    pathsToLink = [ "/bin" ];
  };
  
  config = {
    Cmd = [ "${pkgs.bash}/bin/bash" ];
    WorkingDir = "/work";
  };
}

Build and load it:

# Build the image (outputs a .tar.gz file)
nix-build docker-image.nix

# Load into Docker
docker load < result

# Run it
docker run -it --rm my-r-env:latest

7.6.3 Using rix with dockerTools

The previous example manually defines the R environment in Nix. If you prefer to use {rix} to generate your environment (as we’ve done throughout this book), you can import the generated default.nix into your Docker image definition.

First, create your environment with {rix} as usual:

library(rix)

rix(
  date = "2025-10-14",
  r_pkgs = c("dplyr", "ggplot2", "quarto"),
  ide = "none",
  project_path = ".",
  overwrite = TRUE
)

This generates a default.nix. Now create a docker-image.nix that imports it:

# docker-image.nix
let
  pkgs = import (fetchTarball
    "https://github.com/rstats-on-nix/nixpkgs/archive/2025-10-14.tar.gz"
  ) {};
  
  # Import the shell environment generated by rix
  rixEnv = import ./default.nix;
  
in pkgs.dockerTools.buildImage {
  name = "my-rix-project";
  tag = "latest";
  
  copyToRoot = pkgs.buildEnv {
    name = "image-root";
    paths = rixEnv.buildInputs ++ [ pkgs.coreutils pkgs.bash ];
    pathsToLink = [ "/bin" ];
  };
  
  config = {
    Cmd = [ "${pkgs.bash}/bin/bash" ];
    WorkingDir = "/work";
  };
}

This approach lets you keep your familiar {rix} workflow for defining environments while gaining the reproducibility benefits of dockerTools for image creation. Any changes to gen-env.R (and thus to the default.nix it generates) will automatically propagate to the Docker image on the next build.

7.6.4 Benefits Over Dockerfiles

Aspect            Dockerfile                            Nix dockerTools
Reproducibility   Imperative, non-deterministic         Declarative, deterministic
Layer caching     Based on command order                Based on content hashes
Image size        Often includes unnecessary packages   Minimal: only what you specify
Composition       Copy-paste between files              Reuse Nix expressions

7.6.5 Layered Images for Efficiency

For faster CI builds, you can create layered images where the base environment is cached:

# Assumes pkgs and myR are defined as in the earlier example
pkgs.dockerTools.buildLayeredImage {
  name = "my-analysis";
  tag = "latest";

  contents = [ myR pkgs.quarto ];

  config = {
    Cmd = [ "${myR}/bin/R" "--vanilla" ];
  };
}

buildLayeredImage creates separate layers for each package, so unchanged dependencies are cached between builds.
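You can see the difference by inspecting the layers of the loaded image; a layered image shows one entry per store path instead of a handful of large layers:

```shell
docker history my-analysis:latest
```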

7.6.6 When to Use This Approach

  • CI/CD pipelines: When you need reproducible image builds as part of automated workflows
  • Minimal images: When image size matters and you want only what you need
  • Complex environments: When the environment is already defined in Nix and you want to deploy it as a container

For simpler use cases, the “Nix inside Docker” approach from earlier sections may be more accessible.

7.7 Dockerising a rixpress Pipeline

We can package our entire {rixpress} project into a single Docker image. This image can then be run by anyone with Docker installed, regardless of their host operating system or whether they have Nix.

Assume your project directory has:

.
├── data/
│   └── mtcars.csv
├── gen-env.R
├── gen-pipeline.R
├── functions.R
├── functions.py
├── default.nix
└── pipeline.nix

7.7.1 Step 1: The Dockerfile

FROM nix-base:latest
# Or: FROM your-username/nix-base:latest

WORKDIR /app

# Copy all project files
COPY . .

# Build the pipeline during image build
RUN nix-build pipeline.nix

# Export results when container runs
COPY export-results.R .
CMD ["nix-shell", "--run", "Rscript export-results.R"]

7.7.2 Step 2: The Export Script

# export-results.R
library(rixpress)
library(jsonlite)

output_dir <- "/output"
dir.create(output_dir, showWarnings = FALSE)

message("Reading target 'mtcars_head'...")
final_data <- rxp_read("mtcars_head")

output_path <- file.path(output_dir, "mtcars_analysis_result.json")
write_json(final_data, output_path, pretty = TRUE)

message(paste("Successfully exported result to", output_path))

7.7.3 Step 3: Build and Run

Build:

docker build -t my-reproducible-pipeline .

Run to get results:

mkdir -p ./output

docker run --rm --name my_pipeline_run \
  -v "$(pwd)/output":/output \
  my-reproducible-pipeline

Check your local output directory. You’ll find the mtcars_analysis_result.json file containing the exact, reproducible result of your pipeline.

You have successfully packaged a complex, polyglot pipeline into a simple, portable Docker image. This workflow combines the best of both worlds: Nix’s power for creating reproducible builds and Docker’s universal standard for distributing and running applications.

7.7.4 Alternative: Using dockerTools for rixpress

You can also build your rixpress pipeline image purely with Nix, avoiding Dockerfiles entirely. This gives you fully deterministic image builds.

Create a docker-pipeline.nix:

# docker-pipeline.nix
let
  pkgs = import (fetchTarball
    "https://github.com/rstats-on-nix/nixpkgs/archive/2025-10-14.tar.gz"
  ) {};
  
  # Import your rix-generated environment
  rixEnv = import ./default.nix;
  
  # Build the pipeline as a derivation
  pipelineResult = pkgs.runCommand "pipeline-result" {
    buildInputs = rixEnv.buildInputs;
  } ''
    # The project copy in the Nix store is read-only, so work on a writable copy
    cp -r ${./.} source
    chmod -R u+w source
    cd source
    Rscript -e "source('gen-pipeline.R')"
    # Copy results to the output
    mkdir -p $out
    cp -r _rixpress $out/
  '';
  
in pkgs.dockerTools.buildImage {
  name = "my-rixpress-pipeline";
  tag = "latest";
  
  copyToRoot = pkgs.buildEnv {
    name = "image-root";
    # Ship only the pipeline outputs plus a minimal shell to copy them;
    # the full R/Python build environment stays out of the image
    paths = [
      pipelineResult
      pkgs.coreutils
      pkgs.bash
    ];
    pathsToLink = [ "/bin" "/" ];
  };
  
  config = {
    # With pathsToLink = [ "/" ], the pipeline outputs are linked at /_rixpress
    Cmd = [ "${pkgs.bash}/bin/bash" "-c" "cp -r /_rixpress /output/" ];
    WorkingDir = "/work";
  };
}

Build and run:

nix-build docker-pipeline.nix
docker load < result
docker run --rm -v "$(pwd)/output":/output my-rixpress-pipeline:latest

This approach bakes the pipeline results directly into the image during the Nix build phase. The resulting image is minimal and contains only the outputs, not the full R/Python environment needed to produce them.

7.8 Summary

Docker and Nix complement each other:

  • Docker provides a universal runtime and distribution mechanism
  • Nix provides bit-for-bit reproducible environment management
  • Together they enable truly reproducible, portable data products

Key Docker concepts:

  • Images are templates; containers are running instances
  • Containers are ephemeral; use volumes for persistence
  • Use registries (Docker Hub) to share images
  • The Rocker Project provides ready-made R images

For reproducibility:

  • Use ubuntu:latest + Nix rather than pinning specific OS versions
  • Build a reusable nix-base image to share across projects
  • Package {rixpress} pipelines for distribution

7.9 Further Reading