Introduction

Creating Docker images from scratch can be time-consuming and labor-intensive. Fortunately, many pre-built and regularly updated Docker images for the R community are ready to use, which is especially helpful when creating your own containerized R Markdown documents with liftr.

Sources of such pre-built Docker images include the rocker project and the Bioconductor Docker containers. In this article, we will use the tidyverse image provided by rocker. This image includes the essential tidyverse packages and the devtools environment loved by many data scientists (Wickham 2014). We will demonstrate how to containerize and render your tidyverse-heavy R Markdown document using Docker in only a few minutes.

Install Docker

If Docker is not yet installed on your system, run install_docker() and follow the guidelines to install it. Afterwards, check_docker_install() and check_docker_running() help you confirm that Docker is installed and running properly.
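As a rough illustration of the checks described above, the sketch below first looks for a docker executable with base R and only then calls the two liftr check functions. The Sys.which() pre-check and the variable name docker_on_path are additions for illustration, not part of liftr.

```r
# Base R pre-check (an assumption added for illustration):
# is a `docker` executable visible on this R session's PATH?
docker_on_path = unname(nzchar(Sys.which("docker")))

if (!docker_on_path) {
  message("No docker executable found on PATH; see liftr::install_docker().")
} else if (requireNamespace("liftr", quietly = TRUE)) {
  # The two checks named in the text above
  liftr::check_docker_install()
  liftr::check_docker_running()
}
```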

Example document

Let’s create a new folder first and copy the example R Markdown document to this folder:

path = "~/liftr-tidyverse/"

dir.create(path)
file.copy(system.file("examples/liftr-tidyverse.Rmd", package = "liftr"), path)

input = paste0(path, "liftr-tidyverse.Rmd")

If we open the R Markdown file, we will see that the header includes a liftr section, which defines the Docker system environment required to render this document. In our case, it is quite simple:

---
title: "Explore tidyverse with liftr"
author: "Nan Xiao <<me@nanx.me>>"
date: "2017-12-13"
output:
  rmarkdown::pdf_document:
    toc: true
    number_sections: true
liftr:
  from: "rocker/tidyverse:latest"
  maintainer: "Nan Xiao"
  email: "me@nanx.me"
  pandoc: false
  texlive: true
  cran:
    - nycflights13
---

Most of the fields are self-explanatory:

  • The latest rocker/tidyverse image is specified as the base image (from), which saves us the time of building a custom base image with all the tidyverse dependencies.
  • The custom pandoc installation is turned off (pandoc: false) because the tidyverse image already includes pandoc.
  • TeX Live is included (texlive: true) since we intend to render a PDF file in the end.
  • The CRAN data package nycflights13 will be installed (cran).
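To check how these fields are read, one option is to parse the YAML header from R. The following is a minimal sketch, assuming the rmarkdown package is installed; it writes a shortened copy of the header to a temporary file (the file and the variable names are illustrative) and reads it back with rmarkdown::yaml_front_matter().

```r
library("rmarkdown")

# A shortened copy of the header shown above, written to a temporary file
header = c(
  "---",
  "title: \"Explore tidyverse with liftr\"",
  "liftr:",
  "  from: \"rocker/tidyverse:latest\"",
  "  pandoc: false",
  "  texlive: true",
  "  cran:",
  "    - nycflights13",
  "---",
  "",
  "Body text."
)
tmp = tempfile(fileext = ".Rmd")
writeLines(header, tmp)

# Parse the YAML header and show the structure of the liftr section
meta = yaml_front_matter(tmp)
str(meta$liftr)
```

On the real example, passing input instead of the temporary file should show the full set of liftr fields.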

Containerize the document

Let’s containerize this document by generating a Dockerfile for it, using liftr::lift:

lift(input)

A file named Dockerfile will be generated in the same directory as the input R Markdown file. It contains the commands needed to build the Docker image used for rendering the document.
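To look at what was generated, the Dockerfile can be printed from R. This is a small sketch using base R only; it assumes the folder and file names used earlier in this article and does nothing but print a message if the file is not there yet.

```r
# Paths as used earlier in this article
path = "~/liftr-tidyverse/"
input = paste0(path, "liftr-tidyverse.Rmd")

# The Dockerfile is expected next to the input document
dockerfile = file.path(dirname(input), "Dockerfile")

if (file.exists(dockerfile)) {
  writeLines(readLines(dockerfile))
} else {
  message("No Dockerfile found yet; run lift(input) first.")
}
```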

Render the document

We can use render_docker() to start the Docker container and render the document inside it:

render_docker(input)

Let’s view the rendered document:

browseURL(paste0(path, "liftr-tidyverse.pdf"))

In the last section of the rendered PDF, we will see that the session information probably differs from that of your current system. That is because the document is generated entirely within a newly built, isolated Linux system environment, using Docker.
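For a side-by-side comparison, the host's own session details can be printed and checked against the last section of the PDF. A minimal sketch with base R; the variable name host is illustrative, and the values shown depend on your machine.

```r
# Session details of the host R session (not the container)
host = sessionInfo()

cat("R version:", host$R.version$version.string, "\n")
cat("Platform: ", host$platform, "\n")
cat("Running:  ", host$running, "\n")
```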

In this way, the R Markdown document gains a higher, system-level reproducibility and can be easily replicated by other users whose system and R package environment may not be identical to yours. This is good for team collaboration and large-scale document orchestration. The best part is that all you need to share is still the document itself, with only a few extra metadata fields.

Housekeeping

The Docker images stored on your system can take up a few gigabytes, and the total grows as you build more images. Let’s remove the generated Docker image to save some disk space:

prune_image(paste0(path, "liftr-tidyverse.docker.yml"))

If we do this, the Docker image will be rebuilt the next time you use render_docker(). If not, the image stays cached on the system and is reused when compiling the document later, which saves you some time.
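When cleanup is scripted, it can help to guard the call so that it only runs if the image record exists. The sketch below assumes the .docker.yml file name used above and that this file is only present after a successful render_docker() run; the variable name image_info is illustrative.

```r
path = "~/liftr-tidyverse/"
image_info = paste0(path, "liftr-tidyverse.docker.yml")

if (file.exists(image_info)) {
  # Remove the image recorded in the file, as in the call above
  liftr::prune_image(image_info)
} else {
  message("No image record found at ", image_info, "; nothing to remove.")
}
```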

References

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23.