Jupyter notebook development workspace using Docker, Docker Compose and Git

Posted by Harald Nezbeda on Wed 19 April 2023

Jupyter notebooks have become a popular tool in the data science community, particularly for those who use Python. The setup is quite simple when the operating system has a Python runtime configured, and notebooks can be shared from one user to another. However, the process has several flaws:

- Notebooks may become outdated or diverge if multiple people are involved
- Missing dependencies can cause issues when running the cells
- Cells might not behave correctly or raise exceptions due to different Python versions

In the following, I propose a minimal setup for a local development environment using git, docker, and docker-compose, which will address the issues described above.

The code is also available at https://github.com/nezhar/jupyter-docker-compose

Minimal Setup

The minimal setup to run Jupyter can look like this:

version: '3'
services:
  jupyter:
    image: jupyter/minimal-notebook
    volumes:
      - ./work:/home/jovyan/work
    ports:
      - 8888:8888
    container_name: jupyter_notebook

This already allows you to start the jupyter service and access the Web GUI in the browser via localhost:8888. Make sure to copy the token that will be output in the console when running docker-compose up, as it is required to log in.
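If copying the token by hand becomes tedious, it can be pulled out of the console output programmatically. The following is only a sketch: it assumes the server prints a URL containing a `token` query parameter, the helper name `extract_token` is hypothetical, and the sample log line and its token value are made up for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches any http(s) URL up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_token(log_text):
    """Return the first 'token' query parameter found in any URL, else None."""
    for url in URL_PATTERN.findall(log_text):
        token = parse_qs(urlparse(url).query).get("token")
        if token:
            return token[0]
    return None


# Illustrative log line; the token value is invented.
sample = "    http://127.0.0.1:8888/lab?token=0123abcd4567ef"
print(extract_token(sample))
```

The captured output of docker-compose up (or docker-compose logs) can be passed to the function as a single string.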

Extended Setup

First, I want to remove the login token. Token authentication makes sense in a production environment, but we are working with a local development environment, so we can disable it by extending the start command. Keep in mind that anyone who can reach port 8888 then has full access to the notebook server.

command: "start-notebook.sh --NotebookApp.token="

The second issue we need to address is the Python modules required to run the code in a notebook. We can solve this by creating a requirements.txt file, where all the dependencies are specified. Here is an example file:

numpy==1.24.2
pandas==2.0.0
matplotlib==3.7.1

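A custom Dockerfile is required to install the file's contents with pip and build a new image in which all dependencies are available. Because COPY resolves paths relative to the build context, the Dockerfile and requirements.txt both belong in docker/jupyter, the directory the build entry in docker-compose.yml points to: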

FROM jupyter/minimal-notebook

COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

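The docker-compose.yml changes slightly: the image entry is replaced by build, which points to the directory containing the Dockerfile, and the command that disables the token is added: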

version: '3'
services:
  jupyter:
    build: ./docker/jupyter
    volumes:
      - ./work:/home/jovyan/work
    ports:
      - 8888:8888
    container_name: jupyter_notebook
    command: "start-notebook.sh --NotebookApp.token="
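Once the container is running, a notebook cell can confirm that the installed packages match the pinned versions. This is a minimal sketch and not part of the original setup: the helper names `parse_pins` and `find_mismatches` are hypothetical, and the pins are pasted inline so that the cell is self-contained.

```python
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Same pins as the example requirements.txt.
REQUIREMENTS = """\
numpy==1.24.2
pandas==2.0.0
matplotlib==3.7.1
"""

print(find_mismatches(parse_pins(REQUIREMENTS)) or "all pins match")
```

Inside the container, the pins could instead be read from /tmp/requirements.txt, where the Dockerfile copies them. An empty result means the image matches the file; any entry in the result suggests the image should be rebuilt.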

The final step is to add a .gitignore that excludes the checkpoint files Jupyter creates when saving notebooks:

.ipynb_checkpoints

Conclusion

By utilizing a combination of git, docker, and docker-compose, a local development environment for Jupyter notebooks has been established that addresses the common issues of outdated notebooks, missing dependencies, and varying Python versions.

The login token has been removed for improved local usability, and a requirements.txt file has been included to manage the necessary Python modules. This streamlined and consistent environment facilitates easier collaboration using git workflows, reduces potential errors from outdated dependencies, and enhances the overall experience for individuals working with Jupyter notebooks.