Jupyter notebooks have become a popular tool in the data science community, particularly for those who use Python. The setup is quite simple when the operating system has a Python runtime configured, and notebooks can be shared from one user to another. However, the process has several flaws:
- Notebooks may become outdated or diverge if multiple people are involved
- Missing dependencies can cause issues when running the cells
- Cells might not behave correctly or raise exceptions due to different Python versions
In the following, I propose a minimal setup for a local development environment using git, docker, and docker-compose, which will address the issues described above.
The code is also available at https://github.com/nezhar/jupyter-docker-compose
Minimal Setup
The minimal setup to run Jupyter can look like this:
version: '3'
services:
  jupyter:
    image: jupyter/minimal-notebook
    volumes:
      - ./work:/home/jovyan/work
    ports:
      - 8888:8888
    container_name: jupyter_notebook
This already allows you to start the jupyter service and access the Web GUI in the browser via localhost:8888. Make sure to copy the token that will be output in the console when running docker-compose up, as it is required to log in.
Extended Setup
First, I want to remove the login token. While a token is a sensible safeguard in a production environment, we are working with a local development environment, so we can disable it by passing an empty token value in the start command:
command: "start-notebook.sh --NotebookApp.token="
The second issue we need to address is the Python modules required to run the code in a notebook. We can solve this by creating a requirements.txt file, where all the dependencies are specified. Here is an example file:
numpy==1.24.2
pandas==2.0.0
matplotlib==3.7.1
A custom Dockerfile is required to use the file with pip and create a new container image where all dependencies are available:
FROM jupyter/minimal-notebook
COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
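The docker-compose.yml will slightly change: the image entry is replaced by a build entry that points to the directory containing the Dockerfile (here ./docker/jupyter). Since the Dockerfile copies ./requirements.txt from its build context, that file has to live in the same directory: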
version: '3'
services:
  jupyter:
    build: ./docker/jupyter
    volumes:
      - ./work:/home/jovyan/work
    ports:
      - 8888:8888
    container_name: jupyter_notebook
    command: "start-notebook.sh --NotebookApp.token="
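Once the image has been built, a notebook cell can confirm that the container really provides the pinned packages and report the Python version in use. The following is a minimal sketch, not part of the original setup: the helper names parse_pins and find_mismatches are hypothetical, only plain name==version lines are understood, and versions are compared as exact strings.

```python
import sys
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for every 'name==version' line in text."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed or None)} for unsatisfied pins."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# The example requirements.txt from above, inlined so the cell runs anywhere.
example = """\
numpy==1.24.2
pandas==2.0.0
matplotlib==3.7.1
"""

print("Python", sys.version.split()[0])
for name, (wanted, installed) in find_mismatches(parse_pins(example)).items():
    print(f"{name}: pinned {wanted}, installed {installed or 'missing'}")
```

Inside the container, the inlined example could be replaced by the contents of /tmp/requirements.txt, which is where the Dockerfile above copies the file. If the loop prints nothing, every pin is satisfied.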
The final step is to add a .gitignore to exclude the checkpoint files that Jupyter writes alongside the notebooks:
.ipynb_checkpoints
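To see which directories this rule covers before committing, the checkpoint folders can be listed with a few lines of Python. This is a small sketch under the assumption that the notebooks live in the ./work directory mounted in the compose file; find_checkpoint_dirs is a hypothetical helper name.

```python
from pathlib import Path


def find_checkpoint_dirs(root):
    """Return the sorted .ipynb_checkpoints directories found below root."""
    root = Path(root)
    if not root.is_dir():
        return []  # nothing to report if the directory does not exist yet
    return sorted(p for p in root.rglob(".ipynb_checkpoints") if p.is_dir())


# "work" is the host directory mounted into the container in docker-compose.yml.
for path in find_checkpoint_dirs("work"):
    print(path)
```

Every path printed here is one that the .gitignore entry keeps out of the repository.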
Conclusion
By combining git, docker, and docker-compose, we have established a local development environment for Jupyter notebooks that addresses the common issues of outdated notebooks, missing dependencies, and varying Python versions.
The login token has been removed for improved local usability, and a requirements.txt file has been included to manage the necessary Python modules. Because everyone builds the same image, the environment is consistent across machines, which makes collaboration through git workflows easier and reduces errors caused by missing or mismatched dependencies.