Backing up applications running in Docker containers

Posted by Harald Nezbeda on Mon 10 April 2023

Over the past couple of years, I have tested several open-source applications and ended up using many of them long term. To keep things simple and isolated, I decided to use Docker for running the applications, together with all the required services. Each configuration is written in a docker-compose.yml file.

Having these self-hosted services provides full control over your data and also helps you save a lot of money that would otherwise be spent on subscriptions. However, the downside is that you have full operational responsibility. This includes monitoring your system, ensuring that it is updated and secure. It also leaves you with the responsibility of making sure you don't lose data if something bad happens.

This article shows how you can create a system that backs up data and containers so you can restore them on the same server or on a new one if required.

Contents

  1. Identify application and data
  2. Collect all data into a single folder
  3. Compress the folder

Identify application and data

The first step is to find out how the application stores its data and make sure the containers are configured the right way.
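If you are not sure where an existing container keeps its data, you can list its mounts directly. The CONTAINER_ID below is a placeholder for the actual container ID or name:

docker inspect --format '{{ json .Mounts }}' CONTAINER_ID

This prints all volumes and bind mounts of the container, which is a good starting point for deciding what needs to be backed up.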

Here is a demo application configuration that we will use in this tutorial:

version: "3"

volumes:
  db_data:

services:
  db:
    image: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=database
      - POSTGRES_USER=db-user
      - POSTGRES_PASSWORD=db-pass

  api:
    image: ${API_REMOTE}:${API_VERSION}
    build:
      context: ./backend/
      args:
        GIT_HASH: "${API_VERSION}"
    command: gunicorn --bind 0.0.0.0:8000 snypy.wsgi --log-level info
    environment:
      DEBUG: "False"
      SECRET_KEY: "secret"
      ALLOWED_HOSTS: "localhost"
      DATABASE_URL: "psql://db-user:db-pass@db:5432/database"
      EMAIL_URL: "smtp://user:pass@mailserver:25"
      CORS_ORIGIN_WHITELIST: "http://localhost:8080"
      CSRF_TRUSTED_ORIGINS: "http://localhost"
      REGISTER_VERIFICATION_URL: "http://localhost:8080/verify-user/"
      RESET_PASSWORD_VERIFICATION_URL: "http://localhost:8080/set-password/?token={token}"
      REGISTER_EMAIL_VERIFICATION_URL: "http://localhost:8080/verify-email/"
    depends_on:
      - db

  static:
    image: ${STATIC_REMOTE}:${API_VERSION}
    build:
      context: ./static/
      args:
        GIT_HASH: "${API_VERSION}"

  ui:
    image: ${UI_REMOTE}:${UI_VERSION}
    build:
      context: ./frontend/
      args:
        GIT_HASH: "${UI_VERSION}"
    environment:
      REST_API_URL: "http://localhost:8000"

Here is the corresponding configuration from the .env file:

API_VERSION=1.3.0
API_REMOTE=ghcr.io/snypy/snypy-backend
STATIC_REMOTE=ghcr.io/snypy/snypy-static

UI_VERSION=1.3.2
UI_REMOTE=ghcr.io/snypy/snypy-frontend

This configuration starts four services:

  1. db - uses postgres and will default to latest if no version is specified. The data is stored in a volume called db_data.

  2. api - uses an image specified by the environment variables API_REMOTE and API_VERSION. The application data is stored in the PostgreSQL database running in the db service.

  3. static - uses an image specified by the environment variables STATIC_REMOTE and API_VERSION. This service serves static files like CSS and JavaScript. These files are used mostly in the context of the admin site for the backend.

  4. ui - uses an image specified by the environment variables UI_REMOTE and UI_VERSION. This service contains the single-page application (SPA) that communicates with the API running in the api service from the client's browser.

The important persistent application data in this configuration is stored in the volume called db_data, which is used by the PostgreSQL database running in the db service to store all database data. It's important to make sure this volume is configured correctly so that backups can be taken as needed.
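To double-check that the volume exists and to see where Docker keeps it on the host, you can inspect it. Note that Docker Compose prefixes volume names with the project name (by default the directory name), so PROJECT_NAME below is a placeholder:

docker volume ls
docker volume inspect PROJECT_NAME_db_data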

Collect all data into a single folder

Now that the application and the data have been identified, we can start collecting the data for the backup. The following needs to be exported:

  • Volume data: Since the volume is used by a PostgreSQL database, I will create a database dump. For other applications, it might be necessary to export the entire volume or create a copy of all the files in the backup folder when using bind mounts.
    • Command to create database dump: docker-compose exec db sh -c "pg_dump -U db-user database" > data_$(date +"%m_%d_%Y").sql
  • Service images: This is most important for images that are built locally, but it also applies to images pulled from a remote registry, since tags can be overwritten. The best example is the latest tag, which is usually moved to the newest stable release. Images are also often pinned only to a major release (for example postgres:13), and that tag is replaced whenever a new minor or patch version comes out. The main point is that the image you pull later might not be the same one, and your Docker Compose setup could end up broken or not 100% identical to what it was when the backup was taken.
    • Command to export image: docker save --output BACKUP_FOLDER_NAME/IMAGE_ID.tar IMAGE_ID
    • Command to export image metadata: docker image inspect IMAGE_ID > BACKUP_FOLDER_NAME/IMAGE_ID.json
  • Service container: Similar to the images, the containers themselves can also be exported. Whether this is needed depends on the application; if you are 100% sure the application can be restored from the database dump alone, this step can be skipped:
    • Command to export container: docker export --output BACKUP_FOLDER_NAME/CONTAINER_ID.tar CONTAINER_ID
    • Command to export container metadata: docker container inspect CONTAINER_ID > BACKUP_FOLDER_NAME/CONTAINER_ID.json

Here is what such a bash script might look like:

#!/bin/bash

mkdir -p backup

echo "Creating database dump"
docker-compose exec db sh -c "pg_dump -U db-user database" > backup/data_$(date +"%m_%d_%Y").sql

for service_name in $(docker-compose config --services); do
    image_id=$(docker-compose images -q "$service_name")
    container_id=$(docker-compose ps -q "$service_name")

    echo "Processing service $service_name"

    # Save the service image together with its metadata
    mkdir -p "backup/images/$service_name/"
    echo "Saving image with id $image_id"
    docker save --output "backup/images/$service_name/$image_id.tar" "$image_id"
    docker image inspect "$image_id" > "backup/images/$service_name/$image_id.json"

    # Export the container filesystem together with its metadata
    mkdir -p "backup/containers/$service_name"
    echo "Exporting container with id $container_id"
    docker export --output "backup/containers/$service_name/$container_id.tar" "$container_id"
    docker container inspect "$container_id" > "backup/containers/$service_name/$container_id.json"
done

This script creates a backup folder and exports the necessary data for all services: the volume data (SQL dump), the service images, and the service containers.
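Restoring works in the opposite direction. Here is a minimal sketch, assuming the service and credential names from the example above; IMAGE_ID and DUMP_FILE are placeholders for the files created by the backup script:

# Load a saved image back into the local image store
docker load --input backup/images/api/IMAGE_ID.tar

# Start the database service and feed the SQL dump into it
docker-compose up -d db
cat backup/DUMP_FILE.sql | docker-compose exec -T db psql -U db-user database

The -T flag disables the pseudo-TTY so the dump can be piped into psql. Containers exported with docker export can be brought back with docker import, but this only restores the filesystem, not the container configuration, which is why the inspect metadata is saved alongside it.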

Compress the folder

Finally, we want to compress the folder so that it can be easily transferred or stored in a secure location. There are several compression algorithms available, but gzip is a popular choice.

Here is how to compress the folder using gzip:

tar -zcvf BACKUP_FOLDER_NAME.tar.gz BACKUP_FOLDER_NAME

This will create a compressed file called BACKUP_FOLDER_NAME.tar.gz containing all the data we collected earlier.
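The archive can then be copied off the server, for example with rsync over SSH. The destination host and path below are placeholders:

rsync -avz BACKUP_FOLDER_NAME.tar.gz user@backup-host:/path/to/backups/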

Conclusion

Creating backups for applications running in Docker containers can be a bit daunting at first, but with a little effort, it can be automated and become part of your regular maintenance routine. In this article, we identified the important data and services that need to be backed up, collected them into a single folder, and compressed it for easy storage and transfer.
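If the steps above are combined into a single script, automating them is mostly a matter of scheduling. As an example, a crontab entry (the script path and log file are placeholders) that runs the backup every Sunday at 03:00:

0 3 * * 0 /path/to/backup.sh >> /var/log/docker-backup.log 2>&1

Keep in mind that docker-compose exec needs the -T flag when the script runs without a terminal, for example from cron.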

Make sure to test your backups regularly to ensure they are working correctly and you can restore your services if needed.

Once all this is in place, you may also want to clean up the Docker service from time to time in order to free up disk space.
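A common way to do that is docker system prune, which removes stopped containers, unused networks, dangling images and build cache. Only add the --volumes flag if you are certain no application data lives in unused volumes:

docker system prune
docker system prune --all    # also removes all unused images, not just dangling ones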