Over the past couple of years, I have tested several open-source applications and ended up using many of them long term. To keep things simple and isolated, I run each application with Docker, together with all of its required services, and describe each setup in a docker-compose.yml file.
Self-hosting these services gives you full control over your data and saves money that would otherwise go to subscriptions. The downside is that you also carry the full operational responsibility: monitoring the system, keeping it updated and secure, and making sure you don't lose data if something goes wrong.
This article shows how to build a simple system for backing up your data and containers so that you can restore them on the same server, or on a new one, if required.
Contents
- Identify application and data
- Collect all data into a single folder
- Compress the folder
Identify application and data
The first step is to find out how the application stores its data and make sure the containers are configured the right way.
Here is a demo application configuration that we will use in this tutorial:
version: "3"

volumes:
  db_data:

services:
  db:
    image: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=database
      - POSTGRES_USER=db-user
      - POSTGRES_PASSWORD=db-pass

  api:
    image: ${API_REMOTE}:${API_VERSION}
    build:
      context: ./backend/
      args:
        GIT_HASH: "${API_VERSION}"
    command: gunicorn --bind 0.0.0.0:8000 snypy.wsgi --log-level info
    environment:
      DEBUG: "False"
      SECRET_KEY: "secret"
      ALLOWED_HOSTS: "localhost"
      DATABASE_URL: "psql://db-user:db-pass@db:5432/database"
      EMAIL_URL: "smtp://user:pass@mailserver:25"
      CORS_ORIGIN_WHITELIST: "http://localhost:8080"
      CSRF_TRUSTED_ORIGINS: "http://localhost"
      REGISTER_VERIFICATION_URL: "http://localhost:8080/verify-user/"
      RESET_PASSWORD_VERIFICATION_URL: "http://localhost:8080/set-password/?token={token}"
      REGISTER_EMAIL_VERIFICATION_URL: "http://localhost:8080/verify-email/"
    depends_on:
      - db

  static:
    image: ${STATIC_REMOTE}:${API_VERSION}
    build:
      context: ./static/
      args:
        GIT_HASH: "${API_VERSION}"

  ui:
    image: ${UI_REMOTE}:${UI_VERSION}
    build:
      context: ./frontend/
      args:
        GIT_HASH: "${UI_VERSION}"
    environment:
      REST_API_URL: "http://localhost:8000"
Here is also the configuration from the .env file:
API_VERSION=1.3.0
API_REMOTE=ghcr.io/snypy/snypy-backend
STATIC_REMOTE=ghcr.io/snypy/snypy-static
UI_VERSION=1.3.2
UI_REMOTE=ghcr.io/snypy/snypy-frontend
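If you want to follow along, the whole stack can be started the usual way from the folder containing these two files:

docker-compose up -d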
Four services are started by this configuration:

- db - uses the postgres image and will default to the latest tag if no version is specified. The data is stored in a volume called db_data.
- api - uses an image specified by the environment variables API_REMOTE and API_VERSION. The application data is stored in the PostgreSQL database running in the db service.
- static - uses an image specified by the environment variables STATIC_REMOTE and API_VERSION. This service serves static files like CSS and JavaScript, used mostly by the admin site of the backend.
- ui - uses an image specified by the environment variables UI_REMOTE and UI_VERSION. This service contains the single-page application (SPA) that communicates with the API running in the api service from the client's browser.
The important persistent application data in this configuration is stored in the volume called db_data, which the PostgreSQL database running in the db service uses to store all of its data. It's important to make sure this volume is configured correctly so that backups can be taken as needed.
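A quick way to verify this is to list the volumes and inspect where the data actually lives on the host. Note that Docker Compose normally prefixes volume names with the project (folder) name, so the exact name used below is an assumption:

docker volume ls
docker volume inspect PROJECT_NAME_db_data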
Collect all data into a single folder
Now that the application and the data have been identified, we can start collecting the data for the backup. The following needs to be exported:
- Volume data: Since the volume is used by a PostgreSQL database, I will create a database dump. For other applications it might be necessary to export the entire volume, or to copy all files into the backup folder when bind mounts are used. (The matching restore commands are sketched right after this list.)
  - Command to create a database dump:
    docker-compose exec db sh -c "pg_dump -U db-user database" > data_$(date +"%m_%d_%Y").sql
- Service images: This is most important for images that are built locally, but it also applies to images pulled from a remote registry, because tags can be overwritten. The best example is the latest tag, which usually points to the most recent stable release, but images pinned only to a major release (for example postgres:13) can also change when a new minor or patch version is published. The main point is that the image you pull later might not be the one you are running today, so your Docker Compose setup could break or behave differently than it did when the backup was taken.
  - Command to export an image:
    docker save --output BACKUP_FOLDER_NAME IMAGE_ID
  - Command to export image metadata:
    docker image inspect IMAGE_ID > BACKUP_FOLDER_NAME/IMAGE_ID.json
- Service containers: Similar to the images, the containers can also be exported. Whether this is needed depends on the application; if you are certain the application can be restored from the database alone, this step can be skipped.
  - Command to export a container:
    docker export --output BACKUP_FOLDER_NAME CONTAINER_ID
  - Command to export container metadata:
    docker container inspect CONTAINER_ID > BACKUP_FOLDER_NAME/CONTAINER_ID.json
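For completeness, here is a rough sketch of how each exported artifact could be restored later. These commands are assumptions based on the standard counterparts of the tools above (psql for the SQL dump, docker load for saved images, docker import for exported containers); the dump file name and the restored-image tag are placeholders:

- Restore the database dump into a running db service:
  cat data_01_31_2023.sql | docker-compose exec -T db psql -U db-user database
- Load a saved image back into the local Docker daemon:
  docker load --input BACKUP_FOLDER_NAME/IMAGE_ID.tar
- Import an exported container filesystem as a new image:
  docker import BACKUP_FOLDER_NAME/CONTAINER_ID.tar restored-image:latest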
Here is what a bash script combining the backup steps might look like:
#!/usr/bin/env bash

mkdir -p backup

echo "Creating database dump"
docker-compose exec db sh -c "pg_dump -U db-user database" > backup/data_$(date +"%m_%d_%Y").sql

for service_name in $(docker-compose config --services); do
    image_id=$(docker-compose images -q "$service_name")
    container_id=$(docker-compose ps -q "$service_name")

    echo "Processing service $service_name"

    # Save the service image together with its metadata
    mkdir -p "backup/images/$service_name/"
    echo "Saving image with id $image_id"
    docker save --output "backup/images/$service_name/$image_id.tar" "$image_id"
    docker image inspect "$image_id" > "backup/images/$service_name/$image_id.json"

    # Export the service container together with its metadata
    mkdir -p "backup/containers/$service_name"
    echo "Exporting container with id $container_id"
    docker export --output "backup/containers/$service_name/$container_id.tar" "$container_id"
    docker container inspect "$container_id" > "backup/containers/$service_name/$container_id.json"
done
This script creates a backup folder and exports the necessary data for all services: the volume data (as an SQL dump), the service images, and the service containers.
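Assuming the script is saved as backup.sh next to the docker-compose.yml file (the file name and path below are just examples), it can be run manually or scheduled, for instance with cron:

chmod +x backup.sh
./backup.sh

# run the backup every day at 03:00
0 3 * * * cd /path/to/project && ./backup.sh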
Compress the folder
Finally, we want to compress the folder so that it can be easily transferred or stored in a secure location. There are several compression algorithms available, but gzip is a popular choice.
Here is how to compress the folder using gzip:
tar -zcvf BACKUP_FOLDER_NAME.tar.gz BACKUP_FOLDER_NAME
This will create a compressed file called BACKUP_FOLDER_NAME.tar.gz containing all the data we collected earlier.
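On the target server, the archive can later be unpacked with the matching tar command before restoring the individual pieces:

tar -zxvf BACKUP_FOLDER_NAME.tar.gz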
Conclusion
Creating backups for applications running in Docker containers can be a bit daunting at first, but with a little effort, it can be automated and become part of your regular maintenance routine. In this article, we identified the important data and services that need to be backed up, collected them into a single folder, and compressed it for easy storage and transfer.
Make sure to test your backups regularly to ensure they are working correctly and you can restore your services if needed.
Once all of this is in place, you may also consider cleaning up unused Docker resources on the host in order to free up disk space.
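For example, unused images, stopped containers, networks, and build cache can be removed with Docker's built-in prune commands (be careful, as this removes anything that is not currently in use):

docker image prune
docker system prune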