Automated PyPI releasing with GitHub Actions for Dup-composer

This is what we have on the plate for Dup-composer this week: a GitHub Actions workflow is implemented for PyPI releases, along with new features, including full backup frequency configuration, forced full backups, and path filtering, i.e. include and exclude options. I also made my first production test backups and want to share with you how to run Dup-composer in a Docker container using a simple image that I have built.

This week was one of the good ones; I am pretty satisfied with my progress. I managed to check off all the items on my to-do list from last week, plus did some extra work on top.

GitHub Actions workflow for PyPI release

Implementing the PyPI release workflow for GitHub Actions was straightforward using the Python Packaging Guide; I only had to make a few tweaks to the workflow file so that it suits my package setup. One of the differences is that I am using setuptools and not pep517 to build my package. Although this and the other differences weren't huge, I made some indentation mistakes while typing the workflow in; moreover, I had to fine-tune the changes I made, and it took a few takes to get things rolling.

This trial-and-error stage was somewhat tedious, as GitHub Actions doesn't have an out-of-the-box way to trigger workflows manually from the GitHub UI, for instance. Since my workflow triggers on push, I had to make several corrective commits and pushes until a workflow run completed successfully. This is less than ideal, and an option to manually trigger the workflow would be very welcome in my opinion, especially when you are still working to get your workflow right and don't want to clutter your commit log. Apparently there are some options using the HTTP API, but nothing on the UI; there is a discussion on the GitHub Community Forum about the lack of this feature, and you might want to read that thread if you need custom manual triggers for your workflow. GitLab supports this, by the way.
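For reference, the HTTP API route I mean is the repository_dispatch endpoint: a workflow that subscribes to the repository_dispatch event can be fired with a POST request. Here is a minimal sketch in Python; the owner, repository, token and event_type values are placeholders, not real ones:

```python
import json
import urllib.request

# Placeholders: substitute your own repository and a personal access token.
OWNER, REPO, TOKEN = "example-user", "dup-composer", "ghp_your_token_here"

# event_type is an arbitrary label; the workflow file must declare
# "on: repository_dispatch" for this to trigger anything.
payload = json.dumps({"event_type": "manual-trigger"}).encode("utf-8")
req = urllib.request.Request(
    f"https://api.github.com/repos/{OWNER}/{REPO}/dispatches",
    data=payload,
    headers={
        "Accept": "application/vnd.github.v3+json",
        "Authorization": f"token {TOKEN}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually send the request; it is left
# out here so the sketch stays side-effect free.
```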

All my pushes to GitHub currently trigger the workflow, and a release is made to Test PyPI each time. As a result, if I forget to bump the dev version in setup.py, or I simply wish to edit README.md on the GitHub UI, the workflow will fail, as the given version already exists on Test PyPI. Releases to the production PyPI are governed in the workflow by version tags indicating production releases. I haven't tried this scenario yet, but I want to take a closer look at my versioning and release strategy before I do, namely:

  • Determine a versioning scheme for the dev and production releases.
  • Determine if I still want to trigger the Test PyPI publish on each push, or perhaps tag the commits intended for a test release.
  • Automate version bumps at least partially.

For now, I have delayed the next official release to PyPI until these items are sorted out.
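To illustrate the tag-based gating of production releases, this is roughly what the relevant workflow steps look like in the Python Packaging Guide's example; treat it as a sketch, since the step and secret names here may differ from my actual workflow file:

```yaml
# Sketch: publish to Test PyPI on every push, but only publish to the
# production PyPI when the pushed ref is a version tag such as v1.2.3.
- name: Publish distribution to Test PyPI
  uses: pypa/gh-action-pypi-publish@master
  with:
    password: ${{ secrets.test_pypi_password }}
    repository_url: https://test.pypi.org/legacy/
- name: Publish distribution to PyPI
  if: startsWith(github.ref, 'refs/tags')
  uses: pypa/gh-action-pypi-publish@master
  with:
    password: ${{ secrets.pypi_password }}
```

The `if: startsWith(github.ref, 'refs/tags')` condition is what keeps ordinary pushes from ever reaching the production index.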

Controlling full backups

Adding the ability to configure full backup frequency is a feature I had wanted to add for a while, but never got around to until now. This was issue #2, opened on this project back in August. The feature wasn't particularly hard to implement; some other features were simply more exciting to work on, so I picked those first. :)

How can you configure the frequency of full backups? For instance, to have a full backup every month, define the following in the group configuration:

full_backup_frequency: 1M

The configuration value is passed to Duplicity using the --full-if-older-than command line option; please consult the TIME FORMATS section of the Duplicity man page for valid time values.
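Under the hood this is a simple configuration-value-to-option translation. A hypothetical sketch of the idea in Python follows; the function and key names are mine for illustration, not Dup-composer's actual internals:

```python
def full_backup_args(group_config):
    """Translate the group's full_backup_frequency setting into the
    Duplicity --full-if-older-than command line option."""
    frequency = group_config.get("full_backup_frequency")
    if frequency is None:
        return []  # no frequency configured, nothing to pass along
    # Duplicity accepts interval values like 1M, 2W or 30D here; see the
    # TIME FORMATS section of the Duplicity man page.
    return ["--full-if-older-than", frequency]

# For the example configuration above:
full_backup_args({"full_backup_frequency": "1M"})
# → ["--full-if-older-than", "1M"]
```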

You can also trigger on-demand full backups for all the backup groups executed in the given run using the -f option. I strive to add command line options to Dup-composer sparingly; the goal of the tool is to have a simple method of execution, with most of the complexity living in the configuration file, but this option is handy in my opinion.

Path exclusion and inclusion - filters

When you specify a source path, you might not want to back up everything inside it. Combining exclude and include filters gives you granular control over which files and directories under the source path should be included in the backup. You can list as many filter items as you want; they will be processed and passed to Duplicity in the order they appear in the configuration file. When Duplicity is called by Dup-composer, these filter paths are passed using the --exclude and --include command line options. Please consult the Duplicity man page on how these options govern the path selection process within Duplicity. In the meantime, let me present a simple example of how this works below.

Adding the following configuration to the backup source /var/www/html will exclude all of /var/www/html/no_bak, except /var/www/html/no_bak/important, which will be included in the backup:

sources:
      /var/www/html:
        backup_path: /home/backups/web_server_docroot
        restore_path: /var/www/html
        filters:
          - type: exclude
            path: /var/www/html/no_bak
          - type: include
            path: /var/www/html/no_bak/important

If two or more filters match the same path, the first match takes precedence.
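The ordering rule above boils down to an order-preserving translation of the filter list into Duplicity options. A sketch of that translation in Python; again, the function name is illustrative, not the real implementation:

```python
def filter_args(filters):
    """Turn the filters list from the configuration into --exclude and
    --include options, preserving the order in which the filters appear,
    since ordering determines precedence."""
    args = []
    for f in filters:
        args += ["--{}".format(f["type"]), f["path"]]
    return args

# The filters from the example configuration above:
filters = [
    {"type": "exclude", "path": "/var/www/html/no_bak"},
    {"type": "include", "path": "/var/www/html/no_bak/important"},
]
filter_args(filters)
# → ["--exclude", "/var/www/html/no_bak",
#    "--include", "/var/www/html/no_bak/important"]
```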

Notable bugfixes

Trying Dup-composer on my server backups unearthed some bugs. The good news was that two of these three bugs were supposed to be caught by my automated unit and functional tests, but since I got the base premises wrong in the first place, the tests were incorrect themselves. This is a good example that testing is no silver bullet, and some might argue that the time cost of automated testing is too high when such bugs get through the filter anyway. But I think the opposite: these were nasty bugs, true, but on the other hand I was absolutely testing for them, and now that the tests are corrected, a few minutes' investment, I can be certain that the same issue won't creep back after I make code changes down the line.

I make changes to existing code much more confidently when there is proper test coverage. Normally you would go through various code paths in various files wondering if you forgot something affected by the change, but with proper coverage I just make the change, both to the tests, if needed, and to the application code, and let any failing tests direct my attention. I fix whatever comes up and that's it.

These were the bugs for your reference:

  • Encryption command line options are concatenated with their values.
  • AWS credential environment variables are incorrect.
  • The special build number portion in distribution-maintained Duplicity version numbers wasn't accounted for.

For the last item, there was no test in place; it is more of an integration issue really. It could have been caught with integration tests, but that would probably be too much work for this little project for now. Maybe in the future!
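To make the last bug concrete: some distributions append a build number to the upstream Duplicity version string, and a parse that expects only dotted numbers chokes on the suffix. A hypothetical sketch of a tolerant parse, with a made-up example version string, not Dup-composer's actual code:

```python
def parse_duplicity_version(version_string):
    """Parse a Duplicity version string into a comparable tuple,
    tolerating an optional distribution build number suffix,
    e.g. a hypothetical '0.8.11-1'."""
    # Drop the '-<build>' portion some distribution packages append.
    upstream = version_string.split("-")[0]
    return tuple(int(part) for part in upstream.split("."))

parse_duplicity_version("0.8.11-1")
# → (0, 8, 11)
```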

Containerization

Most of the services on my home server, as well as the lab dev environments, are containerized, and many of them use Docker volumes, so it was inevitable that I had to build a simple container image with Duplicity and Dup-composer installed. It is a rudimentary image: I bind mount the configuration and my GPG keyring into it, mount the volumes that I want to back up, and finally docker run Dup-composer.
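For a rough idea of what such an image involves, here is a minimal Dockerfile sketch; the base image, the PyPI package name and the package versions are my assumptions here, not the exact Dockerfile behind my image:

```dockerfile
# Sketch: a minimal image with Duplicity and Dup-composer installed.
FROM python:3.8-slim

# Duplicity comes from the distribution repositories; gnupg is needed
# for encrypted backups with the bind-mounted keyring.
RUN apt-get update \
    && apt-get install -y --no-install-recommends duplicity gnupg \
    && rm -rf /var/lib/apt/lists/*

# Dup-composer itself is installed from PyPI (package name assumed).
RUN pip install dupcomposer
```

No ENTRYPOINT is set, so the dupcomp command is passed explicitly at run time, as in the invocation below.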

The run looks like this at the moment:

docker container run -d \
    -v /root/.gnupg:/root/.gnupg \
    -v oc_files:/nas/oc_files \
    -v /root/dupcomp-config/dupcomposer-config.yml:/root/config.yml \
    duptest:0.9 dupcomp -c /root/config.yml backup nas_oc_S3

I will look into specializing and polishing this image into a practical backup container image that can be useful to anybody who sets out to use Dup-composer.

Nevertheless, even if this remains just a simple image with Dup-composer deployed inside, I will still publish it to save some time for those who want to back up container volumes; others may use it as a base image to tailor to their needs.

Next steps

As I have most of the features I wanted for my backups implemented, I won't develop new ones for now. There are a few areas where I want to wrap things up before giving a shout-out to the Duplicity community:

  • Have a clear release and version handling strategy implemented in the workflow.
  • Release the latest version to PyPI.
  • Release the Docker image.
  • Fix any bugs that come up in the meantime.