22. September 2019
Dup-composer development update
After a one-week hiatus from the development of Dup-composer, I am back with a status update. In this post, I share the main new features and changes delivered last week, along with some of the issues I have run into. I will also touch on the challenges I want to solve in the next stage.
Planned enhancements delivered.
I set three principal deliverables last time.
Specifying, implementing and testing the reliable use of a concrete set of characters in path names was the first one. While mapping out the specification, I found that the rules are very permissive from the operating system's perspective, and that they differ slightly from OS to OS and from file system to file system. In fact, they are so permissive that some of the allowed characters, like terminal control characters, are not useful in this context at all. To counterbalance that, it makes sense to have more restrictive rules in application software.
I wanted to strike a good balance. Luckily, the Python subprocess module can take care of all the shell quoting issues, so at least I don't have to worry about characters that carry a special meaning in a shell context. David A. Wheeler's excellent article on the subject was very helpful in distilling the final list. Dup-composer does not allow the following in path names:
- Leading hyphens, as they can be confused with command-line options.
- Terminal control characters, as they can cause display issues and don't make sense in a file name anyway.
- Backslash characters, to avoid any confusion with escape sequences.
Using a path that contains any of these in the configuration causes an error. I have put basic test cases in place to verify this functionality; these will be extended in the future to also test the broad set of characters that are allowed in path names.
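The three rules above could be checked along these lines. This is only a minimal sketch, not the actual Dup-composer code: the function name path_errors is hypothetical, treating Unicode category "Cc" as the set of control characters is an assumption, and so is applying the hyphen rule to every path component rather than only to the start of the path.

```python
import unicodedata
from typing import List


def path_errors(path: str) -> List[str]:
    """Return a list of rule violations found in path (empty if it is fine)."""
    errors = []
    # Rule 1: a component starting with "-" could be mistaken for an option.
    # Assumption: every "/"-separated component is checked, not just the first.
    if any(part.startswith("-") for part in path.split("/")):
        errors.append("leading hyphen")
    # Rule 2: control characters (Unicode category "Cc", e.g. ESC or newline).
    if any(unicodedata.category(char) == "Cc" for char in path):
        errors.append("control character")
    # Rule 3: backslashes could be confused with escape sequences.
    if "\\" in path:
        errors.append("backslash")
    return errors
```

A caller would treat any non-empty result as a configuration error. Hyphens in the middle of a name, as in /home/user/my-docs, pass the check.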
The second deliverable included the implementation of the code calling the duplicity command and the creation of the functional tests. The latter ensure that my wrapper calls duplicity with the right arguments and passes the required environment variables. Making the call using the subprocess module is easy; however, I had to put a bit more thought into the testing part. For functional testing, I have created a Python script as a mock implementation of Duplicity. This script saves the arguments and environment variables it is called with to a file in JSON format; the functional test can then read the file and compare the data with the expected values.
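The mock approach can be sketched roughly as follows. This is an illustration under assumptions, not the real test code: the script body, the run_mock helper and the MOCK_OUTPUT variable that tells the mock where to write its record are all hypothetical names. Here the mock is started directly, while in the real tests the wrapper would be the one calling it.

```python
import json
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical stand-in for Duplicity: it only records how it was called.
MOCK_SOURCE = textwrap.dedent("""\
    import json, os, sys
    record = {"args": sys.argv[1:], "env": dict(os.environ)}
    with open(os.environ["MOCK_OUTPUT"], "w") as handle:
        json.dump(record, handle)
""")


def run_mock(args, extra_env):
    """Run the mock with args and extra_env and return what it recorded."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "mock_duplicity.py")
        output = os.path.join(tmp, "call.json")
        with open(script, "w") as handle:
            handle.write(MOCK_SOURCE)
        # MOCK_OUTPUT is an assumption of this sketch, not a Duplicity variable.
        env = dict(os.environ, MOCK_OUTPUT=output, **extra_env)
        subprocess.run([sys.executable, script, *args], env=env, check=True)
        with open(output) as handle:
            return json.load(handle)


# The functional test compares the recorded call with the expected values.
recorded = run_mock(["/src", "file:///backup"], {"PASSPHRASE": "example"})
```

The test then only has to assert on recorded["args"] and on the relevant keys of recorded["env"], without a real Duplicity installation being involved.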
The last feature was backing up and restoring specific groups and sources. The group part is working now: you can provide the names of the groups you want to back up or restore after the action command, backup or restore, like this:
dupcomp.py backup my_group1 my_group2 ...
Running sources inside groups selectively hasn't been implemented yet; I will see if I need that after I start using Dup-composer in production. It stays on my to-do list for now.
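A command line of this shape can be parsed with argparse from the standard library. The sketch below is an assumption about how it could be done, not a copy of the Dup-composer code: parse_cli is a hypothetical name, and so is the choice to accept an empty group list.

```python
import argparse


def parse_cli(argv):
    """Parse 'ACTION [GROUP ...]' where ACTION is backup or restore."""
    parser = argparse.ArgumentParser(prog="dupcomp.py")
    parser.add_argument("action", choices=["backup", "restore"])
    # Assumption: group names are optional; what an empty list means
    # (for example "all groups") is left to the caller.
    parser.add_argument("groups", nargs="*", metavar="GROUP")
    return parser.parse_args(argv)


options = parse_cli(["backup", "my_group1", "my_group2"])
```

After this call, options.action holds the action name and options.groups holds the list of group names in the order they were given.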
Additional changes and things I have run into.
The restore_path property was always handled as mandatory in the configuration, but this didn't make sense for backups. Hence it is no longer mandatory when doing a backup.
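In code, such an action-dependent requirement boils down to a small conditional. The following is a sketch only: check_restore_path is a hypothetical helper, and it assumes that the settings of a source are available as a dictionary.

```python
def check_restore_path(source_config, action):
    """Return an error message, or None if the configuration is acceptable."""
    # restore_path is only needed when something is actually restored.
    if action == "restore" and "restore_path" not in source_config:
        return "restore_path is required for the restore action"
    return None
```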
The README of the GitHub repo didn't have much information about the current state of the project, so I have updated it with the latest information, including how the configuration can be done. My previous post on the configuration came in handy, as I reused large chunks of its text to update the README.
I had some second thoughts about building this wrapper on top of Python 3.x. Although the latest stable release of Duplicity supports it, with the exception of a few backends, the version currently available in the mainstream Linux distribution repos is still the earlier minor version, which runs on Python 2.x. With this in the back of my mind, I have preferred backward-compatible coding alternatives in places, even where a more future-proof option was available. In the case of the subprocess module, for instance, I opted for the legacy API. After revisiting this subject, I don't really see a big problem with running my wrapper on Python 3, even if Duplicity itself still runs on the earlier Python runtime. Most, if not all, currently active releases of mainstream Linux distributions, as well as Cygwin, have Python 3 available in their repositories, so even though it might not be installed by default, installing it shouldn't be a big hurdle for the user. Hence I will stick to the newer version from now on. Some of the existing code will be refactored as needed to fit this direction.
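To illustrate the difference between the two styles in the subprocess module, here is a small comparison. It is a sketch with a stand-in command, not the wrapper's code, and the post does not say which of the older functions is actually used; check_output is picked here only as an example.

```python
import subprocess
import sys

# Stand-in command for the sketch; the wrapper would call duplicity instead.
command = [sys.executable, "-c", "print('hello')"]

# Older high-level API, which also exists on Python 2.7.
legacy_output = subprocess.check_output(command)

# subprocess.run() was added in Python 3.5; capture_output needs 3.7 or newer.
result = subprocess.run(command, capture_output=True, check=True)
```

Both calls raise subprocess.CalledProcessError on a non-zero exit status. The newer call returns a CompletedProcess object that bundles the return code, stdout and stderr.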
The challenges I want to solve next.
While Dup-composer is being developed and tested, it runs in my Python virtualenv, but it will need a more production-ready deployment process that checks and sets up the required dependencies and makes installation easy and problem-free. Currently, my plan is to build an installable package with setuptools and publish it to PyPI.
When I started building the unit tests, the example configuration came in handy for building out the first tests quickly. Later on, however, as I started to cover more special cases, I had to add different variants of the expected result data to cover them, and things got a bit out of hand. I need to come up with a strategy and refactor the tests in a way that slims down these variants of the expected values and minimizes duplication.
Before trying the tool in production, I definitely want to remove clear-text passwords and keys from the configuration file and implement some form of integration with the keyring Python module. I haven't looked into that deeply yet, so I don't know how I will do it, but I'll look into it very soon.
Finally, one of my concerns is the user changing the configuration. What if a change breaks things for an existing backup chain? I don't think it would be possible to corrupt the backup chain, as Duplicity should take care of that, but a restore might fail if the encryption key or other core information is changed after the backup is created. A simple warning to the user that highlights the change should be sufficient.
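One possible shape for such a warning is sketched below. It assumes that a snapshot of the settings used for the last backup is kept somewhere, which is not decided yet, and all names in it (changed_settings, format_warning, gpg_key, url) are hypothetical.

```python
def changed_settings(old_config, new_config):
    """Return {name: (old, new)} for every top-level setting that differs."""
    changes = {}
    for name in sorted(set(old_config) | set(new_config)):
        old_value = old_config.get(name)  # None if the setting was absent
        new_value = new_config.get(name)
        if old_value != new_value:
            changes[name] = (old_value, new_value)
    return changes


def format_warning(changes):
    """Build a warning text that lists each changed setting."""
    lines = ["WARNING: settings changed since the last backup:"]
    for name, (old_value, new_value) in changes.items():
        lines.append(f"  {name}: {old_value!r} -> {new_value!r}")
    return "\n".join(lines)
```

For secrets, a real implementation would presumably report only that the value changed rather than print it.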
That's all for now. If you are interested in seeing other open issues on this project, or in reviewing the documentation or source, please check out the GitHub repo. I would love to hear any feedback; you can reach me via email or on Twitter as @heapsdontlie.