dup-composer: Duplicity frontend development status

My mini project dup-composer, a simple configuration-based CLI frontend for Duplicity, is in progress. I kicked off actual development last week and want to share a progress update with you. TL;DR: I will publish the initial prototype (alpha) version in the next 1-2 days. You can find details about my progress, findings and decisions in this post.

Progress and next steps

The minimum version of the application backbone is complete and I will push the baseline version to the GitHub repository in the next 1-2 days. (I have decided not to do individual commits for each development step until now: the project is very small, and since the application is not yet functional, the yield from tracking development steps at this early stage is marginal.) For now, the application only does "dry runs", showing the output of the Duplicity commands it would run, as I want to put some functional tests in place before taking this to the next level and actually kicking off backup jobs.

The high-level flow of the functionality looks like this at the moment:

  1. The configuration file dupcomposer-config.yml is parsed into a Python dictionary using PyYAML.
  2. The configuration data is partitioned into the groups defined in the config file.
  3. For each group, the various configuration data sections (encryption, storage provider, etc.) are dispatched to corresponding objects that process them.
  4. The resulting Duplicity command line options are collected from these processing objects and assembled into the commands corresponding to the configuration.
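
To illustrate this flow, here is a minimal, self-contained Python sketch. The configuration layout (backup_groups, encryption, backup_provider), the handler function and the option mapping are my own illustrative assumptions, not the actual dup-composer code; it only shows how a parsed YAML dictionary could be turned into dry-run Duplicity command lines.

    import yaml  # PyYAML, the parser mentioned above

    # Hypothetical config layout -- the group/section names are my own
    # illustration, not the actual dupcomposer-config.yml schema.
    CONFIG = """
    backup_groups:
      my_docs:
        encryption:
          gpg_key: ABC123
        backup_provider:
          url: s3://my-bucket/docs
    """

    def options_for(section_name, section_data):
        # Stand-in for the per-section handler objects: each config section
        # contributes its share of the Duplicity command line options.
        if section_name == 'encryption':
            return ['--encrypt-key', section_data['gpg_key']]
        if section_name == 'backup_provider':
            return [section_data['url']]
        return []

    def assemble_commands(config):
        commands = []
        for group_name, group in config['backup_groups'].items():
            options = []
            for section_name, section_data in group.items():
                options += options_for(section_name, section_data)
            # A real Duplicity invocation also needs a source path; it is
            # omitted here to keep the sketch short.
            commands.append(['duplicity'] + options)
        return commands

    if __name__ == '__main__':
        config = yaml.safe_load(CONFIG)           # step 1: YAML -> dict
        for cmd in assemble_commands(config):     # steps 2-4
            print(' '.join(cmd))                  # dry run: print only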

Testing is important.

I don't think it is always worth the effort to write unit tests for one-off small projects, as the payoff of extensive unit testing usually correlates with the project size and how evolutionary the code base is; by "evolutionary" I mean that many changes will be made to the code base for a long time into the future. If the project is relatively large and constant changes are expected for years, having good test coverage is fundamental.

By these measures, I probably shouldn't bother writing tests for a small pet project like this, but I still decided to follow the TDD workflow and aim for 100% test coverage for the following two reasons:

  • Running backups is a mission-critical activity that is usually performed as the superuser. This also means that if something is not done right, the damage could be substantial.
  • My hope is that there will be community interest in such an application, in which case I would continue to add features I might not use myself and improve the application as a whole. Many changes might be made down the road as a result, in which case having test coverage will be great.

My plan is to add functional tests to my suite as well. This is on my next few days' TODO list.

Interesting findings and lessons learned so far

  • Although this didn't catch me by surprise, I think it might catch beginners off guard: it has been explained in many places that you can't rely on dictionary keys being in any particular order, as in most Python versions this is either non-deterministic or depends on an implementation detail that can't be relied upon (this is different starting with Python 3.7, see below). Since the mapping keys in a YAML file are translated into dictionary keys, you can expect the same from data loaded from a YAML file with PyYAML: the keys might appear in a particular order in the file, but as soon as they are converted into a Python dictionary, you can't expect them to stay in that order; you need to do the ordering "manually" if required. One interesting change in recent Python versions is that this is no longer true: starting with Python 3.6, dictionary keys preserve insertion order, but in 3.6 this is only an implementation detail of CPython, so your mileage may vary on other runtime implementations. In 3.7, however, this behavior became part of the language specification, which means you can count on the order if you need to. This Stack Overflow answer summarizes it beautifully. (The sketch after this list demonstrates the key behavior.)
  • The YAML 1.1 specification requires parser implementations to do some magic with certain values in the YAML data: strings like yes and no have to be converted into booleans, for instance. This took me by surprise, and at first I didn't understand why my tests were failing, but I figured it out eventually. :) The YAML 1.2 specification is a lot more conservative about transforming values automatically, which is a lot less error prone in my opinion, especially for new users. The PyYAML library currently supports only the YAML 1.1 specification, but there are alternative parsers for Python, like ruamel.yaml, that do support YAML 1.2. (The boolean conversion is also shown in the sketch after this list.)
  • As the latest Duplicity version, 0.8, introduced Python 3 support - with the caveat that a few backends don't support it yet - I have also decided to develop dup-composer on Python 3. My only worry is that 0.8 has not yet been packaged by the Debian and Ubuntu maintainers, and I don't want to limit the number of potential users if there is interest. Hence, depending on when the new version makes it into the distributions, I might need to make this work on Python 2 as well.
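
Here is a small, self-contained demonstration of the first two findings, assuming PyYAML is installed; the key names are made up for illustration:

    import yaml  # PyYAML, which follows the YAML 1.1 rules described above

    DOC = """
    enabled: yes     # YAML 1.1: loaded as the boolean True, not the string 'yes'
    archive: no      # loaded as the boolean False
    first: 1
    second: 2
    """

    data = yaml.safe_load(DOC)
    print(type(data['enabled']), data['enabled'])   # <class 'bool'> True
    print(type(data['archive']), data['archive'])   # <class 'bool'> False

    # Key order: on Python 3.7+ insertion order is guaranteed by the language;
    # on CPython 3.6 it is only an implementation detail; on older versions you
    # must not rely on any particular order and have to sort "manually" if needed.
    print(list(data.keys()))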

Summary; and why am I doing this at all?

You can expect a minimal version of dup-composer in the next few days; I will publish another blog post with the GitHub repository details then. This version will be able to generate full Duplicity commands from the configuration file for review. Once I have implemented some functional tests and they all pass, the tool will also execute the backups if desired.

You might wonder why I am creating projects like this in my own time, so let me share a short priority list of what I am aiming for:

  1. My top priority is to learn from this process. This includes both the technical part and the experience from sharing my work with an audience. Getting feedback that I can use to grow is also part of this story.
  2. Creating something useful that I will use myself and that adds value to my workflow is also very important.
  3. Sharing these tools within the open source ecosystem. This community creates software enabling me in so many different ways. Thank you!
  4. Offering future clients a direct peek into how I work and communicate. They will know me, my interests and my competencies - in short, they will know what to expect when collaborating with me.