One thing I have learnt is the value of configuration management. It’s an underrated aspect of reproducible research. I was quickly experimenting with an R package, but needed to install the sf package on my tinkering machine. That in turn required another R package, units. And units in turn needed some header files, as did sf. As it turns out, the units package failed with a very nice error message telling me exactly what I had to do. And I had very bad déjà vu dealing with sf, because I know I’ve handled that before.

One moral of the story is not to try to do things quickly like this. The interesting thing is that this used to be the only way I worked. Cue lots of frustration whenever one package got updated and broke some other code.

It turns out that there is now a CRAN task view dedicated to reproducible research, which has a section on package reproducibility. This includes links to packages such as R bundler, which attempts to tame your package requirements on a project-by-project basis.

However, since having machines with enough RAM, I’ve found it very nice to use virtual machines during development. I used to use VirtualBox a long time ago (it let me run Linux on a MacBook). But you can also pair it with tools such as Vagrant to create and manage a virtual machine. And Vagrant can in turn call an Ansible playbook to provision that machine. Voilà: tinker away, break everything, start again.
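
As a rough sketch of that Vagrant, VirtualBox and Ansible arrangement, a Vagrantfile might look something like this. The box name, memory and CPU figures, and the playbook filename `provision.yml` are all assumptions for illustration. The playbook itself (not shown) is where you would install R and the system header files that packages like units and sf need.

```ruby
# Vagrantfile: a minimal sketch, not a tested setup.
# Box name, resources and playbook path are illustrative assumptions.
Vagrant.configure("2") do |config|
  # Base image to build the VM from (assumed; any Linux box would do)
  config.vm.box = "ubuntu/jammy64"

  # Use VirtualBox as the provider and give the VM some room to work
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096 # MB of RAM (assumed value)
    vb.cpus = 2
  end

  # Hand provisioning over to Ansible, which runs the playbook
  # against the freshly created VM
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provision.yml" # hypothetical playbook name
  end
end
```

With a file like this in the project directory, `vagrant up` builds and provisions the machine, `vagrant provision` re-runs the playbook, and `vagrant destroy -f` throws the whole thing away so you can start again.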

