(Header image: Kevin Ku, via Pexels)
I’ve been meaning to put my data under version control with DVC for a long time. I got my fingers burnt trying to update my database with the latest release of stats19 data, and decided that was the hint I needed. For the record, I installed DVC from its Debian package, following the DVC installation instructions.
With DVC installed, the next step is to go into the data storage folder and run:
git init
dvc init
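I’m keeping my DVC “store” (the remote that holds the actual data) in a folder on a USB drive, so I ran
dvc remote add -d usb_remote /mnt/usb1/dvcstore
to register it. The -d flag makes it the default remote, so dvc push and dvc pull use it without having to be told.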
Then it’s just a case of running
dvc add </path/to/file.csv>
git add </path/to/file.csv>.dvc </path/to>/.gitignore
(dvc add writes that .gitignore into the same folder as the data file. I don’t think you need it at all if you are only updating a file that is already tracked, because the entry is already there.) Git then needs a git commit and a git push, assuming a remote Git repo has been set up as well.
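If you do this for lots of files, the two commands can be wrapped in a small shell function so that the .gitignore being staged is always the one dvc add writes, in the data file's own folder. This is a minimal sketch, assuming it is run from inside the repository; track_file is a hypothetical name of mine, not part of DVC.

```shell
#!/bin/sh
# track_file: hypothetical helper (not part of DVC).
# Puts a data file under DVC control and stages the small files Git should track.
# Usage: track_file path/to/file.csv
track_file() {
    # dvc add stores the data in the DVC cache, writes "<file>.dvc"
    # and adds the file to a .gitignore in the data file's own folder.
    dvc add "$1" || return 1
    # Stage the pointer file plus the .gitignore next to the data file
    # (dirname gives "." for a file in the current folder).
    git add "$1.dvc" "$(dirname "$1")/.gitignore"
}
```

For example, track_file data/raw/collisions.csv stages data/raw/collisions.csv.dvc and data/raw/.gitignore. Staging the .gitignore again on an update is harmless: if dvc add did not change it, git add has nothing new to record.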
Before committing, though, I added some metadata to the .dvc file to help me track a few details:
meta:
  source_url: https://www.gov.uk/government/statistical-data-sets/road-safety-open-data
  download_date: '2025-10-14'
  publisher: DfT
  license: OGL v3
  timeframe: 1979 to 2022
  format: csv
  row_count: 11845978
  column_count: 21
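Rather than pasting the block in by hand each time, it can be appended to the .dvc file once dvc add has written it; as far as I know, DVC does not interpret what goes under meta. A minimal sketch, assuming the .dvc file has no meta section yet and that publisher, licence and format stay the same; add_meta and its argument order are hypothetical.

```shell
#!/bin/sh
# add_meta: hypothetical helper that appends a top-level "meta:" block to a .dvc file.
# Usage: add_meta FILE.dvc SOURCE_URL DOWNLOAD_DATE TIMEFRAME ROW_COUNT COLUMN_COUNT
add_meta() {
    dvcfile="$1"
    # Refuse to add a second meta block: YAML keys must be unique.
    if grep -q '^meta:' "$dvcfile"; then
        echo "add_meta: $dvcfile already has a meta block" >&2
        return 1
    fi
    # Unquoted EOF, so $2..$6 are expanded into the YAML.
    cat >> "$dvcfile" <<EOF
meta:
  source_url: $2
  download_date: '$3'
  publisher: DfT
  license: OGL v3
  timeframe: $4
  format: csv
  row_count: $5
  column_count: $6
EOF
}
```

For example: add_meta </path/to/file.csv>.dvc https://www.gov.uk/government/statistical-data-sets/road-safety-open-data 2025-10-14 '1979 to 2022' 11845978 21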
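To fill in row_count and column_count I used the following. Bear in mind that wc -l counts the header line too, and that splitting on every comma miscounts the columns if any quoted field contains a comma: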
wc -l file.csv
head -n 1 file.csv | awk -F, '{print NF}'
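A small wrapper can turn the two counts straight into the row_count and column_count lines for the meta block, subtracting the header from the line count. A minimal sketch with the same assumptions as the commands themselves (one record per line, no commas inside quoted fields, and a newline at the end of the last line, since wc -l counts newlines); csv_shape is a hypothetical name.

```shell
#!/bin/sh
# csv_shape: hypothetical helper printing row_count/column_count lines for the meta block.
# Assumes a header line, one record per line and no commas inside quoted fields.
csv_shape() {
    # wc -l counts newline characters, i.e. lines including the header.
    lines=$(wc -l < "$1")
    # Every line except the header is a data row.
    rows=$((lines - 1))
    # Number of comma-separated fields in the header line.
    cols=$(head -n 1 "$1" | awk -F, '{print NF}')
    echo "row_count: $rows"
    echo "column_count: $cols"
}
```

Whether row_count should include the header is a matter of taste; this sketch leaves it out.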
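Then we just need to commit, tag, and push both Git and DVC. The commit has to come before the tag, so that the tag points at the commit containing the new .dvc file, and the tag needs pushing explicitly because a plain git push does not send tags:
git commit
git tag <data/years>
git push
git push --tags
dvc push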
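The same steps as a shell function, mainly to pin down the order that matters: commit, then tag, then push the branch and the tag, then push the data. A minimal sketch; release_data is a hypothetical name, and it assumes the .dvc and .gitignore changes are already staged and that both a Git remote and the default DVC remote are configured.

```shell
#!/bin/sh
# release_data: hypothetical helper bundling the commit/tag/push steps.
# Usage: release_data TAG "commit message"
release_data() {
    tag="$1"
    msg="$2"
    # Commit first, so the tag points at the commit holding the new .dvc files.
    git commit -m "$msg" || return 1
    git tag "$tag" || return 1
    # A plain "git push" does not send tags, so push them separately.
    git push || return 1
    git push --tags || return 1
    # Upload the data itself to the default DVC remote (the USB store).
    dvc push
}
```

For example, release_data data/1979-2022 "Add stats19 1979 to 2022". The tag naming scheme is just my choice.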
So the cunning plan here is that I can run
git checkout <data/years>
dvc pull
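And I should have that release of the data in my working area. (Checking out a tag leaves Git in a detached-HEAD state, which is fine for just reading the data.)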
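That round trip as a function that also checks the tag exists before touching the working area. It assumes the tag is available locally (git fetch --tags) and that the DVC remote is reachable, i.e. the USB drive is mounted; checkout_release is a hypothetical name.

```shell
#!/bin/sh
# checkout_release: hypothetical helper restoring the data that belongs to a tag.
# Usage: checkout_release TAG
checkout_release() {
    tag="$1"
    # Fail early with a clear message if the tag does not exist locally.
    if ! git rev-parse --verify --quiet "refs/tags/$tag" >/dev/null; then
        echo "checkout_release: no tag $tag (try git fetch --tags)" >&2
        return 1
    fi
    # Move the .dvc pointer files to the tagged versions (detached HEAD).
    git checkout "$tag" || return 1
    # Fetch the matching data from the DVC remote into the workspace.
    dvc pull
}
```

To get back to the latest data afterwards, check out the branch again (for example git checkout main, if that is its name) and run dvc pull once more.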
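Finally, now that I’m using DVC, I can make the download itself part of the DVC process with dvc import-url, so the .dvc file records where the data came from.
Because I’d been curating the file manually in the past, and dvc import-url won’t overwrite an existing file unless told to, I had to force the update (the last argument names the local file, so that it matches the .dvc file I update below):
dvc import-url --force https://stats19.gov.uk/location collisions_latest.csv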
At this point I can update my metadata, then git add, git commit and even git tag as before.
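The switch-over as a function: import the file from its URL over the top of the manually curated copy, then stage the rewritten .dvc file and its .gitignore. A minimal sketch; replace_with_import is a hypothetical name, URL is whatever the download link is, and OUTFILE is assumed to be the existing tracked file (such as collisions_latest.csv).

```shell
#!/bin/sh
# replace_with_import: hypothetical helper swapping a hand-curated, dvc-add-ed file
# for one that DVC downloads itself, then staging the result for Git.
# Usage: replace_with_import URL OUTFILE
replace_with_import() {
    url="$1"
    out="$2"
    # --force lets DVC overwrite the existing local file instead of refusing.
    dvc import-url --force "$url" "$out" || return 1
    # The new .dvc file records the source URL, so Git should track it.
    git add "$out.dvc" "$(dirname "$out")/.gitignore"
}
```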
Then next year, I only need to run
dvc update collisions_latest.csv.dvc
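The yearly refresh, bundled with the Git bookkeeping that has to follow it. A minimal sketch; yearly_update and its commit message are hypothetical, and it assumes FILE.dvc came from dvc import-url. If the source has not changed, the .dvc file should stay the same, so git commit finds nothing to commit and the function stops before tagging.

```shell
#!/bin/sh
# yearly_update: hypothetical helper for the annual refresh of an imported file.
# Usage: yearly_update FILE.dvc TAG
yearly_update() {
    dvcfile="$1"
    tag="$2"
    # Ask DVC to fetch the source again; it rewrites the hashes in the .dvc file.
    dvc update "$dvcfile" || return 1
    # Record the new version in Git and label it, as for the first release.
    git add "$dvcfile" || return 1
    git commit -m "Update $dvcfile" || return 1
    git tag "$tag"
}
```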