Backups are good.
Over the years I've come to realize this simple mantra, to the point of thinking, every time I put in place a new service: "how do we back this up?"
Usually the answer is relatively simple. You dump an svn repository with svnadmin dump, then tar/bzip2 it. You dump a trac setup and tar/bzip2 it along with the config files. You rsync over maildirs (not entirely correct but good enough). You add this as a daily cron job. And so on.
This isn't perfect, though. Sooner or later you will have to deal with the swaths of disk space your backups are wasting. And really, do you still need the svn dump from a Monday two years ago if you also have the Sunday and the Tuesday?
So really, what you want is something more like "only keep the daily backups for three months, and after that only keep a weekly backup, or one every x days".
I searched for ages among various find-like tools to make this possible from a shell script, and never found anything useful. Two weeks ago I decided I'd just write it in Python, and it turns out it's a lot simpler than I feared it would be:
#!/usr/bin/python
import glob
import stat
import os
import time

files = glob.glob('*')
for file in files:
    keep = False
    s = os.stat(file)
    # modification time, in seconds since the epoch
    mtime = s[stat.ST_MTIME]

    # keep if it is from a sunday
    # (2007-05-06 was a Sunday; count whole weeks from its local midnight)
    anyGivenSunday = time.mktime((2007, 5, 6, 0, 0, 0, 0, 0, 0))
    secondsPerDay = 24 * 60 * 60
    if (mtime - anyGivenSunday) % (7 * secondsPerDay) < secondsPerDay:
        keep = True

    # keep if it is younger than 3 months
    now = time.time()
    if now - mtime < 90 * secondsPerDay:
        keep = True

    # one line per file: name, mtime, True/False
    print("%s %s %s" % (file, mtime, keep))
This script prints out one line per file from the current directory, with True in it if the file should be kept.
So typically I run this as
keep-sundays | grep False | cut -d' ' -f 1
to see the list, and then add "xargs rm" if the list makes sense.
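The modulo test above assumes every day is exactly 24 hours long, so in a time zone with daylight saving the "Sunday" window can be off by an hour. A minimal sketch of the same rule as a function, asking the calendar for the weekday instead, could look like the following. The names should_keep and daily_days are made up for this illustration and are not part of the script above.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def should_keep(mtime, now, daily_days=90):
    """Hypothetical helper: True if a backup with this mtime should be kept."""
    # Anything younger than daily_days is always kept.
    if now - mtime < daily_days * SECONDS_PER_DAY:
        return True
    # Older backups are kept only if they were written on a Sunday.
    # tm_wday counts from Monday = 0, so Sunday is 6.
    return time.localtime(mtime).tm_wday == 6


if __name__ == "__main__":
    # Noon (local time) on a Sunday and on the Monday after it.
    sunday = time.mktime((2007, 5, 6, 12, 0, 0, 0, 0, -1))
    monday = time.mktime((2007, 5, 7, 12, 0, 0, 0, 0, -1))
    a_year_later = sunday + 365 * SECONDS_PER_DAY
    print(should_keep(sunday, a_year_later))  # True
    print(should_keep(monday, a_year_later))  # False
```

Passing now in as an argument, rather than calling time.time() inside, makes the rule easy to try out against made-up dates before letting it near real backups.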
Next step would probably be to refine this a little, add some arguments, and put it in a cron job, but for now it solves the problem of weeding out my backups and freeing some disk space on our servers after 3 years of backups.