I’m writing some code that does something that seems straightforward but I’m trying to cover all my bases.
The code in question should perform a bunch of file/directory renames, and update some files that contain references to those changed filenames, such that the complete set stays consistent. (As a purely theoretical example, think ‘a directory of sound files, and the .m3u/.cue/.log files in it’.)
I used to have something like this in DAD, so that you could fix the spelling of tracks after ripping. Except that it was an ugly Perl script and just gave up if anything went wrong in the middle (permissions, bad naming, machine crashing, …)
This time around I want to take this conceptually simple but by design non-atomic operation, and turn it into a set of resumable operations, such that if for whatever reason one of the atomic operations fails, or the machine crashes in the middle, there is enough information to resume the complete operation.
My current approach goes something like this:
– each Operation (renaming a file, replacing references in a file, …) is an object
– the object has serialize/deserialize methods
– an Operator object gets loaded with a list of these Operations
– each Operator maintains the state of all tasks to be done, and which tasks are done already
– the Operator starts by serializing the todo list to disk in a given state directory
– it goes through the operations in order, performing each one (each should end in an atomic step, for example moving the temporary file over the original file when replacing references in a file)
– after performing the operation, the Operator serializes the done state to the state directory.
– the Operator continues performing tasks and saving the done state to the state directory.
– at the end of all operations, the Operator first removes the todo state, then the done state.
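To make the list above concrete, here is a minimal Python sketch of how it could look. Everything in it is an assumption for illustration: the names `RenameOperation`, `write_atomically`, `todo.json` and `done.json` are made up, the state is stored as JSON, and the done state is reduced to a simple counter of completed operations. It also assumes the state files and their temporary copies live on the same filesystem, so that the final rename is atomic.

```python
import json
import os
import tempfile


class RenameOperation:
    """Hypothetical Operation: rename src to dst with a single os.rename call."""

    def __init__(self, src, dst):
        self.src = src
        self.dst = dst

    def serialize(self):
        return {"type": "rename", "src": self.src, "dst": self.dst}

    @classmethod
    def deserialize(cls, data):
        return cls(data["src"], data["dst"])

    def perform(self):
        # The final (and only) step; atomic on POSIX within one filesystem.
        os.rename(self.src, self.dst)


def write_atomically(path, payload):
    """Write JSON to a temp file, fsync it, then move it over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp, path)


class Operator:
    def __init__(self, state_dir, operations):
        self.operations = operations
        self.todo_path = os.path.join(state_dir, "todo.json")
        self.done_path = os.path.join(state_dir, "done.json")
        self.done = 0  # number of operations known to be complete

    def run(self):
        # Save the full todo list before touching anything.
        todo = [op.serialize() for op in self.operations]
        write_atomically(self.todo_path, todo)
        write_atomically(self.done_path, {"done": self.done})
        for op in self.operations[self.done:]:
            op.perform()  # an exception here leaves the state on disk
            self.done += 1
            write_atomically(self.done_path, {"done": self.done})
        # All done: remove the todo state first, then the done state.
        os.remove(self.todo_path)
        os.remove(self.done_path)
```

If an operation raises, `run()` stops and both state files stay behind, with `done.json` recording how many operations completed. One thing this sketch does not do is fsync the state directory itself, which a more paranoid version probably should.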
If, at any point, an operation errors out, then the problem can be resolved manually, and the Operator can be resumed (since it knows what the last operation was that it performed successfully).
If, at any point, the application or machine crashes, the todo and done state can be found in the state directory, so the Operator can be loaded from disk. The Operator knows which tasks have definitely been performed, and knows what the next task is. The only unknown is whether the next task actually got performed successfully or not. Since each operation has to end in a final atomic call, it suffices to check if that final call was made correctly, and update the done state. From there it can continue.
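The recovery check could look roughly like this in Python. This is only a sketch under assumptions of my own: the state lives in two hypothetical JSON files, `todo.json` (a list of rename descriptions with `src` and `dst` keys) and `done.json` (a counter of completed operations), and the only operation type is a rename. For a rename, "the final call was made" is taken to mean that the destination exists and the source no longer does.

```python
import json
import os


def rename_was_performed(op):
    """Assume the rename happened if dst exists and src is gone."""
    return os.path.exists(op["dst"]) and not os.path.exists(op["src"])


def resume_index(state_dir):
    """Return the index of the first operation that still has to run."""
    with open(os.path.join(state_dir, "todo.json")) as handle:
        todo = json.load(handle)

    # A missing done file means we crashed before recording any progress.
    done_path = os.path.join(state_dir, "done.json")
    done = 0
    if os.path.exists(done_path):
        with open(done_path) as handle:
            done = json.load(handle)["done"]

    # The one uncertain operation: it may have completed right before the
    # crash, without the done state being updated afterwards.
    if done < len(todo) and rename_was_performed(todo[done]):
        done += 1
    return done
```

The check is only as good as the per-operation test. If two operations in the list touch the same path (say, a chain of renames), "dst exists and src is gone" can be ambiguous, so each operation type would need its own way of verifying its final step.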
This is the basic design I’m planning on doing. But I’m pretty tired today, lacking some sleep, so please punch holes in this approach, or tell me how overengineered this whole approach is and how it could be much better. Basically, the only even remotely similar thing I could find was this Perl module. I probably got the idea for using a state file from discussions with Jan and Arek at work about their RRD synchronization framework and its journal files, but I probably butchered their approach for this particular problem.
I could use a good reality check!