
Present Perfect


syncing large trees

Filed under: Hacking — Thomas @ 11:38

2009-04-02

Using rsync on really large trees (> 1 million files) is really slow in the preparation stage, because rsync builds a complete list of files on both sides. Especially if only a few files have changed between runs, the preparation stage takes much longer than the actual synchronisation phase.

I thought that rsync had some options to take the mtime of directories into account, so that it would not descend into directories whose mtime hadn't changed, which would vastly reduce the number of files it has to look at. Apparently I made this option up though.

Or did I? Is there any software that does what I describe? Is there something wrong with the basic idea I'm thinking of? Feel free to put me in my place!
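One way to test the idea before relying on it is a small experiment. The Python sketch below is only an illustration and not something rsync itself does: it builds a throwaway tree and records whether a directory's mtime moves when a file inside it is modified in place, when a new file is created in it, and whether the grandparent directory notices. The function name and the result keys are made up for this example, and the outcome may depend on the filesystem.

```python
import os
import tempfile


def run_experiment():
    """Report which changes move a directory's mtime (hypothetical helper)."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "sub")
        os.mkdir(sub)
        target = os.path.join(sub, "data.txt")
        with open(target, "w") as fh:
            fh.write("first version\n")

        # Push both directories' timestamps into the past so that any later
        # update is visible regardless of clock granularity.
        old = 1_000_000_000  # arbitrary moment in 2001, in seconds
        os.utime(sub, (old, old))
        os.utime(root, (old, old))
        sub_before = os.stat(sub).st_mtime_ns
        root_before = os.stat(root).st_mtime_ns

        # 1. Modify an existing file in place (append to it).
        with open(target, "a") as fh:
            fh.write("second version\n")
        results["modify_changes_parent"] = os.stat(sub).st_mtime_ns != sub_before

        # 2. Create a new file next to it.
        with open(os.path.join(sub, "new.txt"), "w") as fh:
            fh.write("brand new\n")
        results["create_changes_parent"] = os.stat(sub).st_mtime_ns != sub_before

        # 3. Did any of that reach the directory one level up?
        results["create_changes_grandparent"] = (
            os.stat(root).st_mtime_ns != root_before
        )
    return results


if __name__ == "__main__":
    for name, changed in run_experiment().items():
        print(f"{name}: {changed}")
```

Running it on the filesystem that holds the tree shows which kinds of change a directory-mtime shortcut would miss there.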

6 Comments »

  1. rsync 3 does an incremental scan, so transfers can start before the “prep” stage completes. The mtime of a directory is not a reliable signal, as it’s only modified if a file it contains was added/deleted/renamed, not if a file grew or shrank, and it wouldn’t reflect any changes that occurred in subdirectories.

    The best way to speed up the rsync is to chop up the original rsync into smaller parallel rsyncs that each work on a piece of the job. I’m not aware of a script that does that.

    Comment by kenneth — 2009-04-02 @ 12:14

  2. Stop killing the trees! It’s bad for the environment.

    Comment by Conficker — 2009-04-02 @ 12:16

  3. If you are doing the sync on two folders on the same file system, I have a perl script I could send you that does not need to do any pre-calcs. But if you need to sync over ftp or http, my script won’t do that. Email me if you would like the script.

    Comment by Kevin DeKorte — 2009-04-02 @ 15:39

  4. +1 for rsync 3.x, the incremental scan really helped me when I was rsyncing 4 million+ small (4k-256k) files on a regular basis.

    Comment by Paul — 2009-04-02 @ 17:48

  5. Hi,

    Maybe a bit late, but have you thought of using inotify-tools for that?
    If only a few files are changed or created, maybe you could just write a script with inotify
    that would send files back and forth (it could even rsync the affected sub-dir) instead of syncing the whole tree.
    That assumes the whole tree doesn’t change too radically too often, I guess.

    http://inotify-tools.sourceforge.net/

    Comment by ldng — 2009-04-11 @ 12:42

  6. The mtime of a directory does not change unless files are deleted or newly created. Modifying a file does not change the mtime of the directory. Also, if you have a tree like a/b/c/d, an unchanged mtime on b doesn’t say anything about changes in c or d.

    Comment by Maex — 2009-04-21 @ 18:00
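The splitting suggested in comment 1 could be sketched roughly as follows in Python. This is a minimal illustration under assumptions, not a tested tool: it assumes the work divides reasonably along the top-level subdirectories, that the destination directory already exists, and that plain `rsync -a` is the desired mode (no deletion handling). The names `build_commands` and `run_parallel` are invented for this example. Whether parallel jobs actually help will depend on disks and network.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(src, dest):
    """Return one rsync argv per top-level subdirectory of src, plus one
    final argv for the files that sit directly in src."""
    src = src.rstrip("/")
    dest = dest.rstrip("/")
    commands = []
    for entry in sorted(os.scandir(src), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            # Trailing slashes: copy the contents of src/name into dest/name.
            commands.append(
                ["rsync", "-a", f"{src}/{entry.name}/", f"{dest}/{entry.name}/"]
            )
    # The anchored pattern '/*/' excludes every directory directly under
    # src, so this job only handles the top-level files (and symlinks).
    commands.append(["rsync", "-a", "--exclude=/*/", f"{src}/", f"{dest}/"])
    return commands


def run_parallel(commands, workers=4):
    """Run the commands concurrently; return their exit codes in order."""
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Hypothetical paths, for illustration only; nothing is executed here.
    for cmd in build_commands("/srv/bigtree", "/mnt/backup/bigtree"):
        print(" ".join(cmd))
```

Printing the commands first, as above, makes it easy to check the split before handing the list to `run_parallel`. Each job still scans its own subtree in full, so this spreads the preparation work out rather than removing it.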
