I'm currently trying to write a script that does some post-processing after `rsync --max-size=4000000000` has done its job, to allow for full backups to FAT32 (which is the only filesystem that is read/write on all of Windows, Mac and *nix).
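For context, the initial pass looks something like the following. This is only a sketch: the source and destination paths and the `-a` flag are placeholders for illustration, not the actual backup job.

```bash
# First pass: copy everything up to the FAT32 limit; files larger than
# ~4 GB are skipped here and left to the post-processing script.
rsync -a --max-size=4000000000 /path/to/source/ /Volumes/FAT32-BACKUP/
```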
I am writing in bash for Mac OS X and Linux; I'm currently testing on OS X. The code is here:
https://github.com/taikedz/fullsync32/blob/master/fullsync32
The script recurses through directories finding
- files that have a resource fork (HFS property)
- files that are larger than 4 GB
and, upon finding such files, processes them via either `tar -cz` or `split` as appropriate, before copying them over (a rough sketch follows).
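In rough terms, the per-file handling does something like this (a simplified sketch, not the actual code; `handle_file`, the destination argument and the 1 GB chunk size are just for illustration, and BSD `stat` on macOS is assumed):

```bash
# Simplified sketch: archive resource-fork files, split oversized files,
# plain-copy everything else. macOS-only (BSD stat, ..namedfork path).
handle_file() {
    local FILE="$1" DEST="$2"
    local RSRC_SIZE FILE_SIZE
    RSRC_SIZE=$(stat -f%z "$FILE/..namedfork/rsrc" 2>/dev/null || echo 0)
    FILE_SIZE=$(stat -f%z "$FILE")
    if [ "$RSRC_SIZE" -gt 0 ]; then
        # archive so the resource fork survives the trip to FAT32
        tar -czf "$DEST/$FILE.tgz" "$FILE"
    elif [ "$FILE_SIZE" -gt 4000000000 ]; then
        # FAT32 can't hold a file this big, so cut it into 1 GB chunks
        split -b 1000m "$FILE" "$DEST/$FILE.part-"
    else
        cp "$FILE" "$DEST/"
    fi
}
```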
I use recursion instead of the `find` utility because of the test for the presence of a resource fork on a file: it involves checking the size of a special file. Say you have the file foo.txt; its resource fork can be found by looking at `ls -l foo.txt/..namedfork/rsrc` and checking that the length is non-zero.
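In script terms that check boils down to something like this (again just a sketch; the `..namedfork/rsrc` path only exists on HFS+ volumes, so it has to run on the Mac side):

```bash
# Resource fork size is the 5th column of ls -l; empty or zero means
# no fork (or not an HFS+ volume).
RSRC_SIZE=$(ls -l foo.txt/..namedfork/rsrc 2>/dev/null | awk '{print $5}')
if [ "${RSRC_SIZE:-0}" -gt 0 ]; then
    echo "foo.txt has a resource fork"
fi
```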
The basic structure is:

```bash
recurse() {
    pushd "$1"
    for NODE in *; do
        if [ -d "$NODE" ]; then
            recurse "$NODE"
            continue
        fi
        # (process files here, with calls to split, tar and md5)
    done
    popd
}

recurse ./target/directory
```
Problem
I ran this against my backups the other day and left it running for a few hours. When I came back I found that my spare 11 GB of RAM had been used up, and it was ~248 MB into swap...
I looked around on Google for issues with bash memory leaks in recursion and, apart from a few tenuously answered forum posts, didn't find much...
The other odd result (which is Mac-specific) is that the "Inactive memory" stays inactive and the system runs slowly... a restart is required.
Questions
- Is such potentially deep recursion with bash a bad idea in itself?
- Is there an ingenious way to iterate rather than recurse in this situation?
- Or am I going about this completely wrong anyway?
Your input is much appreciated!