
Is it possible to filter the SVN dump generated by svnadmin dump so that it does not include encoded binary data, only the text deltas and data?

I want a dump of an existing large SVN repository, but only of the code. I have no interest in the stored binaries, and they make the dump file unnecessarily large. How can I generate the dump and exclude binary content?

What I have already tried, without success:

  1. It is not practical to process the svn log diffs. It is a large and old repository, and getting diffs only for a short time period takes a lot of time and often gets stuck.
  2. The binary files are scattered all over rather than stored under a single known path, so I cannot use svndumpfilter to exclude them, unless there is some way to use this filter with wildcard patterns, e.g. *.jar.

2 Answers

3

svndumpfilter is part of any Subversion installation.

svndumpfilter exclude — Filter out nodes with given prefixes from the dump stream.

Beginning in Subversion 1.7, svndumpfilter can optionally treat the PATH_PREFIXs not merely as explicit substrings, but as file patterns instead.

$ svndumpfilter exclude --pattern "*.OLD" < dumpfile > filtered-dumpfile
Excluding prefix patterns:
   '/*.OLD'
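
Applied to the *.jar case from the question, the same invocation can be scripted. This is only a minimal Python sketch, assuming svnadmin and svndumpfilter (1.7 or later, for --pattern) are on the PATH; the helper names build_exclude_command and dump_without, the repository path, and the pattern list are illustrative and not part of Subversion.

```python
import subprocess


def build_exclude_command(patterns):
    """Return the svndumpfilter argv that excludes paths matching the glob patterns."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    return ["svndumpfilter", "exclude", "--pattern", *patterns]


def dump_without(repo_path, patterns, out_path):
    """Pipe `svnadmin dump` through svndumpfilter and write the result to out_path."""
    with open(out_path, "wb") as out:
        dump = subprocess.Popen(["svnadmin", "dump", repo_path],
                                stdout=subprocess.PIPE)
        filt = subprocess.Popen(build_exclude_command(patterns),
                                stdin=dump.stdout, stdout=out)
        # Close our copy of the pipe so svnadmin notices if the filter exits early.
        dump.stdout.close()
        filt_rc = filt.wait()
        dump_rc = dump.wait()
    if dump_rc != 0 or filt_rc != 0:
        raise RuntimeError(f"svnadmin exited {dump_rc}, svndumpfilter exited {filt_rc}")


# Hypothetical usage (paths are placeholders):
# dump_without("/var/svn/myrepo", ["*.jar", "*.zip"], "code-only.dump")
```

One thing to watch for: if an excluded file was later copied or renamed to a path that is not excluded, svndumpfilter will probably stop with an invalid copy source error, so the pattern list may need a few iterations.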
answered 2012-12-21T16:56:53.057
1

I don't know of a stock tool to do this, but it shouldn't be hard to do if you start with this Perl module: SVN::Dumpfilter

One of the example scripts in there (svndump_delpathfilter) is probably pretty close to what you want. My experience with this module is that you'll probably have to tinker with it a bit to get it to do what you want.

Now, I don't think there is any way to reliably tell a binary from a text file, since Subversion (at the lowest levels) doesn't really care. A quick scan of my repository shows that the svn:mime-type property isn't always set, and I see no other indicative fields. So you'll have to check via name or (somehow) try looking at the contents of the file (but I have never done the latter).
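
To make the "check via name or contents" idea concrete, here is a rough Python sketch of such a test. Everything in it is an assumption on my part: the function name looks_binary, the extension list, and the NUL-byte check (a heuristic borrowed from other tools, not something Subversion prescribes). It trusts svn:mime-type when that property is set, then falls back to the file name, then to the first bytes of the content.

```python
import fnmatch

# Illustrative list only; extend it to whatever your repository contains.
BINARY_PATTERNS = ("*.jar", "*.zip", "*.exe", "*.dll", "*.png")


def looks_binary(path, mime_type=None, head=b""):
    """Guess whether a node is binary.

    path      -- node path as it appears in the dump
    mime_type -- value of svn:mime-type, or None if the property is not set
    head      -- first bytes of the file content, if available
    """
    # 1. An explicit svn:mime-type wins: anything that is not text/* counts as binary.
    if mime_type:
        return not mime_type.lower().startswith("text/")
    # 2. Otherwise match the file name (case-insensitively) against known patterns.
    name = path.rsplit("/", 1)[-1].lower()
    if any(fnmatch.fnmatchcase(name, pattern) for pattern in BINARY_PATTERNS):
        return True
    # 3. Last resort: a NUL byte near the start usually means binary content.
    return b"\0" in head
```

A filter built on SVN::Dumpfilter (or anything else that walks the dump nodes) could call a check like this for each node and drop the ones it flags.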

answered 2012-12-21T06:17:54.217