We are currently in the process of migrating our SVN repo to GIT (hosted at bitbucket). I used subgit to import all our branches/history into a bare repo i have locally on my (Windows) PC.
The repo is quite big (7.42 GB after the import) this is because it also contains information about SVN like revision numbers to provide a way to have a two way sync between Git and SVN (I'm only interested in a one way SVN to GIT).
I create a local clone of the imported bare repo and push all the branches to bitbucket. After a couple of hours (!) the repo was fully uploaded. BitBucket now gave me warnings about the repo size. I checked the size and it was 1.1GB. Thats not as big as the imported bare but still to big to have a fast repository.
After playing around with BFG i managed to remove soms large DLL/SQL export files using these commands on the bare repo (I only use the clone for pushing without all the svn-related refs):
java -jar bfg.jar --delete-files '{''specialized 2015''','''specialized,''insert-pcreeks''}.sql' --no-blob-protection
java -jar bfg.jar --delete-files 'Incara.*.dll' --no-blob-protection Incara.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
This took a while and afterwards the git_find_big.sh script did not show these large sql files anymore. But after pushing things back to bitbucket (as a new repo, not as a force push) it only got bigger (1.8GB)
Can you provide a possible explanation for this behavior?
I don't know if it matters but we used a non standard branch/tag model in svn. This resulted in branches like:
/refs/heads/archive/some/path/to/branch
. These branches seemed to work just fine and removing them also did not affect the size.
Next to these problems i noticed i had some XML files showing up in the git_find_big.sh
output:
size,pack,SHA,location 12180,1011,56731c772febd7db11de5a66674fe6a1a9ec00a7 repository/frontend.xml 12074,1002,0cefaee608c06621adfa4a9120ed7ef651076c33 repository/frontend.xml 12073,1002,a1c36cf49ec736a7fc069dcc834b784ada4b6a06 repository/frontend.xml 12073,1002,1ba5bd92817347739d3fba375fc42641016a5c1d repository/frontend.xml 12073,1002,e9182762bfc5849bc6645fdd6358265c3930779f repository/frontend.xml 12073,1002,dff5733d67cb0306534ac41a4c55b3bbaa436a2e repository/frontend.xml 12072,1002,8ee628f645ce53d970c3cf9fdae8d2697224e64c repository/frontend.xml 12072,1002,1266dee72b33f7a05ca67488c485ea8afc323615 repository/frontend.xml
These files contain the frontend logic of the web platform we are using and are indeed quite big. But they should be treated as text right? Therefore I don't get why they show up as separate objects in the above output. Am i right this should not be happening?
The SVN import also resulted in some empty commits (for example when SVN creates or moves a branch it needs a new commit). I guess these can only be removed using filter-branch?
Sorry, I have a lot of questions! Could someone help me with this?
Thanks,
Piet