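I plan to use LibGit2/LibGit2Sharp and Git in an unorthodox way, and I would like anyone familiar with the API to confirm that what I am proposing will work in theory. :)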

Scenario

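Only the master branch will exist in the repository. A large number of directories containing large binary and non-binary files will be tracked and committed. Most of the binary files will change between commits. Because of disk space limits, the repository should contain no more than 10 commits (the disk currently fills up often).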

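The API does not provide a function that truncates the commit history from a specified CommitId back to the initial commit of the master branch and deletes any Git objects left dangling as a result.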

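I have tested the ReferenceCollection.RewriteHistory method, and I can use it to remove the parents from a commit. This gives me a new commit history that runs from HEAD back to the CommitId. However, it still leaves behind all the old commits and any references or blobs that are unique to those commits. My plan now is simply to clean up these dangling Git objects myself. Does anyone see any problems with this approach, or have a better one?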

2 Answers

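However, it still leaves behind all the old commits and any references or blobs that are unique to those commits. My plan now is simply to clean up these dangling Git objects myself.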

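When rewriting the history of a repository, LibGit2Sharp takes care not to discard the rewritten references. By default, they are stored under the refs/original namespace. This can be changed through the RewriteHistoryOptions parameter.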
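To illustrate that option, here is a minimal sketch, assuming a LibGit2Sharp version that exposes RewriteHistoryOptions.BackupRefsNamespace and CommitParentsRewriter. The class name HistoryTruncation, the method MakeRoot, and the namespace refs/truncated-backup are hypothetical names chosen for this example. The parents rewriter only strips the parents of the chosen commit and returns every other commit's parents unchanged.

```csharp
using System.Collections.Generic;
using System.Linq;
using LibGit2Sharp;

public static class HistoryTruncation
{
    // Hypothetical namespace for the backup references (the default is refs/original).
    public const string BackupNamespace = "refs/truncated-backup";

    // Rewrites history so that newRoot becomes a parentless commit,
    // then removes the backup references created by the rewrite.
    public static void MakeRoot(IRepository repo, Commit newRoot)
    {
        var options = new RewriteHistoryOptions
        {
            BackupRefsNamespace = BackupNamespace,
            // Drop the parents of newRoot only; leave all other commits as they are.
            CommitParentsRewriter = c =>
                c.Sha == newRoot.Sha ? Enumerable.Empty<Commit>() : c.Parents
        };
        repo.Refs.RewriteHistory(options, newRoot);

        // Materialize the list first so references are not removed while enumerating.
        List<Reference> backups = repo.Refs.FromGlob(BackupNamespace + "/*").ToList();
        foreach (Reference backup in backups)
        {
            repo.Refs.Remove(backup);
        }
    }
}
```

After this call the old commits are no longer referenced by the backup namespace, but their objects are still on disk until something prunes them.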

In order to remove the old commits, trees, and blobs, these references must first be removed. This can be done with the following code:

foreach (var reference in repo.Refs.FromGlob("refs/original/*"))
{
    repo.Refs.Remove(reference);
}

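The next step is to purge the now-dangling Git objects. However, this cannot yet be done through LibGit2Sharp. One option is to shell out to git and run the following command: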

git gc --aggressive

This will reduce the size of the repository in a very effective/destructive/non-recoverable way.
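If the cleanup has to be triggered from C#, one possibility is to launch git as a child process. The following is a minimal sketch, assuming the git executable is on the PATH; the class GitCleanup and its members are hypothetical names. Note that stock git keeps unreachable objects that are younger than gc.pruneExpire (two weeks by default) or still listed in a reflog, so the sketch expires the reflogs first and passes --prune=now, which goes beyond the single command shown above.

```csharp
using System;
using System.Diagnostics;

public static class GitCleanup
{
    // Argument strings passed to the git executable, in order.
    public static readonly string[] CleanupCommands =
    {
        "reflog expire --expire=now --all",
        "gc --aggressive --prune=now"
    };

    // Runs "git <arguments>" inside repositoryPath and returns the exit code.
    public static int RunGit(string repositoryPath, string arguments)
    {
        var startInfo = new ProcessStartInfo("git", arguments)
        {
            WorkingDirectory = repositoryPath,
            UseShellExecute = false
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    // Runs every cleanup command and stops at the first failure.
    public static void PurgeUnreachableObjects(string repositoryPath)
    {
        foreach (string arguments in CleanupCommands)
        {
            int exitCode = RunGit(repositoryPath, arguments);
            if (exitCode != 0)
            {
                throw new InvalidOperationException(
                    "git " + arguments + " failed with exit code " + exitCode);
            }
        }
    }
}
```

Calling GitCleanup.PurgeUnreachableObjects(repositoryPath) after the backup references are removed lets git decide which objects are unreachable, including those inside pack files.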

Does anyone see any problems with this approach, or have a better one?

Your approach looks valid.

Update

Does anyone see any problems with this approach, or have a better one?

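If the limit is disk size, another option is to use a tool such as git-annex or git-bin to store the large binary files outside of the Git repository. See this SO question for some different views on the subject and on the potential drawbacks (deployment, lock-in, ...).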

I will try the RewriteHistoryOptions and foreach code you provided. For now, though, it looks like I will be using File.Delete on the dangling Git objects.

Beware, this may be a bumpy road:

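  • Git stores objects in two formats: loose (one file on disk per object) or packed (one entry on disk containing many objects). Removing objects from a pack file tends to be a bit complex, because it requires rewriting the pack file.
  • On Windows, entries in the .git\objects folder are usually read-only files. File.Delete cannot remove them in that state. You first have to unset the read-only attribute, for example with a call to File.SetAttributes(path, FileAttributes.Normal);
  • Although you may be able to identify which commits have been rewritten, determining which Trees and Blobs are dangling/unreachable may turn out to be quite a complex task.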
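The first two points can be sketched as follows. This is a minimal illustration, assuming the standard loose object layout (objects/<first two hex digits>/<remaining digits>) and using only System.IO; the class LooseObjects and its methods are hypothetical names. It deliberately does nothing for packed objects and does not decide whether an object is actually unreachable.

```csharp
using System;
using System.IO;

public static class LooseObjects
{
    // Builds the path of a loose object inside the given .git directory.
    public static string GetLooseObjectPath(string gitDirectory, string sha)
    {
        if (sha == null || sha.Length < 3)
        {
            throw new ArgumentException("Expected a full object SHA.", "sha");
        }
        return Path.Combine(gitDirectory, "objects", sha.Substring(0, 2), sha.Substring(2));
    }

    // Deletes the loose object if it exists; returns false when there is no
    // loose file (for example because the object lives in a pack file).
    public static bool TryDelete(string gitDirectory, string sha)
    {
        string path = GetLooseObjectPath(gitDirectory, sha);
        if (!File.Exists(path))
        {
            return false;
        }
        // Clear the read-only attribute first, otherwise File.Delete fails on Windows.
        File.SetAttributes(path, FileAttributes.Normal);
        File.Delete(path);
        return true;
    }
}
```

A false return value is the signal that the object would have to be removed by rewriting a pack file, which is better left to git itself.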
answered 2013-12-11T22:18:58.053

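Following the suggestions above, here is the preliminary (still being tested) C# code I came up with. It truncates the master branch at a specific SHA, creating a new initial commit. It also removes all dangling references and blobs.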

using System;
using System.Collections.Generic;
using System.IO;
using LibGit2Sharp;

public class RepositoryUtility
{
    public String[] GetPaths(Commit commit)
    {
        List<String> paths = new List<string>();
        RecursivelyGetPaths(paths, commit.Tree);
        return paths.ToArray();
    }
    private void RecursivelyGetPaths(List<String> paths, Tree tree)
    {
        foreach (TreeEntry te in tree)
        {
            paths.Add(te.Path);
            if (te.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyGetPaths(paths, te.Target as Tree);
            }
        }
    }
    public void TruncateCommits(String repositoryPath, Int32 maximumCommitCount)
    {
        //dispose the repository so that its native handles are released
        using (IRepository repository = new Repository(repositoryPath))
        {
            Int32 count = 0;
            string newInitialCommitSHA = null;
            //Head.Commits walks from the newest commit back towards the initial commit
            foreach (Commit masterCommit in repository.Head.Commits)
            {
                count++;
                if (count == maximumCommitCount)
                {
                    newInitialCommitSHA = masterCommit.Sha;
                }
            }
            //only truncate when the commit chosen as the new initial commit still has older commits behind it
            if (newInitialCommitSHA != null && count > maximumCommitCount)
            {
                TruncateCommits(repository, repositoryPath, newInitialCommitSHA);
            }
        }
    }
    private void RecursivelyCheckTreeItems(Tree tree, Dictionary<String, TreeEntry> treeItems, Dictionary<String, GitObject> gitObjectDeleteList)
    {
        foreach (TreeEntry treeEntry in tree)
        {
            //if the tree or blob is not used by any commit that is being kept then add it to the deletion list
            if (!treeItems.ContainsKey(treeEntry.Target.Sha))
            {
                if (!gitObjectDeleteList.ContainsKey(treeEntry.Target.Sha))
                {
                    gitObjectDeleteList.Add(treeEntry.Target.Sha, treeEntry.Target);
                }
            }
            if (treeEntry.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyCheckTreeItems(treeEntry.Target as Tree, treeItems, gitObjectDeleteList);
            }
        }
    }
    private void RecursivelyAddTreeItems(Dictionary<String, TreeEntry> treeItems, Tree tree)
    {
        foreach (TreeEntry treeEntry in tree)
        {
            //check for existence because if a file is renamed it can exist under a tree multiple times with the same SHA
            if (!treeItems.ContainsKey(treeEntry.Target.Sha))
            {
                treeItems.Add(treeEntry.Target.Sha, treeEntry);
            }
            if (treeEntry.TargetType == TreeEntryTargetType.Tree)
            {
                RecursivelyAddTreeItems(treeItems, treeEntry.Target as Tree);
            }
        }
    }
    private void TruncateCommits(IRepository repository, String repositoryPath, string newInitialCommitSHA)
    {
        //maps the SHA of every tree and blob used by a kept commit to one of its tree entries
        Dictionary<String, TreeEntry> treeItems = new Dictionary<string, TreeEntry>();
        Commit selectedCommit = null;
        Dictionary<String, GitObject> gitObjectDeleteList = new Dictionary<String, GitObject>();
        //loop through the commits, starting at the head and moving towards the initial commit
        foreach (Commit masterCommit in repository.Head.Commits)
        {
            //if non null then we have already found the commit where we want the truncation to occur
            if (selectedCommit != null)
            {
                //this commit is older than the truncation point, so add it to our deletion list
                gitObjectDeleteList.Add(masterCommit.Sha, masterCommit);
                //check the trees and blobs of this commit to see if they should be deleted
                RecursivelyCheckTreeItems(masterCommit.Tree, treeItems, gitObjectDeleteList);
            }
            else
            {
                //have we found the commit that we want to be the new initial commit
                if (String.Equals(masterCommit.Sha, newInitialCommitSHA, StringComparison.OrdinalIgnoreCase))
                {
                    selectedCommit = masterCommit;
                }
                //this commit is the new initial commit or newer, so record the tree entries that need to be kept
                RecursivelyAddTreeItems(treeItems, masterCommit.Tree);
            }
        }

        //nothing to do if the requested commit is not reachable from the head
        if (selectedCommit == null)
        {
            return;
        }
        //this function simply clears out the parents of the new initial commit
        Func<Commit, IEnumerable<Commit>> rewriter = (c) => { return new Commit[0]; };
        //perform the rewrite
        repository.Refs.RewriteHistory(new RewriteHistoryOptions() { CommitParentsRewriter = rewriter }, selectedCommit);

        //clean up the backup references now stored under refs/original and remove the commits that they point to
        foreach (var reference in repository.Refs.FromGlob("refs/original/*"))
        {
            repository.Refs.Remove(reference);
            //skip the backup of the branch reference when deleting files
            if (reference.CanonicalName.IndexOf("master", 0, StringComparison.OrdinalIgnoreCase) == -1)
            {
                //delete the loose object that the backup reference pointed to
                DeleteGitBlob(repositoryPath, reference.TargetIdentifier);
            }
        }
        //now remove any tags that reference commits that are going to be deleted in the next step
        foreach (var reference in repository.Refs.FromGlob("refs/tags/*"))
        {
            if (gitObjectDeleteList.ContainsKey(reference.TargetIdentifier))
            {
                repository.Refs.Remove(reference);
            }
        }
        //remove the collected commits, trees and blobs from the Git object database
        foreach (KeyValuePair<String, GitObject> kvp in gitObjectDeleteList)
        {
            //delete the loose object from the file system
            DeleteGitBlob(repositoryPath, kvp.Value.Sha);
        }
    }

    private void DeleteGitBlob(String repositoryPath, String blobSHA)
    {
        //only loose objects are handled here; an object stored in a pack file is left untouched
        String shaDirName = Path.Combine(repositoryPath, ".git", "objects", blobSHA.Substring(0, 2));
        String shaFileName = Path.Combine(shaDirName, blobSHA.Substring(2));
        if (!File.Exists(shaFileName))
        {
            return;
        }
        //loose objects are usually read-only on Windows, so clear the attribute before deleting
        File.SetAttributes(shaFileName, FileAttributes.Normal);
        File.Delete(shaFileName);
        //remove the two-character directory once nothing is left in it
        if (Directory.GetFileSystemEntries(shaDirName).Length == 0)
        {
            Directory.Delete(shaDirName);
        }
    }
}

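Thanks for all your help. It is sincerely appreciated. Note that I edited this code from the original, because the original did not take directories into account.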

answered 2013-12-12T20:21:08.957