git - Efficiently retrieving the versions that contain a commit

Question

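On the command line, if I type

git tag --contains {commit}

to get the list of releases that contain a given commit, it takes roughly 11 to 20 seconds per commit. Since the target code base has more than 300,000 commits, retrieving this information for every commit would take a very long time.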

However, gitk apparently manages to retrieve this data quickly. From what I could find, it uses a cache for this purpose.

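I have two questions:

How can I interpret that cache format?
Is there a way to get a dump from the git command-line tool to generate the same information?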

score 5 · Accepted Answer

You can get this almost directly from git rev-list.

latest.awk:

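# Input: "commit <sha> <child>..." header lines plus decoration lines
# such as " (tag: v1.0, master)".
BEGIN { thiscommit = "" }
$1 == "commit" {
    # A new header: emit the previous commit with the refs it ended up with.
    if ( thiscommit != "" )
        print thiscommit, tags[thiscommit]
    thiscommit = $2
    line[$2] = NR
    latest = 0
    # Inherit refs from the child that appeared last in the output (the nearest one).
    for ( i = 3 ; i <= NF ; ++i ) if ( line[$i] > latest ) {
        latest = line[$i]
        tags[$2] = tags[$i]
    }
    next
}
# A decoration line: the commit's own refs replace anything inherited.
$1 != "commit"  { tags[thiscommit] = $0 }
END { if ( thiscommit != "" ) print thiscommit, tags[thiscommit] }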

Sample command:

git rev-list --date-order --children --format=%d --all | awk -f latest.awk

You can also use --topo-order, and you will probably have to weed out unwanted refs in the $1 != "commit" logic.
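One way to weed out unwanted refs is a small pre-filter placed in the pipeline before the awk script. The sketch below is only an illustration: the function name tags_only, the commit ids c1 to c3 and the ref names are made up, and the sample text is hand-written to mimic the shape of the rev-list output rather than taken from a real repository. It keeps the "tag: ..." entries of each decoration line and drops decoration lines that contain no tag.

```shell
# Hypothetical pre-filter: keep only "tag: ..." entries of each decoration line.
# Commit header lines pass through unchanged; decoration lines that contain
# no tag at all are dropped.
tags_only() {
    awk '
    $1 == "commit" { print; next }
    {
        line = $0
        sub(/^ *\(/, "", line)      # strip the leading " ("
        sub(/\) *$/, "", line)      # strip the trailing ")"
        n = split(line, ref, ", ")
        out = ""
        for (i = 1; i <= n; ++i)
            if (ref[i] ~ /^tag: /)
                out = out (out == "" ? "" : ", ") ref[i]
        if (out != "")
            print " (" out ")"      # same " (...)" shape as the input
    }'
}

# Hand-written sample shaped like `git rev-list --children --format=%d` output.
sample='commit c3
 (HEAD -> master, tag: v2.0, origin/master)
commit c2 c3
 (feature)
commit c1 c2
 (tag: v1.0, tag: v1.0-rc1)'

printf '%s\n' "$sample" | tags_only
```

Because the filter preserves the " (...)" shape of the decoration lines, it should be possible to use it as git rev-list --date-order --children --format=%d --all | tags_only | awk -f latest.awk without touching the awk script itself.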

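Depending on what kind of transitivity you want and how explicit the listing has to be, accumulating the tags might need a dictionary. Here is one that produces an explicit listing of all refs for all commits: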

all.awk:

BEGIN {
    thiscommit = ""
}
$1 == "commit" {
    if ( thiscommit != "" )
        print thiscommit, tags[thiscommit]
    thiscommit = $2
    line[$2] = NR
    split("", seen)    # clear the per-commit set of refs already added
    # Accumulate the refs of every child, skipping duplicates.
    for ( i = 3 ; i <= NF ; ++i ) {
        nnew = split(tags[$i], new)
        for ( n = 1 ; n <= nnew ; ++n ) {
            if ( !seen[new[n]] ) {
                tags[$2] = tags[$2] " " new[n]
                seen[new[n]] = 1
            }
        }
    }
    next
}
$1 != "commit"  {
    # Decoration line " (ref, ref, ...)": strip " (" and ")", then add each ref.
    nnew = split($0, new, ", ")
    new[1] = substr(new[1], 3)
    new[nnew] = substr(new[nnew], 1, length(new[nnew]) - 1)
    for ( n = 1 ; n <= nnew ; ++n )
        tags[thiscommit] = tags[thiscommit] " " new[n]
}
END { if ( thiscommit != "" ) print thiscommit, tags[thiscommit] }

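all.awk took a few minutes to get through the 322K commits of the Linux kernel repo, roughly a thousand commits per second (there are lots of duplicate strings and redundant processing), so you would probably want to rewrite it in C++ if you really are after the complete cross-product. But I don't think gitk shows that, only the nearest neighbors, right?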
