c++ - 迭代位的最快方法

Question

我已经在国际象棋引擎上工作了一段时间，它相当强大（大约 2700 CCRL）并且想知道以下事情：

大多数顶级国际象棋引擎都使用位板。它们基本上只是 64 位数字，代表一些二进制数据，例如一个正方形是否被占用。有许多应用程序可以使用位板，但在许多情况下，我需要遍历所有位并使用已设置位的索引。

我事先定义的一些功能：

typedef uint64_t U64;
typedef int8_t Square;  //its signed for different reasons

/**
 * returns the index of the LSB
 * @param bb
 * @return
 */
inline Square bitscanForward(U64 bb) {
//    assert(bb != 0);
    return __builtin_ctzll(bb);
}


/**
 * resets the lsb in the given number and returns the result.
 * @param number
 * @return
 */
inline U64 lsbReset(U64 number) {
    return number & (number - 1);;
}

使用这两个，我通常像这样迭代位板：

U64 bb = ....

while(bb){
   Square s = bitscanForward(bb);
   ...
   bb = lsbReset(bb);
}

另一种解决方案是：

for(;bb!=0;bb=lsbReset(bb)){
   Square s = bitscanForward(bb);
   ...
}

然而第二个结果却更慢（出于某种原因）。

我的问题是：

为什么第二种方法较慢？
有更快的方法吗？特别是在通过调整一些计算来减少时钟周期方面。

编辑 1

如所愿，我发布了我的测试代码。事实上，我发现了一个小错误，它们都以相同的速度运行。然而，如果有更好的实现，问题仍然存在。

  U64 k = randU64();
    printBitmap(k);
    
    startMeasure();
    int sum = 0;
    for(int i = 0; i < 1e8; i++){
        U64 copy = k;
        while(copy){
            Square s = bitscanForward(copy);
            sum += s;
            
            copy = lsbReset(copy);
        }
    }
    std::cout << stopMeasure() << "ms sum="<<sum << std::endl;
    
    startMeasure();
     sum = 0;
    for(int i = 0; i < 1e8; i++){
        U64 copy = k;
        for(;copy!=0;copy=lsbReset(copy)){
            Square s = bitscanForward(copy);
            sum += s;
    
        }
    }
    std::cout << stopMeasure() << "ms sum="<<sum << std::endl;

哪个输出：

10101001
01011111
00011111
00000111
01010001
00000011
00110001
10010100

1748ms sum=-1174182400
1757ms sum=-1174182400

c++ - 迭代位的最快方法

编辑 1

0 回答 0

Related

Reference