c - word_at_a_time.h 中的 has_zero 和 find_zero 是什么

Question

在linux内核中，inlucde/linux/word_at_a_time.h，有两个函数：

static inline long find_zero(unsigned long mask)
{
        long byte = 0;
#ifdef CONFIG_64BIT
        if (mask >> 32)
                mask >>= 32;
        else
                byte = 4;
#endif
        if (mask >> 16)
                mask >>= 16;    
        else
                byte += 2;
        return (mask >> 8) ? byte : byte + 1;
}

static inline bool has_zero(unsigned long val, 
                            unsigned long *data, 
                            const struct word_at_a_time *c)
{
        unsigned long rhs = val | c->low_bits;
        *data = rhs;
        return (val + c->high_bits) & ~rhs;
}

它用于哈希函数，在 git log 中它说：

 - has_zero(): take a word, and determine if it has a zero byte in it.
   It gets the word, the pointer to the constant pool, and a pointer to
   an intermediate "data" field it can set.

但我还是不明白

（1）这个功能做什么？
（2）他们是怎么做的？

score 5 · Accepted Answer

假设位从最低有效位 (LSB)（编号为 0）到最高有效位 (MSB) 编号。

它能做什么？

使用类似于分而治之的技术，从左侧开始函数find_zero()搜索第一个(<=7) 字节值为零的字节。 N

它是怎么做的？

假设定义了一个 64 位系统CONFIG_64BIT，然后执行以下代码：

#ifdef CONFIG_64BIT
        if (mask >> 32)  //Any bit=1 in left 32 bits 
                mask >>= 32;
        else
                byte = 4;  //<--No, fist 4 bytes are zero

第一个注释mask是unsigned，>>无符号右筛选也是如此。（就像 Java 的 >>>）

它首先检查 formask值是否大于 2 ³²（或者我们可以说编号为的unsigned long mask位之间的任何位是否为 1）。3263

mask >> 32将产生一个纯值，该值是其mask右移零0扩展32位，并导致32高位包含零。

例如，如果maks位如下：

63     56 55     48 47     40 39     32 31     24 23     16 15        7       0
▼       ▼ ▼       ▼ ▼       ▼ ▼       ▼ ▼       ▼ ▼       ▼ ▼         ▼       ▼
0000 1000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0001 0101
▲                                                                             ▲
MSB                                                                         LSM

在这个例子中，位数34和59是一，所以(mask >> 32)== true（或者说非零!0）。因此对于这个例子if (mask >> 32)== if(!0)。
通常，如果从位号32到的任何位63为 1，mask则将更新为mask >>= 32;== mask = mask >> 32;（如同 true 和 if 语句执行）。这导致由于右移，32高位变为低位（并且位 32 到 63 变为）。320

在上面的例子mask中变成：

           shifted OLD bit number ----> 63                 45                32
63                  47               32 31                 15                 0
▼                   ▼                 ▼ ▼                   ▼                 ▼
0000 0000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0100
▲                                                                             ▲
MSB                                                                         LSM
|-------------------------------------|
| All are zero                        |

注意：从 32 到 63 的0位是从位0到从位到从上面31复制的位。3263

到这里：

情况 1：
如果从32to的任何位在原始中63为1mask ，则if执行 true 和 mask 更新。（正如我上面解释的那样）。变量long byte 仍然存在0。然后在 next 中if(mask >> 16)，函数find_zero()将继续在位范围内搜索一个值为 0 的字节48to 63(In ，如果任何位为 1，将检查if(mask >> 16)编号为48to的位)。63

情况2：
如果 original 中的所有位32都63为零mask，则if条件变为假（即if(mask >> 32)== if(0)）并且mask不更新，并且变量long byte变为4, and 这表明高四个字节0在中mask。在此之后if(mask >> 16)，函数find_zero() 将搜索位范围16为0 的字节31。

同样，在 Second if()：

if (mask >> 16)
  mask >>= 16;    
else
  byte += 2;

它将检查编号位之间的任何位是否16 to 31为 1。如果所有位都为零，则在 else 部分byte中将递增，2在 if 部分中，掩码将通过无符号右移 16 位进行更新。
（注意：如果mask被更新mask，那么实际上这会if(mask >> 16)检查之间的任何位是否48 to 63为一，否则位16到31原始掩码中）

随后，最后一个条件运算符以类似的方式工作，以检查编号为 1 的位之间的任何位是否8为151。

                  long byte = 0;
                  64 bit `mask`
                       |
                       |
                       ▼
            if(any bit from 32 to 62 is one)---+
            |                                  |
            |False: if(0)                      |True: if(!0)
      all bits between 32 to 64              |A byte=Zero NOT found
      are zero                         Some bits are one from bit 32-63
            |                                  |
            |                                  |
        byte = 4                          byte= 0, and
        64 bit `mask`  <--Orignal       `mask` updated as `mask >>= 32;`
           |                                        |
           |                                        |
           ▼                                        ▼
if(Any bit from 16 to 31 is one)       if(Any bit from 48 to 63 is one)
   |Because high order bits are Zero    |Note due to >> all high order
   |It check for bits numbered 16-31    |bits becomes zero.
   |                                    |2nd if checks for bit from 16-31
   |                                    |that were originally bit
   |                                    | numbered 48 to 63
   |                                    |
   ▼                                    ▼
Case False: if(0)                        Case False: if(0)
     All bits Zero from 16-31               All bits Zero from 49-63
     mask will NOT updated and              mask will NOT updated and
     byte = 6                                  byte = 2

Case True: if(!0)                         Case False: if(!0)
    Some bits are one from 16-31         Some bits are one from 49-63
    mask updated                            mask Updated
     byte = 4                                  byte = 0
   |                                        |
   |                                        |
   ▼                                        ▼
more four cases                          more four cases

Total 8 different answer outputs from `0` to `7`

因此，函数从左侧find_zero()搜索值为 0 的第一个字节。 N输出字节值可以是 from 0to7并且不考虑最右边的字节 ( "8")。
（*注意：long 是 64 位长 = 8 * 8 位 = 8 字节。*）

byte NUMBER ("):      
   "1"       "2"        "3"       "4"       "5"       "6"       "7"       "8"
63     56 55     48 47     40 39     32 31     24 23     16 15        7       0
▼       ▼ ▼       ▼ ▼       ▼ ▼       ▼ ▼       ▼ ▼       ▼ ▼         ▼       ▼
0000 1000 0000 0000 0000 0000 0000 0100 0000 0000 0000 0000 0000 0000 0001 0101
▲                                                                             ▲
MSB                                                                         LSM

byte = 0--> 表示第一个字节不为零
byte = 1--> 高位字节为零，位编号56为63
字节 = 2--> 两个高位字节为零，位编号48为63
字节 = 3--> 三个高位byte 是 0，即位编号40为63
:
:
类似地，byte = 7--> 左起七个字节为 0，（或全为 0）。

流程图

代码

下面我写了一个小代码，find_zero()用不同的值调用函数，将帮助你理解函数：

int main(){ 
 printf("\nmask=0x0, all bytes are zero, result =%ld", find_zero(0x0L)); // not prints 8
 printf("\nmask=0xff, left seven bytes are zero, result =%ld", find_zero(0xffL));
 printf("\nmask=0xffff, left six bytes are zero, result =%ld", find_zero(0xffffL));
 printf("\nmask=0xffffff, left five bytes are zero, result =%ld", find_zero(0xffffffL));
 printf("\nmask=0xffffffff, left four bytes are zero, result =%ld", find_zero(0xffffffffL));
 printf("\nmask=0xffffffffff, left three bytes are zero, result =%ld", find_zero(0xffffffffffL));
 printf("\nmask=0xffffffffffff, left two bytes are zero, result =%ld", find_zero(0xffffffffffffL));
 printf("\nmask=0xffffffffffffff, left most one byte is zero, result =%ld", find_zero(0xffffffffffffffL));
 printf("\nmask=0xffffffffffffffff, No byte is zero, result =%ld", find_zero(0xffffffffffffffffL));
 printf("\nmask=0x8000000000000000L, first byte is NOT zero (so no search), result =%ld", find_zero(0x8000000000000000LL));  
    printf("\n");
    return 1;
}

请观察输出以了解功能：

mask=0x0, all bytes are zero, result =7
mask=0xff, left seven bytes are zero, result =7
mask=0xffff, left six bytes are zero, result =6
mask=0xffffff, left five bytes are zero, result =5
mask=0xffffffff, left four bytes are zero, result =4
mask=0xffffffffff, left three bytes are zero, result =3
mask=0xffffffffffff, left two bytes are zero, result =2
mask=0xffffffffffffff, left most one byte is zero, result =1
mask=0xffffffffffffffff, No byte is zero, result =0
mask=0x8000000000000000L, first byte is NOT zero (so no search), result =0

注意：如果所有字节都为零，则打印的7不是 8。

score 0 · Accepted Answer

第一个函数在掩码中查找第一个不为空的字节，并从左侧开始返回其索引。

64 位示例，xx表示“任何字节”，“nn”表示“非零字节”：

.nn.xx.xx.xx.xx.xx.xx.xx -> find_zero() -> 0
.00.nn.xx.xx.xx.xx.xx.xx -> find_zero() -> 1
.00.00.nn.xx.xx.xx.xx.xx -> find_zero() -> 2
...
.00.00.00.00.00.00.00.nn -> find_zero() -> 7
.00.00.00.00.00.00.00.00 -> find_zero() -> 7 (this one is strange)

find_first_significant_byte_index()如果不是最后一个模棱两可的情况，我会命名这个函数......

第二个函数val根据位掩码进行拆分，将其“low_bits”存储在 *data 中以供进一步参考，并在 val 上返回一些奇怪的操作结果。（无法确定最后一个）。

score 0 · Accepted Answer

请参阅“一次单词界面”：https ://lwn.net/Articles/501492/

如果输入中有一个零字节，has_zero() 返回一个非零掩码。

find_zero() 查找第一个零使用掩码作为输入的位置（从技术上讲，它不是那个掩码，而是那个掩码被另外两个函数进一步按摩）。find_zero() 没有在输入掩码中找到零。

请参阅链接文章中的示例用法：

if (has_zero(value, &bits, &constants)) {
    bits = prep_zero_mask(value, bits, &constants);
    bits = create_zero_mask(bits);
    zero_byte = find_zero(bits);
    /* ... */

c - word_at_a_time.h 中的 has_zero 和 find_zero 是什么

3 回答 3

它能做什么？

它是怎么做的？

流程图

代码

Related

Reference