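I didn't want to be limited to fixed-size integers, or to writing lists of near-identical statements with hard-coded constants, so I developed a C++11 solution that uses template metaprogramming to generate both the functions and the constants. The assembly generated at -O3 without BMI looks about as tight as it gets: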
andl $0x55555555, %eax
movl %eax, %ecx
shrl %ecx
orl %eax, %ecx
andl $0x33333333, %ecx
movl %ecx, %eax
shrl $2, %eax
orl %ecx, %eax
andl $0xF0F0F0F, %eax
movl %eax, %ecx
shrl $4, %ecx
orl %eax, %ecx
movzbl %cl, %esi
shrl $8, %ecx
andl $0xFF00, %ecx
orl %ecx, %esi
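For comparison, this is the kind of hard-coded, fixed-size function that the assembly above corresponds to: a 32-bit, two-dimensional compaction that keeps the even bits of a Morton code and packs them into the low 16 bits. The name `compact_1by1` and the convention that x occupies the even bits are assumptions made for this sketch.

```cpp
#include <cstdint>

// Hypothetical hand-written baseline: extracts the even bits (0, 2, 4, ...)
// of a 32-bit Morton code and packs them into the low 16 bits.
inline std::uint32_t compact_1by1(std::uint32_t x) {
    x &= 0x55555555u;                  // keep every other bit
    x = (x | (x >> 1)) & 0x33333333u;  // runs of 2 bits
    x = (x | (x >> 2)) & 0x0F0F0F0Fu;  // runs of 4 bits
    x = (x | (x >> 4)) & 0x00FF00FFu;  // runs of 8 bits
    x = (x | (x >> 8)) & 0x0000FFFFu;  // final 16 bits
    return x;
}
```

With x = 5 and y = 3 the Morton code is 27, so `compact_1by1(27)` gives 5 and `compact_1by1(27 >> 1)` gives 3.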
TL;DR: source repository and live demo.
Implementation
Basically, every step in the morton1 function works by shifting, OR-ing and then masking with a sequence of constants that look like this:
0b0101010101010101 (alternating 1 and 0)
0b0011001100110011 (alternating 2x 1 and 0)
0b0000111100001111 (alternating 4x 1 and 0)
0b0000000011111111 (alternating 8x 1 and 0)
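The constants above follow one rule: at step k, a run of 2^k ones repeats every 2 * 2^k bits. A minimal runtime sketch of that rule for 32-bit words, with the period generalised to dims * 2^k bits, could look like this; `pattern_mask` is a hypothetical helper, whereas the solution itself builds these masks at compile time with templates.

```cpp
#include <cstdint>

// Hypothetical runtime helper: builds the step-`step` mask for `dims`
// dimensions in a 32-bit word. Each period is dims * 2^step bits wide and
// starts with a run of 2^step ones, e.g. step 0, dims 2 -> 0x55555555.
inline std::uint32_t pattern_mask(unsigned step, unsigned dims) {
    const unsigned run = 1u << step;     // width of each run of ones
    const unsigned period = dims * run;  // run of ones + (dims - 1) * run zeros
    const std::uint32_t ones = (run >= 32) ? 0xFFFFFFFFu : ((1u << run) - 1u);
    std::uint32_t mask = 0;
    for (unsigned shift = 0; shift < 32; shift += period) {
        mask |= ones << shift;           // OR in one more copy of the run
    }
    return mask;
}
```

For two dimensions this reproduces the 32-bit versions of the listed constants (0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF); for three dimensions, step 0 gives 0x49249249.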
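If we were to use D dimensions, we would have a pattern made of one 1 and D-1 0s, where at step k each 1 or 0 stands for a run of 2^k bits. So to generate these, it is enough to generate consecutive ones and apply some bitwise ORs: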
#include <type_traits>
/// @brief Generates 0b1...1 with @tparam n ones (T must be an unsigned type, 1 <= n <= bit width of T).
template <class T, unsigned n>
using n_ones = std::integral_constant<T, static_cast<T>(static_cast<T>(~T(0)) >> (sizeof(T) * 8 - n))>;
/// @brief Performs `@tparam input | (@tparam input << @tparam width)` @tparam repeat times.
template <class T, T input, unsigned width, unsigned repeat>
struct lshift_add :
public lshift_add<T, lshift_add<T, input, width, 1>::value, width, repeat - 1> {
};
/// @brief Specialization for 1 repetition, just does the shift-and-add operation.
/// A width at or beyond the bit width of T leaves the input unchanged.
template <class T, T input, unsigned width>
struct lshift_add<T, input, width, 1> : public std::integral_constant<T, static_cast<T>(
(input & n_ones<T, (width < sizeof(T) * 8 ? width : sizeof(T) * 8)>::value) | (input << (width < sizeof(T) * 8 ? width : 0)))> {
};
Now we can generate the constants at compile time for an arbitrary number of dimensions, as follows:
template <class T, unsigned step, unsigned dimensions = 2u>
using mask = lshift_add<T, n_ones<T, 1 << step>::value, dimensions * (1 << step), sizeof(T) * 8 / (2 << step)>;
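To check that the templates produce the expected constants, a few static_asserts can be run against a self-contained copy of n_ones, lshift_add and mask. The copy is a sketch with one deliberate change: the width passed to n_ones is clamped to the bit width of T, so that a step whose period exceeds the word (such as step 4 for three dimensions in 32 bits) still compiles and yields the run of ones unchanged.

```cpp
#include <cstdint>
#include <type_traits>

// Self-contained copy of n_ones / lshift_add / mask for checking (T unsigned).
template <class T, unsigned n>
using n_ones = std::integral_constant<T, static_cast<T>(static_cast<T>(~T(0)) >> (sizeof(T) * 8 - n))>;

template <class T, T input, unsigned width, unsigned repeat>
struct lshift_add : lshift_add<T, lshift_add<T, input, width, 1>::value, width, repeat - 1> {};

// Width clamped to the bit width of T: a period wider than the word leaves the input unchanged.
template <class T, T input, unsigned width>
struct lshift_add<T, input, width, 1> : std::integral_constant<T, static_cast<T>(
    (input & n_ones<T, (width < sizeof(T) * 8 ? width : sizeof(T) * 8)>::value) |
    (input << (width < sizeof(T) * 8 ? width : 0)))> {};

template <class T, unsigned step, unsigned dimensions = 2u>
using mask = lshift_add<T, n_ones<T, 1 << step>::value, dimensions * (1 << step), sizeof(T) * 8 / (2 << step)>;

// Two dimensions: the familiar constants.
static_assert(mask<std::uint32_t, 0>::value == 0x55555555u, "2D step 0");
static_assert(mask<std::uint32_t, 1>::value == 0x33333333u, "2D step 1");
static_assert(mask<std::uint32_t, 2>::value == 0x0F0F0F0Fu, "2D step 2");
static_assert(mask<std::uint32_t, 3>::value == 0x00FF00FFu, "2D step 3");
static_assert(mask<std::uint32_t, 4>::value == 0x0000FFFFu, "2D step 4");
// Three dimensions: one run of 2^step ones every 3 * 2^step bits.
static_assert(mask<std::uint32_t, 0, 3>::value == 0x49249249u, "3D step 0");
static_assert(mask<std::uint32_t, 1, 3>::value == 0xC30C30C3u, "3D step 1");
static_assert(mask<std::uint32_t, 4, 3>::value == 0x0000FFFFu, "3D step 4 (clamped)");
```

If the copy compiles, the masks are correct for these cases; the same templates work unchanged for `std::uint64_t`.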
Using the same kind of recursion, we can generate a function for each step of the algorithm, x = (x | (x >> K)) & M:
template <class T, unsigned step, unsigned dimensions>
struct deinterleave {
static T work(T input) {
input = deinterleave<T, step - 1, dimensions>::work(input);
return (input | (input >> ((dimensions - 1) * (1 << (step - 1))))) & mask<T, step, dimensions>::value;
}
};
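/// @brief Specialization for step 0, where there is just a bitwise and with the first mask.
template <class T, unsigned dimensions>
struct deinterleave<T, 0u, dimensions> {
static T work(T input) {
return input & mask<T, 0u, dimensions>::value;
}
};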
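Written out by hand for three dimensions in 32 bits, the sequence of x = (x | (x >> K)) & M steps looks like the sketch below, with K = (dimensions - 1) * 2^(step - 1) and masks following the run-of-ones pattern described earlier. `compact_1by2` is a hypothetical name; it assumes the first coordinate sits in bits 0, 3, 6, ..., 30, i.e. 11 meaningful bits, which takes five steps (0 to 4).

```cpp
#include <cstdint>

// Hypothetical hand-unrolled 3D version: extracts bits 0, 3, 6, ..., 30 of a
// 32-bit Morton code and packs them into the low 11 bits.
// Each step is x = (x | (x >> K)) & M with K = (3 - 1) * 2^(step - 1).
inline std::uint32_t compact_1by2(std::uint32_t x) {
    x &= 0x49249249u;                   // step 0: one bit out of every three
    x = (x | (x >> 2))  & 0xC30C30C3u;  // step 1: K = 2, runs of 2 bits
    x = (x | (x >> 4))  & 0x0F00F00Fu;  // step 2: K = 4, runs of 4 bits
    x = (x | (x >> 8))  & 0xFF0000FFu;  // step 3: K = 8, runs of 8 bits
    x = (x | (x >> 16)) & 0x0000FFFFu;  // step 4: K = 16, runs of 16 bits
    return x;
}
```

Stopping after step 3 would leave bits 8 to 10 of the result stranded at positions 24 to 26, which is why the step count matters.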
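It remains to answer the question "how many steps do we need?". This also depends on the number of dimensions. In general, after k steps (counting step 0) the output bits of each coordinate are packed in contiguous runs of 2^(k-1) bits; the maximum number of meaningful bits for each dimension is z = ⌈sizeof(T) * 8 / dimensions⌉, so taking 1 + ⌈log_2 z⌉ steps is enough. The catch is that this value has to be constexpr in order to be used as a template parameter. The best way I found to work around this is to define log2 via metaprogramming: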
template <unsigned arg>
struct log2 : public std::integral_constant<unsigned, log2<(arg >> 1)>::value + 1> {};
template <>
struct log2<1u> : public std::integral_constant<unsigned, 0u> {};
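/// @brief Helper constexpr which returns the number of steps needed to fully deinterleave a type @tparam T.
/// Uses z = ceil(bits / dimensions) and floor(log2(2z - 1)) == ceil(log2(z)), so z need not be a power of two.
template <class T, unsigned dimensions>
using num_steps = std::integral_constant<unsigned, log2<2 * ((sizeof(T) * 8 + dimensions - 1) / dimensions) - 1>::value + 1>;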
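The step count can also be computed with C++11 constexpr functions instead of a template; the sketch below uses a ceiling so that per-dimension bit counts that are not a power of two are handled. The function names are hypothetical. For three dimensions in 32 bits, z = 11, and floor(log2 z) + 1 would give only 4 steps, i.e. runs of 8 bits, which is not enough to pack 11 bits.

```cpp
// Hypothetical C++11 constexpr helpers (each a single return statement),
// equivalent in spirit to the log2 template but with a ceiling.
constexpr unsigned floor_log2(unsigned v) {
    return v <= 1 ? 0 : 1 + floor_log2(v >> 1);
}

// ceil(log2(v)) for v >= 1, computed as floor(log2(v - 1)) + 1.
constexpr unsigned ceil_log2(unsigned v) {
    return v <= 1 ? 0 : 1 + floor_log2(v - 1);
}

// Steps (counting step 0) needed to pack z = ceil(bits / dimensions) bits,
// assuming dimensions >= 1.
constexpr unsigned steps_needed(unsigned bits, unsigned dimensions) {
    return 1 + ceil_log2((bits + dimensions - 1) / dimensions);
}

static_assert(steps_needed(32, 2) == 5, "2D, 32-bit: steps 0..4");
static_assert(steps_needed(64, 2) == 6, "2D, 64-bit: steps 0..5");
static_assert(steps_needed(32, 3) == 5, "3D, 32-bit: 11 bits need runs of 16");
static_assert(steps_needed(64, 3) == 6, "3D, 64-bit: 22 bits need runs of 32");
```

For two dimensions the floor and the ceiling agree, because z is a power of two whenever the word size is.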
Finally, everything can be combined into a single call:
/// @brief Helper function which combines @see deinterleave and @see num_steps into a single call.
template <class T, unsigned dimensions>
T deinterleave_first(T n) {
return deinterleave<T, num_steps<T, dimensions>::value - 1, dimensions>::work(n);
}
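Putting the pieces together, the sketch below assembles the templates into one self-contained unit and decodes 2D and 3D Morton codes. It assumes unsigned T, places everything in a hypothetical namespace `sketch` (which also keeps the `log2` template away from the C library function of the same name), and includes the width clamp in lshift_add, the step-0 specialization and a ceiling in num_steps.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {  // hypothetical namespace for this example

// n ones in the low bits of T (T unsigned, 1 <= n <= bit width of T).
template <class T, unsigned n>
using n_ones = std::integral_constant<T, static_cast<T>(static_cast<T>(~T(0)) >> (sizeof(T) * 8 - n))>;

// Repeatedly ORs input with itself shifted left by width.
template <class T, T input, unsigned width, unsigned repeat>
struct lshift_add : lshift_add<T, lshift_add<T, input, width, 1>::value, width, repeat - 1> {};

// One repetition; widths >= bit width of T leave the input unchanged.
template <class T, T input, unsigned width>
struct lshift_add<T, input, width, 1> : std::integral_constant<T, static_cast<T>(
    (input & n_ones<T, (width < sizeof(T) * 8 ? width : sizeof(T) * 8)>::value) |
    (input << (width < sizeof(T) * 8 ? width : 0)))> {};

// Mask for a step: runs of 2^step ones, one run every dimensions * 2^step bits.
template <class T, unsigned step, unsigned dimensions = 2u>
using mask = lshift_add<T, n_ones<T, 1 << step>::value, dimensions * (1 << step), sizeof(T) * 8 / (2 << step)>;

// Steps 1..step: x = (x | (x >> K)) & M with K = (dimensions - 1) * 2^(step - 1).
template <class T, unsigned step, unsigned dimensions>
struct deinterleave {
    static T work(T input) {
        input = deinterleave<T, step - 1, dimensions>::work(input);
        return (input | (input >> ((dimensions - 1) * (1 << (step - 1))))) & mask<T, step, dimensions>::value;
    }
};

// Step 0: just the bitwise AND with the first mask.
template <class T, unsigned dimensions>
struct deinterleave<T, 0u, dimensions> {
    static T work(T input) { return input & mask<T, 0u, dimensions>::value; }
};

template <unsigned arg>
struct log2 : std::integral_constant<unsigned, log2<(arg >> 1)>::value + 1> {};
template <>
struct log2<1u> : std::integral_constant<unsigned, 0u> {};

// 1 + ceil(log2(z)) steps with z = ceil(bits / dimensions).
template <class T, unsigned dimensions>
using num_steps = std::integral_constant<unsigned,
    log2<2 * ((sizeof(T) * 8 + dimensions - 1) / dimensions) - 1>::value + 1>;

// Extracts the coordinate stored in bits 0, dimensions, 2 * dimensions, ...
template <class T, unsigned dimensions>
T deinterleave_first(T n) {
    return deinterleave<T, num_steps<T, dimensions>::value - 1, dimensions>::work(n);
}

}  // namespace sketch
```

Each coordinate is recovered by shifting the code right by its index before the call; for example, with three dimensions the 32-bit code 371 decodes to (5, 3, 6) via `deinterleave_first<std::uint32_t, 3>(m)`, `(m >> 1)` and `(m >> 2)`.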