objective-c - ROL / ROR on a variable using inline assembly only in Objective-C

Question

A few days ago, I asked the question below. Because I needed a quick answer, I added:

The code does not need to use inline assembly. However, I haven't found a way to do this using Objective-C / C++ / C instructions.

Today, I want to learn something, so I'm asking the question again, this time looking for an answer that uses inline assembly.

I want to perform ROR and ROL operations on variables in an Objective-C program. However, I can't manage it; I'm not an assembly expert.

Here is what I have done so far:

uint8_t v1 = ....;
uint8_t v2 = ....; // v2 is either 1, 2, 3, 4 or 5

asm("ROR v1, v2");

The error I get is:

unknown use of instruction mnemonic without a size suffix

How can I fix this?

score 5 · Accepted Answer

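A rotate is just two shifts, some bits going left and the others going right. Once you see that, rotating is easy, with no assembly needed. Some compilers recognize the pattern and compile it to a rotate instruction. See Wikipedia for code.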
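For the question's own case (uint8_t operands, a count of 1 to 5), a minimal sketch of the two-shift pattern might look like this. The names rotl8 and rotr8 are hypothetical, and reducing the count with `& 7` is an added assumption so that every count, including 0, avoids an out-of-range shift.

```c
#include <assert.h>
#include <stdint.h>

// Rotate an 8-bit value left by n bits using two shifts and an OR.
// n is reduced modulo 8, and the right-shift count is reduced too,
// so n == 0 (or 8) never produces an out-of-range shift.
static inline uint8_t rotl8(uint8_t value, unsigned n)
{
    n &= 7;
    // value is promoted to int, so the left shift cannot overflow for n <= 7;
    // the cast back to uint8_t discards the bits shifted past bit 7
    return (uint8_t)((value << n) | (value >> ((8 - n) & 7)));
}

// Rotate right is the mirror image: bits shifted out on the right
// re-enter on the left.
static inline uint8_t rotr8(uint8_t value, unsigned n)
{
    n &= 7;
    return (uint8_t)((value >> n) | (value << ((8 - n) & 7)));
}
```

With the question's variables this would be `v1 = rotr8(v1, v2);`. Whether an 8-bit rotate like this ends up as a single instruction depends on the compiler and the target.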

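Update: Xcode 4.6.2 on x86-64 (others not tested) compiles the double shift + or to a rotate for 32-bit and 64-bit operands; for 8-bit and 16-bit operands the double shift + or is kept. Why? Maybe the compiler knows something about the performance of these instructions, or maybe it just doesn't optimize them. But in general, if you can avoid assembler, do so; the compiler always knows best! You can also use static inline functions, or macros defined the same way as the standard MAX macro (macros have the advantage of adapting to the type of their operands), to inline the operation.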
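A type-adapting macro in the spirit of MAX could look like the following sketch. ROTL and ROTR are hypothetical names; the sketch assumes clang or gcc (it relies on the `__typeof__` and statement-expression extensions) and unsigned operands, since right-shifting a negative signed value is implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

// Type-generic rotates: the operand width is taken from the type of x,
// and the result is cast back to that type (this matters for uint8_t and
// uint16_t, which are promoted to int before shifting).
// Assumes unsigned operands and the GNU __typeof__ / ({ ... }) extensions.
#define ROTL(x, n) ({                                               \
    __typeof__(x) rotl_v_ = (x);                                    \
    unsigned rotl_w_ = (unsigned)(sizeof rotl_v_ * CHAR_BIT);       \
    unsigned rotl_s_ = (unsigned)(n) % rotl_w_;                     \
    (__typeof__(x))((rotl_v_ << rotl_s_) |                          \
                    (rotl_v_ >> ((rotl_w_ - rotl_s_) % rotl_w_)));  \
})

#define ROTR(x, n) ({                                               \
    __typeof__(x) rotr_v_ = (x);                                    \
    unsigned rotr_w_ = (unsigned)(sizeof rotr_v_ * CHAR_BIT);       \
    unsigned rotr_s_ = (unsigned)(n) % rotr_w_;                     \
    (__typeof__(x))((rotr_v_ >> rotr_s_) |                          \
                    (rotr_v_ << ((rotr_w_ - rotr_s_) % rotr_w_)));  \
})
```

Pass a variable or a cast expression: after integer promotion `__typeof__(a + b)` is int even when a and b are uint8_t, so `ROTL(a + b, 3)` would rotate across the width of int rather than 8 bits.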

Addendum following the OP's comment

Here is an example in x86-64 assembler; to learn in detail how to use the asm construct, start here.

First, the non-assembler version:

#import <Foundation/Foundation.h>
#include <stdint.h>

static inline uint32_t rotl32_i64(uint32_t value, unsigned shift)
{
   // assume shift is in range 1..31: a shift of 0 would give value >> 32,
   // and a shift above 31 would make the subtraction wrong (both are
   // undefined behavior in C for a 32-bit value)
   // however we know the compiler will spot the pattern and replace
   // the expression with a single roll and there will be no subtraction
   // so if the compiler changes this may break without:
   //    shift &= 0x1f;
   return (value << shift) | (value >> (32 - shift));
}

void test_rotl32(uint32_t value, unsigned shift)
{
   uint32_t shifted = rotl32_i64(value, shift);

   NSLog(@"%8x <<< %u -> %8x", value, shift, shifted);
}

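If you look at the assembler output in Xcode for Profiling, so that the optimizer kicks in (Product > Generate Output > Assembly File, then choose Profiling in the pop-up menu at the bottom of the window), you will see that rotl32_i64 has been inlined into test_rotl32 and compiled into a single rotate (roll) instruction.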

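Now, producing the assembler directly yourself is a little more complicated than the ARM code FrankH showed. This is because a variable shift count has to be in a specific register, cl, so we need to give the compiler enough information to arrange that. Here goes: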

static inline uint32_t rotl32_i64_asm(uint32_t value, unsigned shift)
{
   // i64 - shift must be in register cl so create a register local assigned to cl
   // no need to mask as i64 will do that
   register uint8_t cl asm ( "cl" ) = shift;
   uint32_t shifted;
   // emit the rotate left long
   // %n values are replaced by args:
   //    0: "=r" (shifted) - any register (r), result(=), store in var (shifted)
   //    1: "0" (value) - *same* register as %0 (0), load from var (value)
   //    2: "r" (cl) - any register (r), load from var (cl - which is the cl register so this one is used)
   __asm__ ("roll %2,%0" : "=r" (shifted) : "0" (value), "r" (cl));
   return shifted;
}

Change test_rotl32 to call rotl32_i64_asm instead and check the assembly output again. It should be the same, i.e. the compiler did just as well as we did.

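Note further that if you include the commented-out mask line, rotl32_i64 essentially becomes a general rotl32: the compiler will do the right thing for any architecture, and all it costs in the i64 version is a single and instruction.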
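Applied to the question's 8-bit v1 and v2, an inline-assembly sketch might look like this. Instead of a register local variable it uses the x86 `c` operand constraint from GCC's machine-specific constraints (also accepted by clang), which makes the compiler load the count into the c register so that `%cl` holds it. The function names are hypothetical, the asm text assumes AT&T syntax (the gcc/clang default), and the non-x86 branch is a plain C fallback added only so the sketch compiles on any target.

```c
#include <assert.h>
#include <stdint.h>

// Rotate an 8-bit value right by count bits (count taken modulo 8).
static inline uint8_t ror8_asm(uint8_t v, uint8_t count)
{
#if defined(__x86_64__) || defined(__i386__)
    // "+q": v is read and written in a byte-addressable register.
    // "c":  count is loaded into the c register, so %cl holds it.
    // "cc": rorb modifies the carry and overflow flags.
    __asm__("rorb %%cl, %0" : "+q"(v) : "c"(count) : "cc");
    return v;
#else
    // Portable fallback: the two-shift pattern.
    count &= 7;
    return (uint8_t)((v >> count) | (v << ((8 - count) & 7)));
#endif
}

// Rotate an 8-bit value left by count bits (count taken modulo 8).
static inline uint8_t rol8_asm(uint8_t v, uint8_t count)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__("rolb %%cl, %0" : "+q"(v) : "c"(count) : "cc");
    return v;
#else
    count &= 7;
    return (uint8_t)((v << count) | (v >> ((8 - count) & 7)));
#endif
}
```

This also shows why `asm("ROR v1, v2")` cannot work: C variable names are not visible inside the asm string, so values have to be passed in and out through operands such as `%0`, and the `b` suffix tells the assembler the operand size.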

So do you need asm? Using it can be somewhat fiddly, and the compiler will always do as well or better on its own...

HTH

score 0 · Accepted Answer

A 32-bit rotate on ARM would be:

__asm__("MOV %0, %1, ROR %2\n" : "=r"(out) : "r"(in), "M"(N));

where N needs to be a compile-time constant.

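But the output of the barrel shifter, whether it is applied to a register or to an immediate operand, is always full register width. You can shift a constant 8-bit quantity to any position within a 32-bit word or, as here, shift/rotate a value in a 32-bit register any way you like.
But you cannot rotate a 16-bit or 8-bit value within a register using a single ARM instruction. No such instruction exists.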

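That is why the compiler, on ARM targets, when you write "normal" (portable [Objective-]C/C++) code such as (in << xx) | (in >> (w - xx)), will create a single assembly instruction for a 32-bit rotate, but at least two (a normal shift, then a shift-and-or) for 8-bit and 16-bit ones.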
