performance - 按角度对向量进行排序而不实际计算该角度的最快方法

Question

许多算法（例如格雷厄姆扫描）要求点或向量按它们的角度排序（也许从其他点看，即使用差异向量）。这个顺序本质上是循环的，并且这个循环被打破以计算线性值通常并不重要。但实际角度值也无关紧要，只要保持循环顺序即可。因此，atan2对每一点都进行调用可能是一种浪费。有什么更快的方法来计算角度严格单调的值，方法atan2是？这些函数显然被某些人称为“伪角度”。

score 18 · Accepted Answer

I started to play around with this and realised that the spec is kind of incomplete. atan2 has a discontinuity, because as dx and dy are varied, there's a point where atan2 will jump between -pi and +pi. The graph below shows the two formulas suggested by @MvG, and in fact they both have the discontinuity in a different place compared to atan2. (NB: I added 3 to the first formula and 4 to the alternative so that the lines don't overlap on the graph). If I added atan2 to that graph then it would be the straight line y=x. So it seems to me that there could be various answers, depending on where one wants to put the discontinuity. If one really wants to replicate atan2, the answer (in this genre) would be

# Input:  dx, dy: coordinates of a (difference) vector.
# Output: a number from the range [-2 .. 2] which is monotonic
#         in the angle this vector makes against the x axis.
#         and with the same discontinuity as atan2
def pseudoangle(dx, dy):
    p = dx/(abs(dx)+abs(dy)) # -1 .. 1 increasing with x
    if dy < 0: return p - 1  # -2 .. 0 increasing with x
    else:      return 1 - p  #  0 .. 2 decreasing with x

This means that if the language that you're using has a sign function, you could avoid branching by returning sign(dy)(1-p), which has the effect of putting an answer of 0 at the discontinuity between returning -2 and +2. And the same trick would work with @MvG's original methodology, one could return sign(dx)(p-1).

Update In a comment below, @MvG suggests a one-line C implementation of this, namely

pseudoangle = copysign(1. - dx/(fabs(dx)+fabs(dy)),dy)

@MvG says it works well, and it looks good to me :-).

enter image description here

score 5 · Accepted Answer

我知道一种可能的这样的功能，我将在这里描述。

# Input:  dx, dy: coordinates of a (difference) vector.
# Output: a number from the range [-1 .. 3] (or [0 .. 4] with the comment enabled)
#         which is monotonic in the angle this vector makes against the x axis.
def pseudoangle(dx, dy):
    ax = abs(dx)
    ay = abs(dy)
    p = dy/(ax+ay)
    if dx < 0: p = 2 - p
    # elif dy < 0: p = 4 + p
    return p

那么为什么会这样呢？需要注意的一件事是缩放所有输入长度不会影响输出。所以向量的长度(dx, dy)无关紧要，只有它的方向很重要。专注于第一象限，我们暂时可以假设dx == 1。然后dy/(1+dy)从零单调地增长dy == 0到一为无限dy（即对于dx == 0）。现在其他象限也必须处理。如果dy是负数，那么初始也是p。所以对于正数dx，我们已经有一个-1 <= p <= 1角度单调的范围。因为dx < 0我们更改符号并添加两个。这给出了的范围1 <= p <= 3，dx < 0以及-1 <= p <= 3整体的范围。如果负数由于某种原因不受欢迎，则elif可以包含注释行，这会将第四象限从移动-1…0到3…4。

我不知道上面的函数是否有一个既定的名字，可能是谁先发布的。我很久以前就得到了它，并将它从一个项目复制到下一个项目。然而，我在网络上发现了这种情况，所以我认为这个被截断的公开内容足以重复使用。

有一种方法可以获得范围 [0 … 4]（对于实角 [0 … 2π]），而不需要进一步区分大小写：

# Input:  dx, dy: coordinates of a (difference) vector.
# Output: a number from the range [0 .. 4] which is monotonic
#         in the angle this vector makes against the x axis.
def pseudoangle(dx, dy):
    p = dx/(abs(dx)+abs(dy)) # -1 .. 1 increasing with x
    if dy < 0: return 3 + p  #  2 .. 4 increasing with x
    else:      return 1 - p  #  0 .. 2 decreasing with x

score 3 · Accepted Answer

我有点喜欢三角学，所以我知道将角度映射到我们通常拥有的某些值的最佳方法是切线。当然，如果我们想要一个有限的数字来避免比较 {sign(x),y/x} 的麻烦，它会变得更加混乱。

但是有一个函数可以将 [1,+inf[ 映射到 [1,0[ 称为逆函数，这将允许我们有一个有限的范围来映射角度。正切的倒数是众所周知的余切，因此是 x/y（是的，就这么简单）。

一个小插图，显示单位圆上正切和余切的值：

正切值和余切值

您会看到 |x| 时的值相同。= |y|，您还可以看到，如果我们对两个圆圈上输出值介于 [-1,1] 之间的部分进行着色，我们就可以为整个圆圈着色。为了让这个值的映射是连续的和单调的，我们可以这样做：

使用余切的对立面与切线具有相同的单调性
将 2 添加到 -cotan，使值在 tan=1 处重合
将 4 添加到圆的一半（例如，在 x=-y 对角线下方）以使值适合不连续点之一。

这给出了以下分段函数，它是角度的连续且单调的函数，只有一个不连续性（这是最小的）：

角度的连续分段函数为 (-1,8)

double pseudoangle(double dx, double dy) 
{
    // 1 for above, 0 for below the diagonal/anti-diagonal
    int diag = dx > dy; 
    int adiag = dx > -dy;

    double r = !adiag ? 4 : 0;

    if (dy == 0)
        return r;

    if (diag ^ adiag)
        r += 2 - dx / dy; 
    else
        r += dy / dx; 

    return r;
}

请注意，这与Fowler 角度非常接近，具有相同的属性。正式地，pseudoangle(dx,dy) + 1 % 8 == Fowler(dx,dy)

谈到性能，它比 Fowler 的代码要少得多（而且通常也不那么复杂 imo）。在 gcc 6.1.1 上编译-O3，上面的函数生成一个带有 4 个分支的汇编代码，其中两个来自dy == 0（一个检查两个操作数是否“无序”，因此如果 dy 是NaN，另一个检查它们是否相等）。

我认为这个版本比其他版本更精确，因为它只使用尾数保留操作，直到将结果转移到正确的间隔。这在 |x| 时尤其明显。<< |y| 或 |y| >> |x|，则运算 |x| + |y| 失去了相当多的精度。

正如您在图表中看到的，角度-伪角度关系也非常接近线性。

查看分支来自哪里，我们可以做以下评论：

我的代码不依赖absnor copysign，这使它看起来更加独立。然而，在浮点值上使用符号位实际上是微不足道的，因为它只是翻转一个单独的位（没有分支！），所以这更像是一个缺点。
此外，这里提出的其他解决方案在除以它之前不检查是否abs(dx) + abs(dy) == 0，但是一旦只有一个组件（dy）为0，这个版本就会失败——这样就会抛出一个分支（或者在我的例子中是2）。

如果我们选择得到大致相同的结果（直到舍入错误）但没有分支，我们可能会滥用 copsign 并编写：

double pseudoangle(double dx, double dy) 
{
    double s = dx + dy; 
    double d = dx - dy; 
    double r = 2 * (1.0 - copysign(1.0, s)); 
    double xor_sign = copysign(1.0, d) * copysign(1.0, s); 

    r += (1.0 - xor_sign);
    r += (s - xor_sign * d) / (d + xor_sign * s);

    return r;
}

如果 dx 和 dy 的绝对值接近，则由于 d 或 s 中的取消，可能会发生比以前的实现更大的错误。没有检查除以零与其他实现相比较，因为这只发生在 dx 和 dy 都为 0 时。

score 2 · Accepted Answer

如果您可以在排序时将原始向量而不是角度输入到比较函数中，则可以使其与：

只有一个分支。
只有浮点比较和乘法。

避免加法和减法使其在数值上更加稳健。double 实际上总是可以准确地表示两个浮点数的乘积，但不一定是它们的总和。这意味着对于单精度输入，您可以毫不费力地保证完美无瑕的结果。

这基本上是对两个向量重复的Cimbali 解决方案，消除了分支，并增加了除法。它返回一个整数，符号与比较结果匹配（正、负或零）：

signed int compare(double x1, double y1, double x2, double y2) {
    unsigned int d1 = x1 > y1;
    unsigned int d2 = x2 > y2;
    unsigned int a1 = x1 > -y1;
    unsigned int a2 = x2 > -y2;

    // Quotients of both angles.
    unsigned int qa = d1 * 2 + a1;
    unsigned int qb = d2 * 2 + a2;

    if(qa != qb) return((0x6c >> qa * 2 & 6) - (0x6c >> qb * 2 & 6));

    d1 ^= a1;

    double p = x1 * y2;
    double q = x2 * y1;

    // Numerator of each remainder, multiplied by denominator of the other.
    double na = q * (1 - d1) - p * d1;
    double nb = p * (1 - d1) - q * d1;

    // Return signum(na - nb)
    return((na > nb) - (na < nb));
}

score 1 · Accepted Answer

我想出的最简单的方法是制作点的标准化副本，并沿 x 或 y 轴将围绕它们的圆圈分成两半。然后使用相反的轴作为顶部或底部缓冲区的开始和结束之间的线性值（一个缓冲区在放入时需要以相反的线性顺序。）然后你可以线性读取第一个然后第二个缓冲区，它会顺时针方向，或者逆时针方向的倒数第二和第一。

这可能不是一个很好的解释，所以我在 GitHub 上放了一些代码，使用这种方法对带有 epsilion 值的点进行排序，以调整数组的大小。

https://github.com/Phobos001/SpatialSort2D

这可能不适合您的用例，因为它是为图形效果渲染性能而构建的，但它快速且简单（O(N) 复杂度）。如果您使用非常小的点变化或非常大（数十万）的数据集，那么这将不起作用，因为内存使用可能超过性能优势。

score 0 · Accepted Answer

不错..这是一个返回 -Pi 的变体， Pi 像许多 arctan2 函数一样。

编辑说明：将我的伪代码更改为正确的 python.. arg 顺序更改以与 pythons 数学模块 atan2() 兼容。Edit2 需要更多代码来捕获 dx=0 的情况。

def pseudoangle( dy , dx ):
  """ returns approximation to math.atan2(dy,dx)*2/pi"""
  if dx == 0 :
      s = cmp(dy,0)
  else::
      s = cmp(dx*dy,0)  # cmp == "sign" in many other languages.
  if s == 0 : return 0 # doesnt hurt performance much.but can omit if 0,0 never happens
  p = dy/(dx+s*dy)
  if dx < 0: return p-2*s
  return  p

在这种形式中，所有角度的最大误差仅为 ~0.07 弧度。（如果您不关心大小，当然可以忽略 Pi/2。）

现在是坏消息——在我的系统上使用 python math.atan2 大约快 25% 显然替换一个简单的解释代码并没有击败一个编译的内部代码。

score 0 · Accepted Answer

如果角度本身不需要，而仅用于排序，那么@jjrv 方法是最好的方法。这是Julia中的比较

using StableRNGs
using BenchmarkTools

# Definitions
struct V{T}
  x::T
  y::T
end

function pseudoangle(v)
    copysign(1. - v.x/(abs(v.x)+abs(v.y)), v.y)
end

function isangleless(v1, v2)
    a1 = abs(v1.x) + abs(v1.y)
    a2 = abs(v2.x) + abs(v2.y)

    a2*copysign(a1 - v1.x, v1.y) < a1*copysign(a2 - v2.x, v2.y)
end

# Data
rng = StableRNG(2021)
vectors = map(x -> V(x...), zip(rand(rng, 1000), rand(rng, 1000)))

# Comparison
res1 = sort(vectors, by = x -> pseudoangle(x));
res2 = sort(vectors, lt = (x, y) -> isangleless(x, y));

@assert res1 == res2

@btime sort($vectors, by = x -> pseudoangle(x));
  # 110.437 μs (3 allocations: 23.70 KiB)

@btime sort($vectors, lt = (x, y) -> isangleless(x, y));
  # 65.703 μs (3 allocations: 23.70 KiB)

因此，通过避免除法，时间几乎减半，而不会损失结果质量。当然，为了更精确的计算，应该不时isangleless配备，但同样可以告知一下。bigfloatpseudoangle

score -3 · Accepted Answer

只需使用叉积函数。相对于另一段旋转的方向将给出正数或负数。没有三角函数，也没有除法。快速简单。只需谷歌它。

performance - 按角度对向量进行排序而不实际计算该角度的最快方法

8 回答 8

Related

Reference