I have a very large set of N-dimensional points (tens of millions; N is close to 100).



The computation doesn't have to be instantaneous, but I'd like it to take no more than a few hours on decent modern home PC hardware.



6 Answers


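I finally broke down and shelled out some money. The AIP (American Institute of Physics) has a nice, short article with source code in C. "Programming the Hilbert curve" by John Skilling (AIP Conf. Proc. 707, 381 (2004)) has an appendix with code for the mapping in both directions. It works for any number of dimensions greater than 1, is not recursive, does not use state-transition lookup tables that gobble up huge amounts of memory, and mostly uses bit operations. As a result it is reasonably fast and has a small memory footprint.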


The following line of code (found in the function TransposetoAxes) has an error:

for( i = n-1; i >= 0; i-- ) X[i] ^= X[i-1];

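The correction is to change the greater-than-or-equal (>=) to a greater-than (>). Without this correction, when the variable "i" reaches zero the expression X[i-1] accesses the X array with a negative index, causing the program to fail.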


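I translated his code into C# for my own use. The code follows. Skilling performs the transformation in place, overwriting the vector that you pass in. I chose to clone the input vector and return a new copy. I also implemented the methods as extension methods.

Skilling's code represents the Hilbert index as a transpose, stored as an array. I find it more convenient to interleave the bits and form a single BigInteger (more useful in Dictionaries, easier to iterate over in loops, etc.), but I optimized that operation and its inverse with magic numbers, bit operations and the like, and the code is lengthy, so I have omitted it.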

namespace HilbertExtensions
{
    /// <summary>
    /// Convert between Hilbert index and N-dimensional points.
    /// The Hilbert index is expressed as an array of transposed bits.
    /// Example: 5 bits for each of n=3 coordinates.
    /// 15-bit Hilbert integer = A B C D E F G H I J K L M N O is stored
    /// as its Transpose                        ^
    /// X[0] = A D G J M                    X[2]|  7
    /// X[1] = B E H K N        <------->       | /X[1]
    /// X[2] = C F I L O                   axes |/
    ///        high low                         0------> X[0]
    /// NOTE: This algorithm is derived from work done by John Skilling and published in "Programming the Hilbert curve".
    /// (c) 2004 American Institute of Physics.
    /// </summary>
    public static class HilbertCurveTransform
    {
        /// <summary>
        /// Convert the Hilbert index into an N-dimensional point expressed as a vector of uints.
        /// Note: In Skilling's paper, this function is named TransposetoAxes.
        /// </summary>
        /// <param name="transposedIndex">The Hilbert index stored in transposed form.</param>
        /// <param name="bits">Number of bits per coordinate.</param>
        /// <returns>Coordinate vector.</returns>
        public static uint[] HilbertAxes(this uint[] transposedIndex, int bits)
        {
            var X = (uint[])transposedIndex.Clone();
            int n = X.Length; // n: Number of dimensions
            uint N = 2U << (bits - 1), P, Q, t;
            int i;
            // Gray decode by H ^ (H/2)
            t = X[n - 1] >> 1;
            // Corrected error in Skilling's paper on the following line. The appendix had i >= 0 leading to negative array index.
            for (i = n - 1; i > 0; i--)
                X[i] ^= X[i - 1];
            X[0] ^= t;
            // Undo excess work
            for (Q = 2; Q != N; Q <<= 1)
            {
                P = Q - 1;
                for (i = n - 1; i >= 0; i--)
                    if ((X[i] & Q) != 0U)
                        X[0] ^= P; // invert
                    else
                    {
                        t = (X[0] ^ X[i]) & P;
                        X[0] ^= t;
                        X[i] ^= t;
                    }
            } // exchange
            return X;
        }

        /// <summary>
        /// Given the axes (coordinates) of a point in N-Dimensional space, find the distance to that point along the Hilbert curve.
        /// That distance will be transposed; broken into pieces and distributed into an array.
        /// The number of dimensions is the length of the hilbertAxes array.
        /// Note: In Skilling's paper, this function is called AxestoTranspose.
        /// </summary>
        /// <param name="hilbertAxes">Point in N-space.</param>
        /// <param name="bits">Depth of the Hilbert curve. If bits is one, this is the top-level Hilbert curve.</param>
        /// <returns>The Hilbert distance (or index) as a transposed Hilbert index.</returns>
        public static uint[] HilbertIndexTransposed(this uint[] hilbertAxes, int bits)
        {
            var X = (uint[])hilbertAxes.Clone();
            var n = hilbertAxes.Length; // n: Number of dimensions
            uint M = 1U << (bits - 1), P, Q, t;
            int i;
            // Inverse undo
            for (Q = M; Q > 1; Q >>= 1)
            {
                P = Q - 1;
                for (i = 0; i < n; i++)
                    if ((X[i] & Q) != 0)
                        X[0] ^= P; // invert
                    else
                    {
                        t = (X[0] ^ X[i]) & P;
                        X[0] ^= t;
                        X[i] ^= t;
                    }
            } // exchange
            // Gray encode
            for (i = 1; i < n; i++)
                X[i] ^= X[i - 1];
            t = 0;
            for (Q = M; Q > 1; Q >>= 1)
                if ((X[n - 1] & Q) != 0)
                    t ^= Q - 1;
            for (i = 0; i < n; i++)
                X[i] ^= t;

            return X;
        }
    }
}


I have posted working code in C# to github.
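The interleaving step described in this answer is omitted from the posted code. Purely as an illustration, here is a minimal, unoptimized sketch of what it could look like. It is written in Java rather than C#, uses a plain loop instead of the magic-number tricks the author mentions, and the names `HilbertInterleave`, `interleave` and `deinterleave` are hypothetical, not taken from the author's repository. The bit order follows the transposed layout shown in the class comment: the highest bit of X[0] becomes the most significant bit of the index.

```java
import java.math.BigInteger;

// Hypothetical helper, not the author's optimized code: converts between the
// transposed Hilbert index (one word per dimension) and a single BigInteger.
final class HilbertInterleave {

    // Walk the bit planes from most to least significant; within each plane
    // take one bit from X[0], X[1], ..., X[dims-1] in that order.
    static BigInteger interleave(long[] transposed, int bits) {
        BigInteger index = BigInteger.ZERO;
        for (int b = bits - 1; b >= 0; b--) {
            for (int d = 0; d < transposed.length; d++) {
                index = index.shiftLeft(1);
                if (((transposed[d] >>> b) & 1L) != 0) {
                    index = index.setBit(0);
                }
            }
        }
        return index;
    }

    // Inverse: bit b of X[d] lives at position b * dims + (dims - 1 - d) of the index.
    static long[] deinterleave(BigInteger index, int dims, int bits) {
        long[] transposed = new long[dims];
        for (int b = 0; b < bits; b++) {
            for (int d = 0; d < dims; d++) {
                if (index.testBit(b * dims + (dims - 1 - d))) {
                    transposed[d] |= 1L << b;
                }
            }
        }
        return transposed;
    }
}
```

For example, with 3 dimensions and 2 bits per coordinate, the transposed index {0b10, 0b01, 0b11} interleaves to 101011 in binary, which is 43. A loop like this costs one BigInteger operation per bit, so for tens of millions of 100-dimensional points a word-at-a-time approach would likely be worth the extra code.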


Update: I just published (Fall 2019) a Rust crate on crates.io named "hilbert". It also uses Skilling's algorithm. See https://crates.io/crates/hilbert

answered 2012-04-30T12:57:52.293

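Algorithms for mapping from n->1 and 1->n are given in "Calculation of Mappings Between One and n-dimensional Values Using the Hilbert Space-filling Curve" by J. K. Lawder.

If you Google for "SFC module and Kademlia overlay" you'll find a group that claims to use it as part of their system. If you look at the source, you can probably extract the relevant function.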

answered 2009-08-14T22:40:46.347

It's not clear to me how this will do what you want. Consider this trivial 3D case:

001 ------ 101
 |\         |\
 | \        | \
 |  011 ------ 111
 |   |      |   |
 |   |      |   |
000 -|---- 100  |
  \  |       \  |
   \ |        \ |
    010 ------ 110


Traversing the vertices along this path:

001 -----> 101
  \          \
   \          \
    011        111
     ^          |
     |          |
000  |     100  |
  \  |       \  |
   \ |        \ V
    010        110


gives the 1D order:

000 -> 010 -> 011 -> 001 -> 101 -> 111 -> 110 -> 100


Now consider these pairs of points and their distance along the 1D order:

000 : 100 -> 7
010 : 110 -> 5
011 : 111 -> 3
001 : 101 -> 1

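In all cases, the left- and right-hand values are the same 3D distance from each other (+/- 1 in the first position), which seems to imply similar "spatial locality". But linearizing by any choice of dimension ordering (y, then z, then x, in the above example) breaks that locality.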


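Ordering the remaining points by their distance from 000 gives very different results in 1D and in 3D:

1D ordering : distance    3D ordering : distance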
----------------------    ----------------------
        010 : 1           001,010,100 : 1
                          011,101,110 : sqrt(2)
                              111     : sqrt(3)
        011 : 2
        001 : 3
        101 : 4
        111 : 5
        110 : 6
        100 : 7
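The numbers in this answer can be reproduced mechanically. The following Java sketch hard-codes the 1D order given above and compares the distance along that order with the Euclidean distance in 3D. The class and method names are made up for illustration. It relies on the fact that, for corners of a unit cube, the Euclidean distance is the square root of the number of differing bits.

```java
// Hypothetical demo class: compares 1D and 3D distances for the cube example.
final class CubeOrderDemo {

    // 1D order from the answer: 000 -> 010 -> 011 -> 001 -> 101 -> 111 -> 110 -> 100
    static final int[] ORDER = {0b000, 0b010, 0b011, 0b001, 0b101, 0b111, 0b110, 0b100};

    // Position of a cube corner along the 1D order.
    static int position(int vertex) {
        for (int i = 0; i < ORDER.length; i++) {
            if (ORDER[i] == vertex) {
                return i;
            }
        }
        throw new IllegalArgumentException("not a cube corner: " + vertex);
    }

    // Distance between two corners measured along the 1D order.
    static int distance1D(int a, int b) {
        return Math.abs(position(a) - position(b));
    }

    // Euclidean distance between two corners of the unit cube.
    static double distance3D(int a, int b) {
        return Math.sqrt(Integer.bitCount(a ^ b));
    }
}
```

Calling `distance1D(0b000, 0b100)` and `distance1D(0b001, 0b101)` gives 7 and 1 respectively, while `distance3D` returns 1.0 for both pairs, which is the mismatch the answer is pointing at.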


answered 2009-01-31T19:05:18.367

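I spent a little time translating Paul Chernoch's code to Java and cleaning it up. It's possible that there is a bug in my code, especially because I don't have access to the paper it originally came from. However, it passes the unit tests I was able to write. It is below.

Note that I have evaluated both Z-Order and Hilbert curves for spatial indexing on largish data sets. I have to say that Z-Order gave better quality. But feel free to try it for yourself.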

    /**
     * Convert the Hilbert index into an N-dimensional point expressed as a vector of longs.
     * Note: In Skilling's paper, this function is named TransposetoAxes.
     * @param transposedIndex The Hilbert index stored in transposed form.
     * @param bits Number of bits per coordinate.
     * @return Point in N-space.
     */
    static long[] HilbertAxes(final long[] transposedIndex, final int bits) {
        final long[] result = transposedIndex.clone();
        final int dims = result.length;
        grayDecode(result, dims);
        undoExcessWork(result, dims, bits);
        return result;
    }

    static void grayDecode(final long[] result, final int dims) {
        final long swap = result[dims - 1] >>> 1;
        // Corrected error in Skilling's paper on the following line. The appendix had i >= 0 leading to negative array index.
        for (int i = dims - 1; i > 0; i--)
            result[i] ^= result[i - 1];
        result[0] ^= swap;
    }

    static void undoExcessWork(final long[] result, final int dims, final int bits) {
        for (long bit = 2, n = 1; n != bits; bit <<= 1, ++n) {
            final long mask = bit - 1;
            for (int i = dims - 1; i >= 0; i--)
                if ((result[i] & bit) != 0)
                    result[0] ^= mask; // invert
                else
                    swapBits(result, mask, i);
        } // exchange
    }

    /**
     * Given the axes (coordinates) of a point in N-Dimensional space, find the distance to that point along the Hilbert curve.
     * That distance will be transposed; broken into pieces and distributed into an array.
     * The number of dimensions is the length of the hilbertAxes array.
     * Note: In Skilling's paper, this function is called AxestoTranspose.
     * @param hilbertAxes Point in N-space.
     * @param bits Depth of the Hilbert curve. If bits is one, this is the top-level Hilbert curve.
     * @return The Hilbert distance (or index) as a transposed Hilbert index.
     */
    static long[] HilbertIndexTransposed(final long[] hilbertAxes, final int bits) {
        final long[] result = hilbertAxes.clone();
        final int dims = hilbertAxes.length;
        final long maxBit = 1L << (bits - 1);
        inverseUndo(result, dims, maxBit);
        grayEncode(result, dims, maxBit);
        return result;
    }

    static void inverseUndo(final long[] result, final int dims, final long maxBit) {
        for (long bit = maxBit; bit != 0; bit >>>= 1) {
            final long mask = bit - 1;
            for (int i = 0; i < dims; i++)
                if ((result[i] & bit) != 0)
                    result[0] ^= mask; // invert
                else
                    swapBits(result, mask, i);
        } // exchange
    }

    static void grayEncode(final long[] result, final int dims, final long maxBit) {
        for (int i = 1; i < dims; i++)
            result[i] ^= result[i - 1];
        long mask = 0;
        for (long bit = maxBit; bit != 0; bit >>>= 1)
            if ((result[dims - 1] & bit) != 0)
                mask ^= bit - 1;
        for (int i = 0; i < dims; i++)
            result[i] ^= mask;
    }

    static void swapBits(final long[] array, final long mask, final int index) {
        final long swap = (array[0] ^ array[index]) & mask;
        array[0] ^= swap;
        array[index] ^= swap;
    }
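The unit tests mentioned in this answer are not shown. As a rough idea of the kind of check one could write, here is a self-contained sketch. It re-declares compact versions of the two transforms so it compiles on its own, then verifies two properties for small curves: the two transforms are inverses of each other, and consecutive Hilbert indices map to points that differ by exactly 1 in exactly one coordinate. The class name `HilbertRoundTripCheck` and its method names are assumptions made for this example, not the answerer's actual test code.

```java
import java.util.Arrays;

// Hypothetical exhaustive check for small curves; requires dims * bits < 63.
final class HilbertRoundTripCheck {

    // Transposed index -> coordinates (Skilling's TransposetoAxes with the i > 0 fix).
    static long[] toAxes(long[] transposed, int bits) {
        long[] x = transposed.clone();
        int n = x.length;
        long t = x[n - 1] >>> 1;
        for (int i = n - 1; i > 0; i--) x[i] ^= x[i - 1];   // Gray decode
        x[0] ^= t;
        for (long q = 2; q != (1L << bits); q <<= 1) {      // undo excess work
            long p = q - 1;
            for (int i = n - 1; i >= 0; i--) {
                if ((x[i] & q) != 0) {
                    x[0] ^= p;                              // invert
                } else {
                    t = (x[0] ^ x[i]) & p;                  // exchange
                    x[0] ^= t;
                    x[i] ^= t;
                }
            }
        }
        return x;
    }

    // Coordinates -> transposed index (Skilling's AxestoTranspose).
    static long[] toTransposed(long[] axes, int bits) {
        long[] x = axes.clone();
        int n = x.length;
        long m = 1L << (bits - 1);
        for (long q = m; q > 1; q >>>= 1) {                 // inverse undo
            long p = q - 1;
            for (int i = 0; i < n; i++) {
                if ((x[i] & q) != 0) {
                    x[0] ^= p;                              // invert
                } else {
                    long swap = (x[0] ^ x[i]) & p;          // exchange
                    x[0] ^= swap;
                    x[i] ^= swap;
                }
            }
        }
        for (int i = 1; i < n; i++) x[i] ^= x[i - 1];       // Gray encode
        long mask = 0;
        for (long q = m; q > 1; q >>>= 1) {
            if ((x[n - 1] & q) != 0) mask ^= q - 1;
        }
        for (int i = 0; i < n; i++) x[i] ^= mask;
        return x;
    }

    // Spread a plain integer index into transposed form (most significant bit to X[0]).
    static long[] transpose(long index, int dims, int bits) {
        long[] t = new long[dims];
        for (int b = 0; b < bits; b++) {
            for (int d = 0; d < dims; d++) {
                if (((index >>> (b * dims + (dims - 1 - d))) & 1L) != 0) t[d] |= 1L << b;
            }
        }
        return t;
    }

    // True if every index round-trips and consecutive indices are grid neighbours.
    static boolean check(int dims, int bits) {
        long[] previous = null;
        for (long h = 0; h < (1L << (dims * bits)); h++) {
            long[] transposed = transpose(h, dims, bits);
            long[] point = toAxes(transposed, bits);
            if (!Arrays.equals(toTransposed(point, bits), transposed)) return false;
            if (previous != null) {
                long moved = 0;
                for (int d = 0; d < dims; d++) moved += Math.abs(point[d] - previous[d]);
                if (moved != 1) return false;
            }
            previous = point;
        }
        return true;
    }
}
```

An exhaustive walk like this is only practical for small cases such as `check(2, 2)` or `check(3, 3)`; for N near 100 one would have to sample random points and test the round trip alone.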
answered 2016-07-12T22:13:45.010


answered 2009-01-31T17:48:51.623




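EDIT: Thanks for your comments. I see what you mean by the Hilbert curve approach now. However, this is a hard problem, and given N=100 and 10 million data points I don't think any approach will preserve locality well and also run in a reasonable amount of time. I don't think kd-trees will work here.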


answered 2009-01-31T17:35:13.587