0

我有一个以000, 001, 010, 000随机形式生成二进制输出字符串的源。

我想要一些散列或聚类程序,根据与其他输入不同的位数将输入分组,例如输入流 000、001 和 010 应该都进入同一个桶/集群,因为它们相差一点点。

我最初的想法是将输入的第一个连续位分组为一个,例如来自

000
001
010

合而为一。然后是下一个:

011
100
101

ETC

但是我很快意识到边界之间有相似之处,比如0001000应该属于同一个桶,而应该属于011不同000的桶。

我怎么能接近这个?提示?

詹姆士

4

2 回答 2

0

Use gray codes.

https://en.wikipedia.org/wiki/Gray_code

The reflected binary code, also known as Gray code after Frank Gray, is a binary numeral system where two successive values differ in only one bit.

This results in the following order for three bits (and the associated integers - this is not the default binary encoding!):

000  = 0
001  = 1
011  = 2
010  = 3
110  = 4
111  = 5
101  = 6
100  = 7

And then just split the data in the desired number of buckets linearly.

Alternatively, you may want to look at checksums and error correcting codes, too.

于 2012-12-17T16:18:29.347 回答
0

抱歉,对于您的特殊情况,我没有找到一个功能,但在这里您可以获得一些灵感:

如果您想拥有一个依赖于一个数字的存储桶函数,您可以执行以下操作:

def bucket(i):
    bucket = 0
    bit = 1
    while bit <= i:
        if bit & i > 0:
            bucket += 1
        bit <<= 1 # shift the bit 1 higher
    return bucket

如果您有两个数字依赖:

def bucket(i, x):
    # perform some operation, store it in i
    bucket = 0
    bit = 1
    while bit <= i:
        # you can compare bits here
        if bit & i > 0:
            bucket += 1
    bit <<= 1
    return bucket

这就是您可以进行按位分桶的方法

buckets = {}
for number in numbers:
    numberBucket = bucket(number)
    buckets.setdefault(numberBucket, [])
    buckets[numberBucket].append(number)
于 2012-12-17T16:32:56.020 回答