python - Compressing a very large number (in Python)

Question

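I need to compress a very, very large number (143 million digits). I'm looking for a solution that compresses it losslessly by at least 10%. I have tried zlib, zipfile, gzip, and so on, but none of them really compresses this number. So here is an idea I had, but the problem is that I don't know how to implement it.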

First, I have the number.

234512

Then I have to split it into chunks that are each a number smaller than 256.

234,51,2

If the chunk size were fixed (for example, always 3 digits), I could split it, but each chunk may have 1, 2, or 3 digits, so I'm stuck here.

Once I have the chunks smaller than 256, I would convert them to characters and write them to a file.

Edit: Since I would lose leading zeros with this method, I created an algorithm that compresses the number to about 50% of its size:

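Since my digits are only 0-9, I can pretend they are hexadecimal (although they aren't) and convert to base 10, which reduces the size. Edit 2: skip this step. It actually only increases the size!
I get a number made of the digits 0-9, which I can again pretend is hexadecimal. Using unhexlify then turns it into bytes that take half the size! (If the length is odd, append 'a' to the end of the number.)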

Code:

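from binascii import unhexlify
def encode(o):  # o is the number as a string of decimal digits
    if len(o) % 2: o += 'a'  # pad odd-length input so unhexlify accepts it
    return unhexlify(o)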

I can even compress the returned data with zlib. The total compression ratio is 45%.
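A minimal sketch of the full round trip, assuming Python 3 and that the number is held as a string of decimal digits. The names `pack_digits` and `unpack_digits` are illustrative and do not come from the question. The trailing 'a' pad can be stripped on decoding because 'a' never occurs as a decimal digit, and leading zeros survive because the digits are never converted to an integer.

```python
from binascii import hexlify, unhexlify
import zlib

def pack_digits(digits):
    """Pack a decimal digit string two digits per byte."""
    if len(digits) % 2:
        digits += 'a'  # pad marker: valid hex, never a decimal digit
    return unhexlify(digits)

def unpack_digits(data):
    """Reverse pack_digits, dropping the pad marker if present."""
    digits = hexlify(data).decode('ascii')
    return digits[:-1] if digits.endswith('a') else digits

original = '0234512'                   # odd length, leading zero
packed = pack_digits(original)         # 4 bytes instead of 7 characters
compressed = zlib.compress(packed, 9)  # optional second stage

# Both stages are reversible.
assert unpack_digits(packed) == original
assert unpack_digits(zlib.decompress(compressed)) == original
```

Whether the zlib stage helps depends on the data; on very short inputs like this one its header overhead makes the output larger.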

score 1 · Accepted Answer

Here you go:

#! /usr/bin/python

n = 313105074639950943116 #just an example

#your algorithm
chars = []
buff = ''
s = str (n)
while s:
    if int (buff + s [0] ) < 256:
        buff += s [0]
        s = s [1:]
    else:
        chars.append (int (buff) )
        buff = ''
if buff: chars.append (int (buff) )

print ('You need to write these numbers converted to chars: {}'.format (chars) )
print ('These are {} bytes of data.'.format (len (chars) ) )
print ('But you cannot decompress it, because you lose leading zeros.')

chars = []
while n:
    chars.append (n & 0xff)
    n = n >> 8

print ('Now if you just write the number to a file without your algorithm:')
print ('You need to write these numbers converted to chars: {}'.format (chars) )
print ('These are {} bytes of data.'.format (len (chars) ) )
print ('And you can actually read it again.')
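The second half of the script above, writing the integer's own bytes, can also be sketched with the built-in `int.to_bytes` and `int.from_bytes` methods, assuming Python 3. The helper names `int_to_bytes` and `bytes_to_int` are illustrative. An integer has no leading zeros, so if the digit string can start with zeros, its length would need to be stored separately.

```python
import math

def int_to_bytes(n):
    """Serialize a non-negative integer using as few bytes as possible."""
    length = (n.bit_length() + 7) // 8 or 1  # at least one byte, even for 0
    return n.to_bytes(length, 'big')

def bytes_to_int(data):
    """Rebuild the integer from its big-endian bytes."""
    return int.from_bytes(data, 'big')

n = 313105074639950943116  # the example number from the answer
raw = int_to_bytes(n)
assert bytes_to_int(raw) == n

print(len(str(n)), 'digits ->', len(raw), 'bytes')

# Each decimal digit carries log2(10) bits, i.e. about 0.415 bytes.
print('bytes per digit:', math.log2(10) / 8)
```

For digits with no exploitable pattern, this binary form is already close to the smallest possible size, which is consistent with general-purpose compressors finding little left to remove.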

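Edit: If the decimal representation of your number contains many runs of 6s and 8s, you should try RLE on the decimal representation, possibly combined with a Huffman tree.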
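To gauge whether a Huffman tree would pay off, one option is to compute the code lengths it would assign to each digit. This is a rough sketch under assumptions: the function name `huffman_code_lengths` and the sample string are made up for illustration, and it only estimates the encoded size rather than producing a bit stream.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code over text."""
    freq = Counter(text)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:  # merged symbols move one level deeper
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return lengths

sample = '6666888810'
lengths = huffman_code_lengths(sample)
total_bits = sum(lengths[ch] for ch in sample)
print(lengths, total_bits, 'bits for', len(sample), 'digits')
```

The gain comes only from a skewed digit distribution; if all ten digits are about equally frequent, the result is not much better than storing the number in binary.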

Edit 2: Given (a) the long runs of 6s and 8s and (b) the fact that you don't want to use an established algorithm, you could use a very crude RLE like this:

#! /usr/bin/python

n = 313666666666666688888888888888888866666666666666666666666666666610507466666666666666666666666666399509431888888888888888888888888888888888888888888881666666666666

s = str (n)
print (s)
comp = ''
count = None
while s:
    if s [0] in '01234579':
        if count:
            comp += ('<{}>' if count [0] == 6 else '[{}]').format (count [1] )
            count = None
        comp += s [0]
    if s [0] == '6':
        if count and count [0] == 6: count = (6, count [1] + 1)
        elif count:
            comp += ('[{}]').format (count [1] )
            count = (6, 1)
        else: count = (6, 1)
    if s [0] == '8':
        if count and count [0] == 8: count = (8, count [1] + 1)
        elif count:
            comp += ('<{}>').format (count [1] )
            count = (8, 1)
        else: count = (8, 1)
    s = s [1:]

if count: comp += ('<{}>' if count [0] == 6 else '[{}]').format (count [1] )

print (comp)
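The script above only encodes. A matching decoder might look like the following sketch, which assumes the same convention as the encoder: `<k>` stands for a run of k sixes, `[k]` for a run of k eights, and everything else is a literal digit. The name `rle_decode` is illustrative.

```python
import re

def rle_decode(comp):
    """Expand '<k>' into k sixes and '[k]' into k eights."""
    def expand(match):
        if match.group(1) is not None:  # matched '<k>'
            return '6' * int(match.group(1))
        return '8' * int(match.group(2))  # matched '[k]'
    return re.sub(r'<(\d+)>|\[(\d+)\]', expand, comp)

# Literal digits pass through unchanged; run markers are expanded.
assert rle_decode('313<16>[18]') == '313' + '6' * 16 + '8' * 18
assert rle_decode('12345') == '12345'
```

Note that a single 6 or 8 is encoded as `<1>` or `[1]`, which is three characters instead of one, so this scheme only pays off when the runs are long.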
