python - Python 压缩运行长度编码

Question

我正在尝试了解运行长度编码，但我在网上发现了这个我做不到的挑战。它要求您编写一个名为 compression(strg) 的压缩函数，它将长度为 64 的二进制字符串 strg 作为输入并返回另一个二进制字符串作为输出。输出二进制字符串应该是输入字符串的游程编码。

压缩（'1010101001010101101010100101010110101010010101011010101001010101'）

'1010101001010101*4'

这是我所拥有的，但这没有找到模式：

from itertools import *

def compression(strg):
    return [(len(list(group)),name) for name, group in groupby(strg)]

我需要一些帮助来解决这个问题。

score 5 · Accepted Answer

我相信您将 RLE 与 Lempel/Ziv 滑动窗口压缩混为一谈。

RLE 严格处理重复字符： WWWWWWWW=>W8

LZ有一个滑动窗口，可以根据您的描述拾取模式。

David MacKay 的网站有 Python 中的示例压缩代码，包括 LZ

score 0 · Accepted Answer

This is an example of a longest repeated substring problem. It is classically solved with a suffix tree data structure.

For short strings, you can use a form of a regex:

import re

s1='1010101001010101101010100101010110101010010101011010101001010101'

i=2
l=s1
j=len(l)/2
while i<len(s1):
    m=re.search('^(.{'+str(j)+'})\\1$',l)
    if m:
        l=m.group(1)
        i,j=i+1,len(l)/2
        continue
    else:
        print '{0} * {1} = {2}'.format(l,i,s1)
        break

Prints your output. Note this only works for strings that have complete symmetry from the middle -- a small subset of this type of problem. To compress other types of strings, you would need a representational grammar of how the replaced elements are being substituted.

score 0 · Accepted Answer

这个问题的答案和详细解释在以下链接中给出：

使用 run-length codig 通过 def compress(S) 函数进行图像压缩

希望它能让您清楚地了解字符串和二进制压缩的运行长度编码。此编码是在不使用任何 re 和 itertools 的情况下完成的。

python - Python 压缩运行长度编码

3 回答 3

Related

Reference