
Like this:

text = "  \t  hello there\n  \t  how are you?\n  \t HHHH"
      hello there
      how are you?
     HHHH

Can I get the common prefix substring with a regular expression?

I tried:

In [36]: re.findall(r"(?m)(?:(^[ \t]+).+[\n\r]+\1)", "  \t  hello there\n  \t  how are you?\n  \t HHHH")
Out[36]: ['  \t  ']

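But obviously the common prefix substring is '  \t ' (two spaces, a tab, one space).
I want to use this for a dedent function like the one in Python's textwrap module.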

4 Answers


Here is an expression that finds the common prefix of the lines in a text:

r'^(.+).*(\n\1.*)*$'

Example:

import re

text = (
    "No Red Leicester\n"
    "No Tilsit\n"
    "No Red Windsor"
)

m = re.match(r'^(.+).*(\n\1.*)*$', text)
if m:
    print('common prefix is', m.group(1))
else:
    print('no common prefix')

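Note that this expression involves a lot of backtracking, so use it wisely, especially on large inputs.

To find the length of the longest common "whitespace" prefix, just find the leading whitespace of each line and apply len: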

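def dedent(text):
    # [ \t]* instead of \s+ keeps each match on one line and counts unindented lines
    indents = re.findall(r'(?m)^[ \t]*(?=\S)', text)
    prefix_len = min(map(len, indents), default=0)
    # note: this strips by character count, so a tab counts the same as a space
    return re.sub(r'(?m)^.{%d}' % prefix_len, '', text)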

text = (
    "     No Red Leicester\n"
    "    No Tilsit\n"
    "\t\t   No Red Windsor"
)

print(dedent(text))
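
A follow-up sketch, not part of the original answer: the dedent here removes a fixed number of characters per line, so lines indented with different mixes of tabs and spaces are treated as if they shared a prefix. Assuming what is wanted is the literal common run of spaces and tabs, something like the following might work. The names common_whitespace_prefix and dedent_exact are made up for illustration.

```python
import re

def common_whitespace_prefix(text):
    # Leading runs of spaces/tabs on lines that contain some non-blank text.
    indents = re.findall(r'(?m)^[ \t]*(?=\S)', text)
    if not indents:
        return ''
    candidate = min(indents, key=len)
    # Shrink the candidate until every indent starts with it.
    while not all(indent.startswith(candidate) for indent in indents):
        candidate = candidate[:-1]
    return candidate

def dedent_exact(text):
    prefix = common_whitespace_prefix(text)
    # Remove the prefix only where it literally occurs at a line start.
    return re.sub(r'(?m)^' + re.escape(prefix), '', text)

question_text = "  \t  hello there\n  \t  how are you?\n  \t HHHH"
mixed_text = "     No Red Leicester\n    No Tilsit\n\t\t   No Red Windsor"

print(repr(dedent_exact(question_text)))
print(repr(common_whitespace_prefix(mixed_text)))
```

With the question's text this strips '  \t ' from each line. With the mixed tab/space sample the common prefix is empty, so the text comes back unchanged.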
answered 2012-11-03T10:40:29.720

I suggest:

match = re.search(r'(?m)\A(.*).*(?:\n?^\1.*$)*\n?\Z', text)

See this demo.
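
Since the demo link may not be reachable, here is a small sketch of how the expression could be applied to the text from the question. The variable names are only illustrative. Keep in mind that group 1 captures whatever characters the lines share, not just whitespace, so for a dedent it would still need trimming to spaces and tabs.

```python
import re

text = "  \t  hello there\n  \t  how are you?\n  \t HHHH"

# \A and \Z anchor the whole string; (?m) lets ^ and $ match at every line,
# and \1 forces each following line to start with the captured prefix.
match = re.search(r'(?m)\A(.*).*(?:\n?^\1.*$)*\n?\Z', text)
prefix = match.group(1) if match else ''

print(repr(prefix))
```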

answered 2012-11-03T13:25:38.110

I'm not very good at Python, so this code may not look idiomatic for the language, but algorithmically it should be fine:

>>> import io
>>> def strip_common_prefix(text):
...     position = text.find('\n')
...     if position == -1: return text  # single line: nothing to compare against
...     offset = position
...     # the first line (with its newline) is the initial candidate prefix
...     match = text[: position + 1]
...     lines = [match]
...     while match and position != len(text):
...         next_line = text.find('\n', position + 1)
...         if next_line == -1: next_line = len(text)
...         line = text[position + 1 : next_line + 1]
...         position = next_line
...         lines.append(line)
...         # count the leading characters this line shares with the candidate
...         i = 0
...         for a, b in zip(line, match):
...             if i > offset or a != b: break
...             i += 1
...         offset = i
...         match = line[: offset]
...     buf = io.StringIO()
...     for line in lines:
...         if not match: buf.write(line)
...         else: buf.write(line[offset :])
...     text = buf.getvalue()
...     buf.close()
...     return text
... 
>>> strip_common_prefix("  \t  hello there\n  \t  how are you?\n  \t HHHH")
' hello there\n how are you?\nHHHH'
>>> 

A regular expression would add a lot of overhead on top of this.

answered 2012-11-03T13:28:39.897
import os
# not just for paths...

text = "  \t  hello there\n  \t  how are you?\n  \t HHHH"
li = text.split("\n")

common = os.path.commonprefix(li)

li = [i[len(common):] for i in li]
for i in li:
    print(i)

=>

 hello there
 how are you?
HHHH
answered 2015-01-28T06:33:32.907
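
One thing to watch with this approach: os.path.commonprefix compares character by character, so if the lines happen to start with the same letters after the indentation, those letters end up in the prefix too. A possible sketch that keeps only the whitespace part, assuming a textwrap-style dedent is the goal (the function name dedent_commonprefix is invented here):

```python
import os
import textwrap

def dedent_commonprefix(text):
    lines = text.split("\n")
    # Ignore blank lines so they do not force an empty prefix.
    candidates = [line for line in lines if line.strip()]
    common = os.path.commonprefix(candidates)
    # Keep only the leading spaces/tabs of the shared prefix.
    margin = common[:len(common) - len(common.lstrip(" \t"))]
    return "\n".join(
        line[len(margin):] if line.startswith(margin) else line
        for line in lines
    )

text = "  \t  hello there\n  \t  how are you?\n  \t HHHH"
print(dedent_commonprefix(text))

# Shared letters after the indentation are left alone.
print(dedent_commonprefix("  hello\n  help"))
```

For the text in the question this gives the same result as textwrap.dedent. Whitespace-only lines are handled differently from textwrap.dedent, which normalizes them, so the two are not drop-in replacements for each other.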