Viewed 1360 times
2 answers
5
I think the re module may be overkill here. Just split the content on \n and drop the empty strings.
>>> s = """This is the text
...
... I am interested in splitting,
...
...
... but I want to remove blank lines!"""
>>> lines = [l for l in s.split("\n") if l]
>>> lines
['This is the text', 'I am interested in splitting,', 'but I want to remove blank lines!']
str.split also appears to be more than twice as fast as re.split (1.08 vs. 2.84 usec per loop in the timings below).
> python -m timeit -s 's = "This is the text\n\nthat I want to split\n\n\nand remove empty lines"; import re;' '[l for l in re.split(r"\n", s) if l]'
100000 loops, best of 3: 2.84 usec per loop
> python -m timeit -s 's = "This is the text\n\nthat I want to split\n\n\nand remove empty lines"' '[l for l in s.split("\n") if l]'
1000000 loops, best of 3: 1.08 usec per loop
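If the input may contain Windows line endings or lines that hold only whitespace, splitting on "\n" alone leaves stray "\r" characters and whitespace-only entries behind. A minimal sketch of a variant built on str.splitlines and str.strip follows; the helper name nonblank_lines is made up for illustration and is not part of the answer above.

```python
def nonblank_lines(text):
    """Return the lines of text, skipping empty and whitespace-only ones."""
    # str.splitlines() splits on \n, \r\n and \r (plus a few other line
    # boundaries) and drops the line endings from the results.
    return [line for line in text.splitlines() if line.strip()]


sample = (
    "This is the text\r\n\r\n   \r\n"
    "I am interested in splitting,\n\n"
    "but I want to remove blank lines!"
)
print(nonblank_lines(sample))
```

This trades a little speed for robustness; for input that is known to use plain "\n" endings, the list comprehension over s.split("\n") shown earlier is enough.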
answered 2012-08-30T20:23:57.087
1
The standard str.split can take a multi-character separator:
>>> '''1st para
... second line
...
... 2nd para
... '''.split('\n\n')
['1st para\nsecond line', '2nd para\n']
Edit
Here is a re.split that handles both Linux and Windows style line endings, and copes with multiple blank lines between paragraphs.
\n test:
>>> x='this is\na multiline\ntest\n\n2nd para\ngraph\n\n\n\nmore\nmore\nmore\n\n\n\n\nmore\n'
>>> import re
>>> re.split(r'(?:\r?\n){2,}',x)
['this is\na multiline\ntest', '2nd para\ngraph', 'more\nmore\nmore', 'more\n']
\r\n test:
>>> y=x.replace('\n','\r\n')
>>> re.split(r'(?:\r?\n){2,}',y)
['this is\r\na multiline\r\ntest', '2nd para\r\ngraph', 'more\r\nmore\r\nmore', 'more\r\n']
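Both results keep the trailing line ending on the last paragraph ('more\n' and 'more\r\n'). A minimal sketch that wraps the same pattern and strips the text first, assuming leading and trailing blank lines should not produce or pollute paragraphs; split_paragraphs and PARAGRAPH_BREAK are hypothetical names chosen for illustration:

```python
import re

# Same pattern as above: two or more consecutive line endings (\n or \r\n).
PARAGRAPH_BREAK = re.compile(r'(?:\r?\n){2,}')


def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    # Assumption: whitespace around the whole text is not wanted.
    stripped = text.strip()
    if not stripped:
        return []
    return PARAGRAPH_BREAK.split(stripped)


x = 'this is\na multiline\ntest\n\n2nd para\ngraph\n\n\n\nmore\nmore\nmore\n\n\n\n\nmore\n'
print(split_paragraphs(x))
print(split_paragraphs(x.replace('\n', '\r\n')))
```

Line endings inside each paragraph are left untouched, so the caller can still tell which style the input used.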
answered 2012-08-30T21:05:08.330