0

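I am using findall to split up some text.

I started with the expression re.findall(r'(.*?)(\$.*?\$)', but it gives me no data after the last match it finds. I am missing the trailing '6\n\n'.

How do I get the last piece of text?

Here is my Python code: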

#!/usr/bin/env python

import re

allData = '''
1
2
3 here Some text in here 
$file1.txt$
4 Some text in here and more  $file2.txt$
5 Some text $file3.txt$ here  
$file3.txt$
6

'''

for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData, flags=re.DOTALL):
    print(repr(record))

The output I get is:

('\n1\n2\n3 here Some text in here \n', '$file1.txt$', '')
('\n4 Some text in here and more  ', '$file2.txt$', '')
('\n5 Some text ', '$file3.txt$', '')
(' here  \n', '$file3.txt$', '')
('', '', '\n6\n')
('', '', '')
('', '', '')

The output I really want is:

('\n1\n2\n3 here Some text in here \n', '$file1.txt$')
('\n4 Some text in here and more  ', '$file2.txt$')
('\n5 Some text ', '$file3.txt$')
(' here  \n', '$file3.txt$')
('\n6\n', '', )

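Background information, in case you want to see the bigger picture.

In case you are interested: I am rewriting this in Python. I have the rest of the code under control. I am just getting too much back from findall.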

https://discussions.apple.com/message/21202021#21202021

4

4 Answers

2

If I understand the Apple link correctly, you want to do the following:

import re


allData = '''
1
2
3 here Some text in here
$file1.txt$
4 Some text in here and more  $file2.txt$
5 Some text $file3.txt$ here
$file3.txt$
6

'''


def read_file(m):
    # m.group(1) is the file name captured between the two $ signs
    with open(m.group(1)) as f:
        return f.read()

# Sloppy matching :D
# print(re.sub(r"\$(.*?)\$", read_file, allData))
# More precise.
print(re.sub(r"\$(file\d+?\.txt)\$", read_file, allData))

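Edit: made the match more precise, as Oscar suggested.

I.e., take the file name between the $s and read the data from that file; that is what the code above does.

Sample output: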

1
2
3 here Some text in here

I'am file1.txt

4 Some text in here and more  
I'am file2.txt

5 Some text 
I'am file3.txt
 here

I'am file3.txt

6

Files:

==> file1.txt <==

I'am file1.txt

==> file2.txt <==

I'am file2.txt

==> file3.txt <==

I'am file3.txt
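
The substitution approach can be tried out without touching the working directory. The following is a minimal sketch, not the answer's exact code: the names `expand` and `directory` are hypothetical, and leaving a marker untouched when its file does not exist is an assumption added here for illustration.

```python
import os
import re
import tempfile


def expand(text, directory):
    """Replace each $fileN.txt$ marker in text with that file's contents."""
    def read_file(m):
        # m.group(1) is the file name between the two $ signs
        path = os.path.join(directory, m.group(1))
        if not os.path.isfile(path):
            # Assumption: leave markers for missing files unchanged
            return m.group(0)
        with open(path) as f:
            return f.read()
    return re.sub(r"\$(file\d+\.txt)\$", read_file, text)


# Demo in a throwaway directory that holds only file1.txt
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "file1.txt"), "w") as f:
        f.write("I'am file1.txt\n")
    expanded = expand("3 here\n$file1.txt$\n4 $file9.txt$\n", tmp)

print(expanded)
```

Because the replacement is a function, re.sub inserts its return value literally, so backslashes inside the file contents are not interpreted as escapes.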
answered 2013-02-26T20:50:37.080
1

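To get the output you want, you need to restrict the pattern to 2 capture groups. (If you use 3 capture groups, each "record" will have 3 elements.)

You can make the second group optional, which should do the job: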

r'([^$]*)(\$.*?\$)?'
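
A minimal sketch of how this pattern could be applied to the sample text from the question. Since both groups can match the empty string, findall may return an all-empty tuple at the end of the input; the filter that drops it is an addition here, not part of the pattern itself, and the names `all_data` and `records` are illustrative.

```python
import re

# Same sample text as in the question, written with explicit escapes
all_data = (
    "\n1\n2\n3 here Some text in here \n"
    "$file1.txt$\n"
    "4 Some text in here and more  $file2.txt$\n"
    "5 Some text $file3.txt$ here  \n"
    "$file3.txt$\n"
    "6\n\n"
)

# Group 1: text up to the next '$'; group 2: an optional $...$ marker
pattern = re.compile(r'([^$]*)(\$.*?\$)?')

# Keep only tuples in which at least one group matched something
records = [rec for rec in pattern.findall(all_data) if any(rec)]

for rec in records:
    print(repr(rec))
```

The last record holds the whole trailing text, '\n6\n\n', paired with an empty string for the missing marker.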
answered 2013-02-26T20:56:57.090
1

Here is one way to solve the substitution problem with findall:

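import re

def readfile(name):
    with open(name) as f:
        return f.read()

# Group 1: a file name between two $; group 2: a lone $ or a run of other text
r = re.compile(r"\$(.+?)\$|(\$|[^$]+)")

# allData is the string from the question
print("".join(readfile(filename) if filename else text
    for filename, text in r.findall(allData)))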
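
To see what findall returns for this pattern, here is a small sketch that swaps the files on disk for a hypothetical in-memory dictionary (`contents`); the sample string and the placeholder values are made up for illustration.

```python
import re

# Hypothetical in-memory stand-in for the files on disk
contents = {"file1.txt": "<one>", "file2.txt": "<two>"}

# Group 1: name between two '$'; group 2: a lone '$' or a run of non-'$' text
token = re.compile(r"\$(.+?)\$|(\$|[^$]+)")

sample = "a $file1.txt$ b $file2.txt$ c"

# Each tuple has exactly one non-empty element: a file name or plain text
tokens = token.findall(sample)
result = "".join(contents[name] if name else text for name, text in tokens)

print(tokens)
print(result)
```

Every alternative in the pattern consumes at least one character, so no empty tuples appear and the joined pieces cover the whole input.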
answered 2013-02-26T21:46:56.573
0

This partially solves your problem:

import re

allData = '''
1
2
3 here Some text in here 
$file1.txt$
4 Some text in here and more  $file2.txt$
5 Some text $file3.txt$ here  
$file3.txt$
6

'''

for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData.strip(), flags=re.DOTALL):
    print([x for x in record if x])

which produces the output

['1\n2\n3 here Some text in here \n', '$file1.txt$']
['\n4 Some text in here and more  ', '$file2.txt$']
['\n5 Some text ', '$file3.txt$']
[' here  \n', '$file3.txt$']
['\n6']
[]

To avoid the empty list at the end:

for record in re.findall(r'(.*?)(\$.*?\$)|(.*?$)', allData.strip(), flags=re.DOTALL):
    fields = [x for x in record if x]
    if fields:  # skip the empty match at the end of the string
        print(fields)
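
As an alternative that avoids the third group and the filtering altogether, the text can be split on the markers with re.split. This is a different technique from the one in this answer, shown only as a sketch; the names `parts`, `texts`, `markers`, and `records` are illustrative.

```python
import re

# Same sample text as in the question, written with explicit escapes
all_data = (
    "\n1\n2\n3 here Some text in here \n"
    "$file1.txt$\n"
    "4 Some text in here and more  $file2.txt$\n"
    "5 Some text $file3.txt$ here  \n"
    "$file3.txt$\n"
    "6\n\n"
)

# A capture group in the split pattern keeps the $...$ markers in the result,
# so the list alternates: text, marker, text, marker, ..., text
parts = re.split(r'(\$.*?\$)', all_data)

texts = parts[0::2]
markers = parts[1::2] + ['']  # the final piece of text has no marker after it
records = list(zip(texts, markers))

for rec in records:
    print(repr(rec))
```

If the input ends with a marker, the final record is a pair of empty strings, which can be dropped if it is not wanted.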
answered 2013-02-26T20:56:20.533