python - Extracting text from a file

Question

My code (so far):

ins = open( "log", "r" )
array = []
for line in ins:
    array.append( line )

for line in array:
    if "xyz" in line:
        print("xyz found!")
    else:
        print("xyz not found!")

Sample log file:

Norman xyz Cat
Cat xyz Norman
Dog xyz Dog
etc. etc.

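The Python script I currently have finds xyz and prints that it found it. But I want to do more than just find xyz: I want to find the word immediately before xyz and the word immediately after it. Once that works, I'd like to be able to store (temporarily; no database needed in your answer) the number of times Norman appears before "xyz" and the number of times Norman appears after "xyz". The same goes for all the other names and animals.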

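This is purely a learning exercise, so I'd appreciate it if you could include the "process" by which you came up with your answer. I want to know how to think like a programmer, if you will. Most of this code is just stuff I found on Google and mashed together until I got something that worked. If there is a better way to write what I currently have, I'd appreciate that too!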

Thanks for your help!

score 4 · Accepted Answer

If by "word" you mean "space-separated token", you can split each line on whitespace using

x, key, y = line.split()

Then check whether key == "xyz" and, if so, take action.
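As a quick illustration of that split (a minimal sketch using lines from the sample log; the variable names are only illustrative): split() with no arguments splits on any run of whitespace and drops the trailing newline, and the three-way unpacking raises ValueError for a line that does not have exactly three tokens.

```python
line = "Norman xyz Cat\n"

# split() with no arguments splits on any whitespace and discards the newline
tokens = line.split()

x, key, y = tokens  # works only when the line has exactly three tokens
found = key == "xyz"

# A line with a different number of tokens cannot be unpacked this way
try:
    x2, key2, y2 = "etc. etc.\n".split()
    unpacked = True
except ValueError:
    unpacked = False
```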

The "take action" part apparently means "count stuff", and that is what collections.Counter is for. To count the things before and after xyz, use two counters:

from collections import Counter

before = Counter()
after = Counter()

# iterate over the file directly; the with block closes it afterwards
with open("log") as log_file:
    for line in log_file:
        x, key, y = line.split()
        if key == "xyz":
            # increment counts of x and y in their positions
            before[x] += 1
            after[y] += 1

# print some statistics
print("Before xyz we found:")
for key, val in before.items():
    print("    %s %s" % (key, val))
# do the same for after

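Note that your current script wastes a lot of time and memory by reading the whole file into RAM, so I fixed that as well. To iterate over the lines of a file, you don't need the intermediate array variable.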
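The loop above assumes every line has exactly three tokens. A possible variation, sketched here under the assumption that some lines (such as the "etc. etc." placeholder) may have a different shape, looks for the key at any position and skips lines without it. The function name count_neighbors and its parameters are hypothetical, chosen only for this example.

```python
from collections import Counter


def count_neighbors(lines, key="xyz"):
    """Count the tokens immediately before and after each occurrence of key."""
    before = Counter()
    after = Counter()
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token != key:
                continue
            if i > 0:                      # there is a token before the key
                before[tokens[i - 1]] += 1
            if i + 1 < len(tokens):        # there is a token after the key
                after[tokens[i + 1]] += 1
    return before, after


sample = ["Norman xyz Cat\n", "Cat xyz Norman\n", "Dog xyz Dog\n", "etc. etc.\n"]
before, after = count_neighbors(sample)
print(before["Norman"], after["Norman"])  # prints: 1 1
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list, so the file is still read one line at a time.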

score 0 · Accepted Answer

'abc'.split('b') returns ['a', 'c']. With that in mind, we can change your code like this:

ins = open("log", "r")
array = []
prefixes = []
suffixes = []
for line in ins:
    array.append(line)
ins.close()

for line in array:
    if "xyz" in line:
        # split once on "xyz"; strip the surrounding spaces and the newline
        prefix, suffix = line.split("xyz", 1)
        prefixes.append(prefix.strip())
        suffixes.append(suffix.strip())
    else:
        print("xyz not found!")

Alternatively, if we only want to count how many times something appears before or after xyz, we can use a Counter:

from collections import Counter
ins = open("log", "r")
array = []
prefixes = Counter()
suffixes = Counter()
for line in ins:
    array.append(line)
ins.close()

for line in array:
    if "xyz" in line:
        # strip the pieces so "Norman " and "Norman" count as the same key
        prefix, suffix = line.split("xyz", 1)
        prefixes[prefix.strip()] += 1
        suffixes[suffix.strip()] += 1
    else:
        print("xyz not found!")
print("prefixes:" + str(prefixes))
print("suffixes:" + str(suffixes))
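One detail worth checking when splitting on the substring itself: the pieces keep the surrounding spaces and the trailing newline unless they are stripped. A small sketch, using a line from the sample log, that also shows str.partition as a possible alternative (a suggestion for illustration, not something the answer itself uses):

```python
line = "Norman xyz Cat\n"

raw = line.split("xyz")                  # ['Norman ', ' Cat\n']
clean = [part.strip() for part in raw]   # ['Norman', 'Cat']

# partition splits on the first occurrence only and always returns three parts
head, sep, tail = line.partition("xyz")
```

Note that this substring approach treats everything before and after xyz as one piece, so on a line with more than three words it would not isolate the single neighbouring word.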
