python - Iterating over a text file with Python and storing groups of lines in separate arrays

Question

I have a text file like this:

important unimportant
important unimportant
important unimportant
unimportant
unimportant
important unimportant
important unimportant   
important unimportant
unimportant
unimportant
important unimportant
important unimportant
important unimportant

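From this text file, I only want to extract the "important" parts, and store each run of three consecutive "important" lines in one comma-separated array. Then I want to create an array of those arrays.

I'm not very familiar with Python or with packages for text extraction.

I don't know how to approach this problem. I would really appreciate some help.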

score 0 · Accepted Answer

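You haven't shared much, but this much is clear:

You can somehow tell an important line from an unimportant one;
You are reading the file line by line;
You want to group consecutive "important" results together.

Looping over the file: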

with open('myfile.txt', 'r') as f:
    for line in f:
        pass  # do something with `line`

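You can collect the important lines in a list, and whenever you reach an unimportant line or the end of the file, add that list to the result if it has any lines in it.

Putting it all together: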

def is_important(line):
    return 'important' in line.split()  # replace with an actual test


result = []
with open('myfile.txt', 'r') as f:
    important = []
    for line in f:
        if is_important(line):
            important.append(line)
        elif important:
            result.append(important)
            important = []
# done reading, add remaining important lines to result
if important:
    result.append(important)

print(result)

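This code works for your example; just change is_important to a test that actually makes sense for your data.

Note that the example code keeps the newline at the end of each line. There are several ways to get rid of it, depending on whether you read the whole file at once or one line at a time. It shouldn't be hard to figure out yourself.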
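As one possible way of dropping the newline when reading line by line, here is a minimal sketch. It reads from an in-memory `io.StringIO` instead of `myfile.txt` so that it runs on its own; the sample text and the function name `group_important` are assumptions made for illustration and are not part of the answer above.

```python
import io


def is_important(line):
    return 'important' in line.split()  # replace with an actual test


def group_important(lines):
    """Group consecutive important lines, with trailing newlines removed."""
    result = []
    important = []
    for line in lines:
        line = line.rstrip('\n')  # drop the newline before storing the line
        if is_important(line):
            important.append(line)
        elif important:
            result.append(important)
            important = []
    if important:  # flush the last group at end of input
        result.append(important)
    return result


# Stand-in for open('myfile.txt'); the content is made up for this demo.
sample = io.StringIO(
    "important unimportant\n"
    "important unimportant\n"
    "unimportant\n"
    "important unimportant\n"
)
groups = group_important(sample)
print(groups)
```

If the whole file is read at once instead, `f.read().splitlines()` returns the lines without their newlines, so the `rstrip` call would not be needed.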

If you're looking for one of those short but hard-to-read solutions:

from itertools import groupby


def is_important(line):
    return 'important' in line.split()  # replace with an actual test


with open('myfile.txt', 'r') as f:  # close the file once the groups are built
    result = [list(x) for c, x in groupby(f, is_important) if c]

print(result)
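To see why this works: `groupby` yields one (key, group) pair for each run of consecutive items that share the same key, so keeping only the runs whose key is true gives the same grouping as the loop version. A small sketch on an in-memory list (the list contents are made up for illustration):

```python
from itertools import groupby


def is_important(line):
    return 'important' in line.split()  # replace with an actual test


# Made-up input: two important lines, two unimportant ones, one important line.
lines = ['important a', 'important b', 'unimportant', 'unimportant', 'important c']

# Each run of consecutive lines with the same key becomes one (key, group) pair.
runs = [(key, list(group)) for key, group in groupby(lines, key=is_important)]
print(runs)

# Keep only the runs where is_important returned True.
result = [group for key, group in runs if key]
print(result)
```

Note that `groupby` only groups adjacent items, which is exactly what is wanted here; it does not sort or merge runs that are separated by other lines.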

score 0 · Accepted Answer

AFAIU, try using:

with open('file2.txt', 'r') as f:
    l = []
    c = 0
    s = []
    for line in f.readlines() + ['']:
        if 'important ' in line:
            c += 1
            s.append('important')
        else:
            l.append(', '.join(s))
            c = 0
            s.clear()
    print(list(filter(None, l)))

Output:

['important, important, important', 'important, important, important', 'important, important, important']
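If a list of lists is wanted rather than joined strings, a variation along the following lines might do. This is only a sketch under an assumption the question does not confirm: that the "important" part is the first whitespace-separated field of a line and that it can be recognized by comparing it with a known marker. The names `group_first_fields` and `marker` are hypothetical.

```python
def group_first_fields(lines, marker='important'):
    """Collect the first field of each consecutive matching line into groups."""
    groups = []
    current = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == marker:  # assumed test for an important line
            current.append(fields[0])       # keep only the important part
        elif current:
            groups.append(current)
            current = []
    if current:  # flush the last group at end of input
        groups.append(current)
    return groups


# Made-up sample mirroring the shape of the file in the question.
sample = [
    'important unimportant\n',
    'important unimportant\n',
    'important unimportant\n',
    'unimportant\n',
    'unimportant\n',
    'important unimportant\n',
    'important unimportant\n',
]
nested = group_first_fields(sample)
joined = [', '.join(group) for group in nested]
print(nested)
print(joined)
```

Here `nested` is the array of arrays, and `joined` gives the comma-separated strings in the same form as the output above.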
