python - How do I delete everything before a key phrase in a text file?

Question

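I have more than 5000 text files (also in csv format), each with a few hundred lines.

Everything above the specific phrase "City" is unnecessary and I need everything below it. Is there a way (Python or batch) to delete all of the unneeded content?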

score 5 · Accepted Answer

I love Python. But sometimes sed is useful too:

sed -n '/City/,$p' file_with_city > new_file_with_city_on_first_line

score 2 · Accepted Answer

A Python analogue of sed -i -n '/City/,$p' file1 file2 etc:

#!/usr/bin/env python3
import fileinput

copy = False
for line in fileinput.input(inplace=True):  # edit files in place
    if fileinput.isfirstline() or not copy:  # reset `copy` flag for next file
        copy = "City" in line
    if copy:
        print(line, end="")  # copy line (it already ends with a newline)

Usage:

$ ./remove-before-city.py file1 file2 etc

This solution modifies the files given on the command line in place.

score 2 · Accepted Answer

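One algorithm goes like this:

Read from the file until you encounter the text "City"
Open a second file in write mode
Stream from the first file to the second file
Close both files
Move the second file to the location formerly occupied by the first file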
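
The steps above could be sketched roughly like this in Python 3. The function name strip_before and its phrase parameter are made up for illustration, and the sketch assumes you want to keep the line containing "City" itself and leave files without the phrase untouched.

```python
import os
import tempfile


def strip_before(path, phrase="City"):
    """Rewrite `path` so it starts at the first line containing `phrase`.

    Hypothetical helper, not from the answer. Returns True if the phrase
    was found, False if the file was left unchanged.
    """
    # Put the second file next to the first so the final move stays on
    # the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    found = False
    # newline="" keeps the original line endings (useful for CSV files).
    with open(path, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        for line in src:
            found = found or phrase in line
            if found:
                dst.write(line)  # stream the rest of the first file
    # Both files are closed at this point.
    if found:
        os.replace(dst.name, path)  # move the second file over the first
    else:
        os.remove(dst.name)  # phrase never seen: keep the original
    return found
```

For many files you would call it in a loop, for example over the names returned by glob.glob("*.csv").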

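Although you can truncate a file to remove the content after a certain point, you cannot resize it in place to drop the content before a point. You could do this with a single file by seeking back and forth, but it is probably not worth it.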
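
For the curious, the single-file variant might look something like the sketch below: find the byte offset of the "City" line, copy the tail toward the front chunk by chunk, then truncate. The name shift_in_place and the chunk_size parameter are invented for this example, and it assumes the phrase can be matched as plain bytes (ASCII or UTF-8). Unlike the two-file approach, a crash halfway through leaves the file damaged, which is one reason it is probably not worth it.

```python
import os


def shift_in_place(path, phrase=b"City", chunk_size=64 * 1024):
    """Drop everything before the first line containing `phrase`,
    using only the one file. Hypothetical helper for illustration."""
    with open(path, "r+b") as f:
        # Pass 1: find the byte offset where the first matching line starts.
        start = None
        offset = 0
        for line in f:
            if phrase in line:
                start = offset
                break
            offset += len(line)
        if start is None:
            return False  # phrase not found: leave the file untouched
        if start == 0:
            return True  # already begins with the matching line

        # Pass 2: seek back and forth, copying the tail toward the front.
        read_pos, write_pos = start, 0
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        f.truncate(write_pos)  # cut off the stale bytes at the end
    return True
```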

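If the files are small enough, you can read the entire contents of the first file into memory and then write the part you want back to the same file on disk.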
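
With files of only a few hundred lines, the in-memory route could be as short as the following sketch. The function name strip_before_in_memory is hypothetical, and it assumes the "City" line itself should be kept.

```python
def strip_before_in_memory(path, phrase="City"):
    """Read the whole file, then write back only the part from the first
    line containing `phrase`. Hypothetical helper for illustration."""
    # newline="" keeps the original line endings intact.
    with open(path, "r", newline="") as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if phrase in line:
            break
    else:
        return False  # phrase not found: do not touch the file
    with open(path, "w", newline="") as f:
        f.writelines(lines[i:])
    return True
```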

score 1 · Accepted Answer

# Use a context manager to make sure the files are properly closed.
with open('in.csv', 'r') as infile, open('out.csv', 'w') as outfile:
    # Read the file line by line...
    for line in infile:
        # until we have a match.
        if "City" in line:
            # Write the line containing "City" to the output.
            # Comment this line out if you don't want to include it.
            outfile.write(line)

            # Read the rest of the input in one go and write it
            # to the output. If your file is really big you might
            # run out of memory doing this and have to break it
            # into chunks.
            outfile.write(infile.read())

            # Our work here is done, quit the loop.
            break

score 0 · Accepted Answer

import os

for name in os.listdir("."):
    if not os.path.isfile(name):
        continue  # skip directories
    with open(name, "r") as infile:
        # Sequential read is easy on memory if the file is huge.
        line = infile.readline()
        while line != "" and "City" not in line:
            line = infile.readline()  # skip all lines till 'City' line
        # Print the 'City' line and the rest of the file after it.
        while line != "":
            print(line, end="")  # prints to stdout (or redirect to outfile)
            line = infile.readline()

score 0 · Accepted Answer

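def removeContent(file, word, n=1, removeword=False):
    # `file` is the path of the file to edit.
    with open(file, "r") as f:
        # split(word, n)[n] is everything after the n-th occurrence of word
        # (raises IndexError if word occurs fewer than n times).
        content = f.read().split(word, n)[n]
    if not removeword:
        content = word + content  # keep the word itself
    with open(file, "w") as f:
        f.write(content)

# `filenames` is assumed to be a list of paths defined elsewhere.
for fname in filenames:
    removeContent(fname, "City")  # matching is case-sensitive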

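Parameter notes:

n tells the function which occurrence of the word to cut at. By default n = 1, which removes everything before the first occurrence. To delete everything before the fifth city, call the function as removeContent(fname, "city", 5).

file is, obviously, the name of the file you want to edit.

word is the word you want to cut at, in your case city.

removeword tells whether to keep the word and only delete the text before it, or to delete the word itself as well.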
