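I have over 5000 text files (also in csv format), each with a few hundred lines.
Everything above the specific phrase "City" is unnecessary, and I need everything below it. Is there a way (Python or batch) to delete everything above it?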
I love Python. But sometimes sed is useful too:
sed -n '/City/,$p' file_with_city > new_file_with_city_on_first_line
sed -i -n '/City/,$p' file1 file2 etc
The Python analog:
#!/usr/bin/env python3
import fileinput

copy = False
for line in fileinput.input(inplace=True):  # edit files in place
    if fileinput.isfirstline() or not copy:  # reset `copy` flag for next file
        copy = "City" in line
    if copy:
        print(line, end="")  # copy line; it already ends with a newline
Usage:
$ ./remove-before-city.py file1 file2 etc
This solution modifies the files given on the command line.
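The "print from the first matching line to the end" idea behind the sed range can also be written with itertools.dropwhile. This is only a minimal sketch: the helper name lines_from_marker and the sample rows are made up for illustration, and it works on any iterable of lines rather than editing files in place.

```python
import itertools


def lines_from_marker(lines, marker="City"):
    """Return an iterator over lines, starting at the first one containing marker."""
    # dropwhile skips items while the predicate is true, then passes
    # through everything else unchanged (including later non-matching lines).
    return itertools.dropwhile(lambda line: marker not in line, lines)


if __name__ == "__main__":
    # Hypothetical sample data standing in for one of the csv files.
    sample = ["junk\n", "more junk\n", "City,Population\n", "Oslo,700000\n"]
    print("".join(lines_from_marker(sample)), end="")
```

If the marker never appears, the iterator is simply empty, which matches what the sed command prints in that case.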
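One algorithm goes like this: read the file until you find the word "City", stream that line and everything after it into a second file, then move the second file over the first.
Although you can truncate a file to remove the contents after a certain point, you cannot resize it in place to drop the contents before a certain point. You could do this with a single file by seeking back and forth repeatedly, but it is probably not worth it.
If the files are small enough, you could just read the whole of the first file into memory and then write the part you want back to the same on-disk file.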
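Here is a rough sketch of the second-file approach, assuming the marker is the literal string "City" and the files decode with the platform's default encoding. The function name strip_before_marker is hypothetical. Nothing is overwritten unless the marker is actually found.

```python
import os
import tempfile


def strip_before_marker(path, marker="City"):
    """Rewrite path so it starts at the first line containing marker.

    Returns True if the file was rewritten, False if marker was not found.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    found = False
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", newline="") as dst, \
                open(path, "r", newline="") as src:
            for line in src:
                if not found and marker in line:
                    found = True
                if found:
                    dst.write(line)
        if found:
            os.replace(tmp_path, path)  # swap the new file into place
        return found
    finally:
        # Clean up if the marker was missing or an error occurred.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Something like `for name in sys.argv[1:]: strip_before_marker(name)` would run it over many files. One thing to be aware of: the rewritten file takes on the permissions of the temporary file, which may differ from the original's.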
# Use a context manager to make sure the files are properly closed.
with open('in.csv', 'r') as infile, open('out.csv', 'w') as outfile:
    # Read the file line by line...
    for line in infile:
        # until we have a match.
        if "City" in line:
            # Write the line containing "City" to the output.
            # Comment this line out if you don't want to include it.
            outfile.write(line)
            # Read the rest of the input in one go and write it
            # to the output. If your file is really big you might
            # run out of memory doing this and have to break it
            # into chunks.
            outfile.write(infile.read())
            # Our work here is done, quit the loop.
            break
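If memory is a concern, the "break it into chunks" step mentioned in the comments can be left to shutil.copyfileobj from the standard library, which copies between two file objects in fixed-size pieces. A minimal sketch, with copy_from_marker as a made-up name and the paths supplied by the caller:

```python
import shutil


def copy_from_marker(in_path, out_path, marker="City"):
    """Copy in_path to out_path, starting at the first line containing marker."""
    with open(in_path, "r", newline="") as infile, \
            open(out_path, "w", newline="") as outfile:
        for line in infile:
            if marker in line:
                outfile.write(line)
                # Copy the remainder in chunks instead of one big read().
                shutil.copyfileobj(infile, outfile)
                break
```

As with the code above, the output file ends up empty when the marker is never found.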
import os

for name in os.listdir("."):
    if not os.path.isfile(name):
        continue  # skip directories
    with open(name, "r") as infile:
        line = infile.readline()
        # Sequential read is easy on memory if the file is huge.
        while line != '' and 'City' not in line:
            line = infile.readline()  # skip all lines till 'City' line
        # Process the rest of the file, starting at the 'City' line
        while line != '':
            print(line, end='')  # prints to stdout (or redirect to outfile)
            line = infile.readline()
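def removeContent(file, word, n=1, removeword=False):
    # Read the whole file; fine for files of a few hundred lines.
    with open(file, "r") as f:
        text = f.read()
    parts = text.split(word, n)
    if len(parts) <= n:
        return  # fewer than n occurrences of word: leave the file untouched
    content = parts[n] if removeword else word + parts[n]
    with open(file, "w") as f:
        f.write(content)

# `filenames` is assumed to be a list of the paths you want to process.
for fname in filenames:
    removeContent(fname, "City")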
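Explanation of the parameters:
n: tells which occurrence of the word to use for the removal. By default n = 1, which removes everything before the first occurrence. To remove everything before the fifth City, call the function as removeContent(fname, "City", 5). Note that the match is case-sensitive.
file: obviously, the name of the file you want to edit.
word: the word you want to use for the removal, in your case City.
removeword: tells whether to keep the word and remove only the text before it, or to remove the word itself as well.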
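To see what n and removeword do without touching any files, here is the same split logic applied to a plain string. The helper after_nth and the sample text are only illustrative and are not part of the answer's function.

```python
def after_nth(text, word, n=1, removeword=False):
    """Return the part of text from the n-th occurrence of word onward.

    Returns None when word occurs fewer than n times.
    """
    parts = text.split(word, n)  # at most n splits, so parts[n] is the tail
    if len(parts) <= n:
        return None
    return parts[n] if removeword else word + parts[n]


if __name__ == "__main__":
    text = "header\nCity A\nCity B\nCity C\n"
    print(repr(after_nth(text, "City")))           # from the first City
    print(repr(after_nth(text, "City", 2)))        # from the second City
    print(repr(after_nth(text, "City", 2, True)))  # second City itself dropped
```

Because the split happens at the word rather than at the start of its line, anything on the same line before the word is removed too.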