
Here is my function:

     def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.getcwd()+'\Reports\*'):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                # remove rows for properties that have not changed status
                sheet = pyexcel.get_sheet(file_name=file)
                for row in sheet:
                    if row[1] in changed_addresses:
                        pass
                    else:
                        del row
                # save file as correct name
                sheet.save_as(
                    os.getcwd() + '\\Reports\\' + mkt[0] + '.csv'
                )
                os.remove(file)

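The idea is to find the most recently downloaded file in a directory, open it, delete every row whose address is not in the changed_addresses list, and save the result under the market name stored in the list mkt (mkt[0]).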

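Everything works except deleting the rows. The loop iterates over them correctly and recognizes when a row should be deleted, but the output file still contains every row that should have been removed.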

Is del row not the right command in this case?
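The behavior described above is expected: del row removes only the loop variable's name, and the container the row came from is never touched. A minimal plain-Python sketch, with a list of lists standing in for the pyexcel sheet (an assumption made so the example runs without pyexcel), shows the effect and one in-place alternative using slice assignment:

```python
rows = [["id", "addr-1"], ["id", "addr-2"], ["id", "addr-3"]]
changed_addresses = {"addr-2"}

for row in rows:
    if row[1] not in changed_addresses:
        del row  # unbinds the name `row` only; `rows` is unchanged

print(len(rows))  # 3

# Slice assignment replaces the list's contents in place,
# keeping only rows whose address is in changed_addresses.
rows[:] = [row for row in rows if row[1] in changed_addresses]
print(rows)  # [['id', 'addr-2']]
```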


2 Answers


With pyexcel, you need to delete rows through sheet.row, using one of these forms:

    del sheet.row[index]
    del sheet.row[index1, index2, index3]
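Collecting indices first and deleting afterwards also avoids the classic pitfall of deleting from a list while iterating over it, which shifts the later items and makes the loop skip some of them. A plain-Python sketch of the same pattern (delete_indices is a hypothetical helper, not part of pyexcel) deletes from the highest index down so that earlier deletions do not move the remaining targets:

```python
def delete_indices(rows, indices):
    """Delete the items at `indices` from `rows` in place."""
    # Go from the highest index to the lowest so each deletion
    # leaves the positions of the remaining targets unchanged.
    for index in sorted(set(indices), reverse=True):
        del rows[index]


rows = ["a", "b", "c", "d", "e"]
delete_indices(rows, [1, 3])
print(rows)  # ['a', 'c', 'e']
```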

Here is your function adapted accordingly:

    import glob
    import os

    import pyexcel


    def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.path.join(os.getcwd(), 'Reports', '*')):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                sheet = pyexcel.get_sheet(file_name=file)
                # collect indices of rows for properties that have not changed status
                indices_to_be_removed = []  # <-
                for index, row in enumerate(sheet):
                    if row[1] not in changed_addresses:
                        indices_to_be_removed.append(index)  # <-
                # delete those rows in one go, after the loop has finished
                if indices_to_be_removed:
                    del sheet.row[indices_to_be_removed]  # <-
                # save file as correct name
                sheet.save_as(os.path.join(os.getcwd(), 'Reports', mkt[0] + '.csv'))
                os.remove(file)

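Alternatively, you can write a filter. The advantage of this approach is that rows are streamed one at a time instead of being loaded all at once, so it can handle very large data files with a small memory footprint: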

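    import glob
    import os

    import pyexcel


    def filter_rows(file_name, changed_addresses):
        # yield only the rows whose address is in changed_addresses
        # (named filter_rows so it does not shadow the built-in filter)
        for row in pyexcel.iget_array(file_name=file_name):
            if row[1] in changed_addresses:
                yield row


    def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.path.join(os.getcwd(), 'Reports', '*')):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                # remove rows for properties that have not changed status
                pyexcel.isave_as(array=filter_rows(file, changed_addresses),
                                 dest_file_name=os.path.join(os.getcwd(), 'Reports', mkt[0] + '.csv'))
                # close the handle iget_array opened before deleting the source
                pyexcel.free_resources()
                os.remove(file)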

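But remember to call the following at the end of your code. It closes the file handles opened by pyexcel's streaming i* functions such as iget_array: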

 pyexcel.free_resources()
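For readers not using pyexcel, the same streaming idea can be sketched with the standard csv module: a generator yields only the rows to keep, and csv.writer's writerows consumes it row by row, so the whole file is never held in memory. The helper name keep_changed and the in-memory io.StringIO buffers are assumptions for illustration; with real files you would pass file objects opened with newline=''.

```python
import csv
import io


def keep_changed(reader, changed_addresses, address_col=1):
    """Yield only the rows whose address column is in changed_addresses."""
    for row in reader:
        # skip blank or short rows instead of raising IndexError
        if len(row) > address_col and row[address_col] in changed_addresses:
            yield row


# In-memory buffers stand in for the downloaded report and the output file.
source = io.StringIO("1,12 Oak St\n2,9 Elm Rd\n\n3,4 Pine Ave\n")
dest = io.StringIO()

# writerows pulls rows from the generator as it writes them
csv.writer(dest).writerows(keep_changed(csv.reader(source), {"9 Elm Rd"}))
print(dest.getvalue())
```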
answered 2017-07-24T09:09:23.953

Using the csv module, I think this should work:

    import csv
    import glob
    import os


    def prepare_file(time, mkt):
        # renames file to corresponding market name
        global previous_time
        for file in glob.glob(os.path.join(os.getcwd(), 'Reports', '*')):
            # if it's the most recently downloaded file
            if time > previous_time:
                previous_time = time
                dest = os.path.join(os.getcwd(), 'Reports', mkt[0] + '.csv')
                # copy over only rows for properties that have changed status;
                # the with block closes both files before the original is removed
                with open(file, 'r', newline='') as fin, \
                        open(dest, 'w', newline='') as fout:
                    reader = csv.reader(fin)
                    writer = csv.writer(fout)
                    for row in reader:
                        if row[1] in changed_addresses:
                            writer.writerow(row)

                # remove original
                os.remove(file)

This opens the downloaded file for reading, writes the rows to keep into a new file named after the market, and then deletes the original.
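Both answers keep the question's time > previous_time check for deciding which file to process. If the goal is simply the most recently modified file in Reports, one alternative, sketched here as an assumption rather than the asker's method, is to pick the matching path with the largest modification time (newest_file is a hypothetical helper):

```python
import glob
import os


def newest_file(pattern):
    """Return the path matching `pattern` with the latest modification time.

    Returns None when nothing matches.
    """
    paths = glob.glob(pattern)
    if not paths:
        return None
    # os.path.getmtime gives the last-modified timestamp in seconds
    return max(paths, key=os.path.getmtime)
```

It would be called as newest_file(os.path.join(os.getcwd(), 'Reports', '*')), and the single returned path processed once, instead of looping over every file.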

answered 2017-07-07T21:04:11.447