
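I have several .xy files (two columns holding x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all of these files). The code I have so far reads the files one by one, but it is extremely slow (about 20 seconds per file). I have a lot of .xy files, so the time adds up considerably. My code so far is: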

import os,fnmatch,linecache,csv
from openpyxl import Workbook

wb = Workbook() 
ws = wb.worksheets[0]
ws.title = "Sheet1"


def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1,row_count):

            data = linecache.getline(file_name, row)
            print data.strip().split()[1]   
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])

        print file_name
        wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass


workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)

Any help is appreciated. Thanks.


1 Answer


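I think your main problem is that you are writing to Excel and saving the file for every line, for every file in the directory. I'm not sure how long it takes to actually write the values to Excel, but simply moving the save call out of the loop and saving only once everything has been added should cut the time somewhat. Also, how big are these files? If they are very large, linecache may be a good idea, but assuming they aren't too big, you can probably do without it.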
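For files that fit comfortably in memory, the point about doing without linecache can be illustrated with a one-pass reader: open the file once, split each line on whitespace, and skip blank lines, with no up-front row count. This is a minimal Python 3 sketch, assuming each data line holds at least two whitespace-separated numbers (x first, then y); the names parse_xy_lines and read_xy are hypothetical and do not come from the question or the answer.

```python
def parse_xy_lines(lines):
    """Parse whitespace-separated x/y lines into two lists of floats.

    Blank lines and lines with fewer than two fields (for example a
    trailing newline at the end of the file) are skipped.
    """
    xs, ys = [], []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) < 2:
            continue
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


def read_xy(path):
    """Read one .xy file in a single pass (no linecache, no row pre-count)."""
    with open(path) as f:
        return parse_xy_lines(f)  # a file object iterates line by line
```

For example, xs, ys = read_xy("sample_Cs.xy") (a hypothetical file name) returns the x and y columns as two lists of floats.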

def batch_processing(file_name):

    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name) as f:
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to do it as follows
        for line_no, line in enumerate(f, 1):
            # .xy columns are separated by whitespace, not commas, so split the
            # line directly instead of going through csv.reader
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or short lines such as a trailing newline
            # Unpack the two columns into val1 (x) and val2 (y)
            val1, val2 = fields[:2]
            # print val1, val2  # printing takes time too, so leave this off
            ws.cell("A"+str(line_no)).value = float(val1)
            ws.cell("B"+str(line_no)).value = float(val2)

    # Doing this here will save the file after you process an entire file.
    # You could save a bit more time and move this to after your walk statement -
    # that way, you are only saving once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
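The closing comment above suggests saving only once, after the walk. A minimal Python 3 sketch of the whole job as the question describes it (one workbook, x in the first column, one y column per file) might look like the following. It assumes a current openpyxl release (Workbook, wb.active, ws.append), that every file shares identical x values as the question states, and that each file's base name is an acceptable column header; all function names are hypothetical. It also joins each name with its directory, because os.walk yields bare file names, and openpyxl is imported inside the writer so the helpers run with only the standard library.

```python
import fnmatch
import os


def parse_xy_lines(lines):
    """Return (xs, ys) floats from whitespace-separated lines, skipping short lines."""
    xs, ys = [], []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            xs.append(float(fields[0]))
            ys.append(float(fields[1]))
    return xs, ys


def find_xy_files(root_dir, pattern="*_Cs.xy"):
    """Walk root_dir and return sorted full paths of files matching pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in fnmatch.filter(filenames, pattern):
            # Join with dirpath so files in subdirectories can be opened too
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def combine_columns(datasets):
    """Turn [(label, xs, ys), ...] into rows: a header, then x, y1, y2, ...

    Assumes all datasets share the same x values; raises ValueError otherwise.
    """
    if not datasets:
        return []
    _, ref_xs, _ = datasets[0]
    for label, xs, _ in datasets:
        if xs != ref_xs:
            raise ValueError("x values in %s differ from the first file" % label)
    rows = [["x"] + [label for label, _, _ in datasets]]
    for i, x in enumerate(ref_xs):
        rows.append([x] + [ys[i] for _, _, ys in datasets])
    return rows


def write_combined_workbook(root_dir, out_path):
    """Read every matching file, build one sheet, and save it exactly once."""
    from openpyxl import Workbook  # third-party dependency, needed only here

    datasets = []
    for path in find_xy_files(root_dir):
        with open(path) as f:
            xs, ys = parse_xy_lines(f)
        label = os.path.splitext(os.path.basename(path))[0]
        datasets.append((label, xs, ys))

    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    for row in combine_columns(datasets):
        ws.append(row)  # one call per row instead of per-cell coordinate strings
    wb.save(out_path)  # a single save after all files have been processed
```

For example, write_combined_workbook(r"C:\Users\Mine\Desktop\P22_PC", "combined.xlsx") would write one workbook; the output name is hypothetical, and the raw string keeps the backslashes in the Windows path from being read as escape sequences.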
answered 2012-11-29T01:50:38.407