python - Search a directory for specific Excel files and compare the data in those files with input values

Question

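Task: The company where I got a summer job has a growing test database with more and more subfolders for each project, containing everything from .jpeg files to the .xlsx files I am interested in. Since I am somewhat used to Python from before, I decided to give this task a try. I want to search for Excel documents that have "test spreadsheet" as part of their title (for example "test spreadsheet model 259"). All the documents I am interested in are built the same way (weight is always in "A3", etc.) and look a bit like this: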

Model:              259
Length:   meters     27
Weight:   kg       2500
Speed:    m/s        25

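I want users of the finished program to be able to compare the results of different tests with each other using my script. This means the script must check whether there is a value x that satisfies both conditions at the same time: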

inputlength = x*length of model 259
inputweight = x*weight of model 259

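The program should loop through all the files in the main folder. If such an x exists for a model, I want the program to add that model to a list of fitting models. The value of x is a variable and will differ from model to model.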
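
Solving each condition for x gives x = inputlength / length and x = inputweight / weight, and a model fits when both ratios come out (nearly) equal. A minimal sketch of that arithmetic, using the model 259 values from the example sheet; the function name scale_ratios is made up for illustration:

```python
def scale_ratios(input_length, input_weight, length, weight):
    """Solve each condition for x and return both candidate scales."""
    # float() avoids integer division on Python 2.
    x_length = float(input_length) / length
    x_weight = float(input_weight) / weight
    return x_length, x_weight


# Model 259: length 27 m, weight 2500 kg.
fits = scale_ratios(54, 5000, 27, 2500)       # both ratios are 2.0
differs = scale_ratios(54, 9000, 27, 2500)    # the ratios disagree
print(fits)
print(differs)
```

If the two numbers in the returned pair agree, that shared value is the scale x for the model; if they differ, no single x satisfies both conditions.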

As a result, I would like a list of all the files that fit the input, their scale (the x value) and possibly a link to the file. For example:

Model     scale   Link
ModelA    21.1    link_to_fileA
ModelB    0.78    link_to_fileB

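Script: The script I have tried to get working so far is below, but if you have other suggestions for how to approach the task I will happily accept them. Don't be afraid to ask if I have not explained the task well enough. xlrd is already installed, and I use Eclipse as my IDE. I have been trying to get it to work in many different ways, so most of my script is purely for testing.

Edit: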

#-*- coding: utf-8 -*-
# Accepts Norwegian letters

import xlrd, os, fnmatch

start_dir = r'C:\eclipse\TST-folder'  # root folder to search


def excelfiles(pattern):
    file_list = []
    for root, dirs, files in os.walk(start_dir):
        for filename in files:
            if fnmatch.fnmatch(filename.lower(), pattern):
                if filename.endswith(".xls") or filename.endswith(".xlsx") or filename.endswith(".xlsm"):
                    file_list.append(os.path.join(root, filename))
    return file_list

file_list = excelfiles('*tst*')     # only accept docs whose title includes tst

print file_list

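Why do I get only one result when I print excelfiles() after returning the value, whereas it shows all the .xls files when I swap "return os.path.join(filename)" for "print os.path.join(filename)"? Does that mean the result of the excelfiles function is not passed on? Answered in the comments.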

''' Input values '''
inputweight = int(raw_input('legg inn vekt')) #inputbox for weight
inputlength = int(raw_input('legg inn lengd')) #inputbox for length
inputspeed = int(raw_input('legg inn hastighet')) #inputbox for speed

'''Location of each val from the excel spreadsheet'''
def locate_vals():
    val_dict = {}
    for filename in file_list:
        # file_list already holds full paths, so no join with start_dir is needed
        wb = xlrd.open_workbook(filename)
        sheet = wb.sheet_by_index(0)

        # NOTE: all three reads use the same cell (1, 1); point each one at its own cell
        weightvalue = sheet.cell_value(1, 1)
        lenghtvalue = sheet.cell_value(1, 1)
        speedvalue = sheet.cell_value(1, 1)

        val_dict[filename] = [weightvalue, lenghtvalue, speedvalue]

    return val_dict
val_dict = locate_vals()
print val_dict


count = 0

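Any ideas on how to read from each of the documents found by the excelfiles function? "funcdox" does not seem to work. When I insert a test print, for example printing weightvalue after weightvalue = sheet.cell(3,3).value, I get no feedback at all. Regarding the error message I got without the test print: I have edited the script above so that it creates a list of the different values, plus minor changes that removed the error message.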
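
One way to check what is actually read from a workbook is to keep the cell positions in one place and read them through a small helper. The sketch below is an illustration only: the (row, column) positions in CELLS are hypothetical and must be adjusted to the real sheet layout, and read_values and FakeSheet are made-up names. The helper only relies on the sheet having a cell_value(rowx, colx) method, which xlrd sheets provide, so a stand-in sheet is used here to keep the example runnable without a workbook.

```python
# Hypothetical zero-based (row, column) positions; adjust to the real layout.
CELLS = {"length": (1, 2), "weight": (2, 2), "speed": (3, 2)}


def read_values(sheet, cells=CELLS):
    """Read the named cells from one sheet and return them as floats."""
    values = {}
    for name, (row, col) in cells.items():
        values[name] = float(sheet.cell_value(row, col))
    return values


class FakeSheet(object):
    """Stand-in for an xlrd sheet so the sketch runs without a workbook."""

    def __init__(self, rows):
        self.rows = rows

    def cell_value(self, row, col):
        return self.rows[row][col]


sheet = FakeSheet([
    ["Model:", "", 259],
    ["Length:", "meters", 27],
    ["Weight:", "kg", 2500],
    ["Speed:", "m/s", 25],
])
values = read_values(sheet)
print(values["length"], values["weight"], values["speed"])
```

With a real file, the sheet would come from xlrd.open_workbook(path).sheet_by_index(0), as in the script above. Note that xlrd 2.0 and later read only .xls files.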

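The script runs fine up to this point.

I made some small changes to the next part. It should scale a value from the spreadsheet by multiplying it by a constant (x1). I then want the user to be able to define another input value, which in turn defines another constant (x2) that makes the spreadsheet value fit. Finally, these constants are compared to find out which models actually fit the test.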

'''Calculates vals from excel from the given dimensions'''

def dimension():   # Maybe exchange exec-statement with the function itself.
    if count == 0:
        if inputweight != 0:
            exec scale_weight()
        elif inputlength != 0:
            exec scale_lenght()
        elif inputspeed != 0:
            exec scale_speed()


def scale_weight(x1, x2):        # Repeat for each value.
    for weightvalue in locate_vals():
        if count == 0:
            x1 * weightvalue == inputweight
            count += 1
            exec criteria2
            return weightvalue, x1
        elif count == 2:
            inputweight2 = int(raw_input('Insert weight')) #inputbox for weight
            x2 * weightvalue == inputweight2
            return weightvalue, x2

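x1 and x2 are what I want to find with this function, so I want them to be completely "free". Is there any way to test this function without having to insert values for x1 and x2?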
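
One possible way to keep x1 and x2 "free" is to compute them from the data instead of passing them in as parameters: each criterion the user fills in yields one ratio, and a model fits when all of its ratios agree. The sketch below assumes positive values and, unlike the script above, stores each file's values in a dict keyed by name; fitting_models, targets and rel_tol are made-up names for illustration.

```python
def fitting_models(val_dict, targets, rel_tol=0.01):
    """Return (name, scale) pairs for models whose ratios all agree.

    val_dict maps a file name to {"weight": ..., "length": ..., "speed": ...};
    targets maps each criterion the user filled in to the wanted value.
    """
    results = []
    for name, values in sorted(val_dict.items()):
        # One ratio per criterion; these play the role of x1, x2, ...
        ratios = [float(target) / values[key]
                  for key, target in targets.items() if values.get(key)]
        if not ratios or len(ratios) < len(targets):
            continue  # a needed value is missing or zero in this file
        # Accept the model if the ratios differ by at most rel_tol (relative).
        if max(ratios) - min(ratios) <= rel_tol * max(ratios):
            results.append((name, sum(ratios) / len(ratios)))
    return results


val_dict = {
    "ModelA.xls": {"weight": 2500, "length": 27, "speed": 25},
    "ModelB.xls": {"weight": 1000, "length": 40, "speed": 30},
}
matches = fitting_models(val_dict, {"weight": 5000, "length": 54})
print(matches)
```

Because the function takes plain dictionaries, it can be tested with hand-written values like these, without any raw_input prompts, counters or exec statements.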

def scale_lenght():  # Almost identical to scale_weight
    return


def scale_speed():  # Almost identical to scale_weight
    return


def criteria2(weight, lenght, speed):
    if count == 1:
        k2 = raw_input('Criteria two, write weight, length or speed.')
        if k2 == weight:
            count += 1
            exec scale_weight
        elif k2 == lenght:
            count += 1
            exec scale_lenght
        elif k2 == speed:
            count += 1
            exec scale_speed
        else:
            return

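Do you have any easier way to approach this? (I hope I managed to explain it well enough.) The way I have written the code so far is very messy, but since I am not that experienced I just need to make it work first and then clean it up if I have time.

Since there may not be any value that fits both x constants exactly, I thought I would use approx_Equal to handle it: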

def approx_Equal(x1, x2, tolerance=int(raw_input('Insert tolerance for scaling difference')),err_msg='Unacceptable tolerance', verbose = True ):  # Gives the approximation for how close the two values of x must be for 
    if x1 == x2:
         x = x1+ (x2-x1)/2
         return x
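
Two things stand out in the function above: it compares with ==, so the tolerance is never used, and a default argument is evaluated once when the def statement runs, so the raw_input prompt appears at definition time rather than on each call. A minimal sketch of a tolerance-based version follows; treating tolerance as a relative difference is an assumption, since the original does not define it.

```python
def approx_equal(x1, x2, tolerance=0.05):
    """Return the midpoint of x1 and x2 if they are close enough, else None.

    tolerance is relative: 0.05 accepts a difference of up to 5 % of the
    larger magnitude. (Assumption; the original does not define it.)
    """
    if abs(x1 - x2) <= tolerance * max(abs(x1), abs(x2)):
        return x1 + (x2 - x1) / 2.0
    return None


close = approx_equal(2.0, 2.05)   # within 5 %, so the midpoint is returned
far = approx_equal(2.0, 3.6)      # too far apart, so None is returned
print(close)
print(far)
```

The tolerance can still come from the user: read it once with raw_input, convert it with float, and pass it in as the third argument.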

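Finally, I would like a chart of all the variables used, plus a link to the file and the name of each document.

I have no idea how I would do this, so any tips are much appreciated.

Thanks!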

score 0 · Accepted Answer

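In answer to the first question, "why do I only get one result when I print excelfiles()", it is because your return statement is inside the nested loop, so the function stops on the first iteration. I would build up a list and then return that list; you can also combine this with checking the name, for example: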

import os, fnmatch

#globals
start_dir = os.getenv('md')

def excelfiles(pattern):
    file_list = []
    for root, dirs, files in os.walk(start_dir):
        for filename in files:
            if fnmatch.fnmatch(filename.lower(), pattern):
                if filename.endswith(".xls") or filename.endswith(".xlsx") or filename.endswith(".xlsm"):
                    file_list.append(os.path.join(root, filename))
    return file_list

file_list = excelfiles('*cd*')
for i in file_list: print i

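Obviously, you need to replace cd with your own search text, keeping the * on either side, and replace start_dir with your own. I have done the match on filename.lower() and entered the search text in lower case to make the match case-insensitive; if you do not want that, just remove the .lower(). I have also allowed for other types of Excel file.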
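
The matching rule can be tried out on a plain list of names, without walking a directory. The sketch below is a variation rather than the code above: it lowercases the name for the extension check as well, so a file ending in .XLS is also accepted. The names matches and EXCEL_EXTENSIONS and the sample file names are made up for illustration.

```python
import fnmatch

EXCEL_EXTENSIONS = (".xls", ".xlsx", ".xlsm")


def matches(filename, pattern):
    """True if filename is an Excel file whose lowercased name fits pattern."""
    name = filename.lower()
    # str.endswith accepts a tuple, so one call covers all extensions.
    return fnmatch.fnmatch(name, pattern) and name.endswith(EXCEL_EXTENSIONS)


names = ["TST model 259.xlsx", "tst_photo.jpeg", "Notes.xls", "old_Tst.XLS"]
selected = [n for n in names if matches(n, "*tst*")]
print(selected)
```

The pattern must be given in lower case, because it is compared with the lowercased file name.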

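As for reading data from Excel files, I have done this before to create an automated way of converting basic Excel files to csv format. You are welcome to have a look at the code below and see whether there is anything you can use. The xl_to_csv function is where the data is read from the Excel file: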

import os, csv, sys, traceback, Tkinter, tkFileDialog as fd, xlrd

# stop the Tkinter root window from opening, as Tkinter is only needed for the file dialog
root = Tkinter.Tk()
root.withdraw()

def format_date(dt):
    yyyy, mm, dd = str(dt[0]), str(dt[1]), str(dt[2])
    hh, mi, ss = str(dt[3]), str(dt[4]), str(dt[5])

    if len(mm) == 1:
        mm = '0'+mm
    if len(dd) == 1:
        dd = '0'+dd

    if hh == '0' and mi == '0' and ss == '0':
        datetime_str = dd+'/'+mm+'/'+yyyy
    else:
        if len(hh) == 1:
            hh = '0'+hh
        if len(mi) == 1:
            mi = '0'+mi
        if len(ss) == 1:
            ss = '0'+ss
        datetime_str = dd+'/'+mm+'/'+yyyy+' '+hh+':'+mi+':'+ss

    return datetime_str

def xl_to_csv(in_path, out_path):
    # set up vars to read file
    wb = xlrd.open_workbook(in_path)
    sh1 = wb.sheet_by_index(0)
    row_cnt, col_cnt = sh1.nrows, sh1.ncols

    # set up vars to write file
    fileout = open(out_path, 'wb')
    writer = csv.writer(fileout)

    # iterate through rows and cols
    for r in range(row_cnt):

        # make list from row data
        row = []
        for c in range(col_cnt):
            #print "...debug - sh1.cell(",r,c,").value set to:", sh1.cell(r,c).value
            #print "...debug - sh1.cell(",r,c,").ctype set to:", sh1.cell(r,c).ctype

            # check data type and make conversions
            val = sh1.cell(r,c).value
            if sh1.cell(r,c).ctype == 2: # number data type
                if val == int(val):
                    val = int(val) # convert to int if only no decimal other than .0
                #print "...debug - res 1 (float to str), val set to:", val
            elif sh1.cell(r,c).ctype == 3: # date fields
                dt = xlrd.xldate_as_tuple(val, 0) # date no from excel to dat obj
                val = format_date(dt)
                #print "...debug - res 2 (date to str), val set to:", val
            elif sh1.cell(r,c).ctype == 4: # boolean data types
                val = str(bool(val)) # convert 1 or 0 to bool true / false, then string
                #print "...debug - res 3 (bool to str), val set to:", val
            else:
                val = str(val)
                #print "...debug - else, val set to:", val

            row.append(val)
            #print ""

        # write row to csv file
        try:
            writer.writerow(row)
        except:
            print '...row failed in write to file:', row
            exc_type, exc_value, exc_traceback = sys.exc_info()
            lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
            for line in lines:
                print '!!', line

    fileout.close()  # flush buffered rows to disk before the file is opened elsewhere
    print 'Data written to:', out_path, '\n'

def main():
    in_path, out_path = None, None

    # set current working directory to user's my documents folder
    os.chdir(os.path.join(os.getenv('userprofile'),'documents'))

    # ask user for path to Excel file...
    while not in_path:
        print "Please select the excel file to read data from ..."
        try:
            in_path = fd.askopenfilename()
        except:
            print 'Error selecting file, please try again.\n'

    # get dir for output...
    same = raw_input("Do you want to write the output to the same directory? (Y/N): ")
    if same.upper() == 'Y':
        out_path = os.path.dirname(in_path)
    else:
        while not out_path:
            print "Please select a directory to write the csv file to ..."
            try:
                out_path = fd.askdirectory()
            except:
                print 'Error selecting file, please try again.\n'

    # get file name and join to dir
    f_name = os.path.basename(in_path)
    f_name = f_name[:f_name.find('.')]+'.csv'
    out_path = os.path.join(out_path,f_name)

    # get data from file and write to csv...
    print 'Attempting read data from', in_path
    print ' and write csv data to', out_path, '...\n'
    xl_to_csv(in_path, out_path)

    v_open = raw_input("Open file (Y/N):").upper()
    if v_open == 'Y':
        os.startfile(out_path)
    sys.exit()

if __name__ == '__main__':
    main()

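If you have any questions about this, let me know.

Finally, regarding the output, I would consider writing it to an html file in a table format. If you need any help with that, let me know and I will provide some more sample code that you can use parts of.

Update

Here are some further suggestions on writing the output to an html file. This is a function I have written and used before for this purpose. Let me know if you need any guidance on what you would need to change to implement it (if anything). The function expects a nested object in the data argument, such as a list of lists or a list of tuples, but should work for any number of rows/columns: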

def write_html_file(path, data, heads):
    html = []
    tab_attr = ' border="1" cellpadding="3" style="background-color:#FAFCFF; text-align:right"'
    head_attr = ' style="background-color:#C0CFE2"'

    # opening lines needed for html table
    try:
        html.append('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ')
        html.append('"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> ')
        html.append('<html xmlns="http://www.w3.org/1999/xhtml">')
        html.append('<body>')
        html.append('  <table'+tab_attr+'>')
    except:
        print 'Error setting up html heading data'

    # html table headings (if required)
    if headings_on:
        try:
            html.append('    <tr'+head_attr+'>')
            for item in heads:
                html.append(' '*6+'<th>'+str(item)+'</th>')
            html.append('    </tr>')
        except:
            exc_type, exc_value, exc_traceback = sys.exc_info()
            lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
            print 'Error writing html table headings:'
            print ''.join('!! ' + line for line in lines)

    # html table content
    try:
        for row in data:
            html.append('    <tr>')
            for item in row:
                html.append(' '*6+'<td>'+str(item)+'</td>')
            html.append('    </tr>')
    except:
        print 'Error writing body of html data'

    # closing lines needed
    try:
        html.append('  </table>')
        html.append('</body>')
        html.append('</html>')
    except:
        print 'Error closing html data'

    # write html data to file
    fileout = open(path, 'w')
    for line in html:
        fileout.write(line)
    fileout.close()  # make sure everything is on disk before the file is opened

    print 'Data written to:', path, '\n'

    if sql_path:
        os.startfile(path)
    else:
        v_open = raw_input("Open file (Y/N):").upper()
        if v_open == 'Y':
            os.startfile(path)

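headings_on is a global variable that I set to True in my script, and you will also need to import traceback for the error handling to work as currently written. sql_path is also read as a global and must be defined before the function is called.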
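
As a sketch of how the results from the question could be shaped for write_html_file, the rows below follow the Model / scale / Link layout of the question's example table. The helper name result_rows is made up, and the placeholder paths are taken from that example; real paths are put into the link as they are, so names containing characters such as & or quotes would need escaping first.

```python
def result_rows(results):
    """Turn (model, scale, path) tuples into rows of table cell strings."""
    rows = []
    for model, scale, path in results:
        # No HTML escaping is done here; this assumes simple paths.
        link = '<a href="%s">%s</a>' % (path, path)
        rows.append([model, "%.2f" % scale, link])
    return rows


heads = ["Model", "scale", "Link"]
data = result_rows([("ModelA", 21.1, "link_to_fileA"),
                    ("ModelB", 0.78, "link_to_fileB")])
print(data[0])
# write_html_file("results.html", data, heads)  # the function shown above
```

Each row is a list of three cells, which matches the nested structure that the data argument expects.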
