10

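Hello, I would like to concatenate three Excel (xlsx) files using Python.

I have tried openpyxl, but I don't know which function could help me append three worksheets into one.

Do you have any ideas?

Thanks a lot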


6 Answers

27

Here is a pandas-based approach. (It uses openpyxl behind the scenes.)

import pandas as pd

# filenames
excel_names = ["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"]

# read them in
excels = [pd.ExcelFile(name) for name in excel_names]

# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]

# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]

# concatenate them..
combined = pd.concat(frames)

# write it out
combined.to_excel("c.xlsx", header=False, index=False)
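
A variation on the code above, as a minimal sketch: if every file has its header in the first row, pandas can treat that row as column names, so the header is kept once without slicing it off by hand. The function names `combine_frames` and `combine_first_sheets` are made up for illustration, and the sketch assumes all files share the same columns.

```python
import pandas as pd


def combine_frames(frames):
    """Stack DataFrames vertically; the shared header survives once as column names."""
    return pd.concat(frames, ignore_index=True)


def combine_first_sheets(paths, out_path):
    """Read the first sheet of each workbook and write the stacked result.

    Assumption: row 1 of every sheet is the header row.
    """
    frames = [pd.read_excel(path, sheet_name=0) for path in paths]
    combine_frames(frames).to_excel(out_path, index=False)
```

Calling `combine_first_sheets(["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"], "c.xlsx")` would then cover the same three files as the code above.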
answered 2013-04-03T18:49:32.720
9

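I would use xlrd and xlwt. Assuming you really just need to append these files (rather than do any real work on them), I would do something like this: open a file to write to with xlwt, then for each of the other three files, loop over the data and add each row to the output file. (Note that xlwt only writes the older .xls format, and xlrd 2.0 and later only reads .xls, so reading .xlsx files this way requires xlrd 1.2 or earlier.) To get you started: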

import xlwt
import xlrd

wkbk = xlwt.Workbook()
outsheet = wkbk.add_sheet('Sheet1')

xlsfiles = [r'C:\foo.xlsx', r'C:\bar.xlsx', r'C:\baz.xlsx']

outrow_idx = 0
for f in xlsfiles:
    # This is all untested; essentially just pseudocode for concept!
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in range(insheet.nrows):
        for col_idx in range(insheet.ncols):
            outsheet.write(outrow_idx, col_idx, 
                           insheet.cell_value(row_idx, col_idx))
        outrow_idx += 1
wkbk.save(r'C:\combined.xls')

If your files all have a header row, you probably don't want to repeat it, so you could modify the code above to look more like this:

firstfile = True # Is this the first sheet?
for f in xlsfiles:
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in range(0 if firstfile else 1, insheet.nrows):
        pass # processing; etc
    firstfile = False # We're done with the first sheet.
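
The header-skipping idea can be tried without any Excel library by treating each sheet as a plain list of rows. This is only a sketch of the logic; `append_sheets` and its `has_header` parameter are hypothetical names, not part of xlrd or xlwt.

```python
def append_sheets(sheets, has_header=True):
    """Concatenate lists of rows, keeping the header row of the first sheet only."""
    combined = []
    for index, rows in enumerate(sheets):
        # Skip row 0 on every sheet after the first when a header is present.
        start = 1 if (has_header and index > 0) else 0
        combined.extend(rows[start:])
    return combined


sheet_a = [["id", "name"], [1, "foo"]]
sheet_b = [["id", "name"], [2, "bar"], [3, "baz"]]
merged = append_sheets([sheet_a, sheet_b])
```

Here `merged` holds one header row followed by the three data rows; each entry could then be written out with `outsheet.write` as in the first snippet.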
answered 2013-04-03T17:04:01.460
6

When I combine Excel files (mydata1.xlsx, mydata2.xlsx, mydata3.xlsx) for data analysis, this is what I do:

import pandas as pd
import numpy as np
import glob

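frames = []
for f in glob.glob('myfolder/mydata*.xlsx'):
    frames.append(pd.read_excel(f))
# DataFrame.append was removed in pandas 2.0; concatenate once instead.
# (pd.concat raises ValueError if no files matched.)
all_data = pd.concat(frames, ignore_index=True)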

Then, when I want to save it as one file:

# ExcelWriter.save() was removed in pandas 2.0; the context manager saves on exit
with pd.ExcelWriter('mycollected_data.xlsx', engine='xlsxwriter') as writer:
    all_data.to_excel(writer, sheet_name='Sheet1')
answered 2018-09-23T23:32:48.480
3

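A solution using only openpyxl (without a bunch of other dependencies).

This script should take care of merging any number of xlsx documents, whether they have one sheet or several. It preserves formatting.

openpyxl has a function to copy a sheet, but it only works from/to the same file. There is also an insert_rows function somewhere, but by itself it does not insert any rows. So I am afraid we are left to deal (tediously) with one cell at a time.

As much as I dislike for loops and would rather use something compact and elegant like a list comprehension, I don't see how to do that here, since this is all about side effects.

Credit to this answer on copying between workbooks.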

#!/usr/bin/env python3

#USAGE
#mergeXLSX.py <a bunch of .xlsx files> ... output.xlsx
#
#where output.xlsx is the unified file

#This works FROM/TO the xlsx format. Libreoffice might help to convert from xls.
#localc --headless  --convert-to xlsx somefile.xls

import sys
from copy import copy

from openpyxl import load_workbook,Workbook

def createNewWorkbook(manyWb):
    for wb in manyWb:
        for sheetName in wb.sheetnames:
            o = theOne.create_sheet(sheetName)
            safeTitle = o.title
            copySheet(wb[sheetName],theOne[safeTitle])

def copySheet(sourceSheet,newSheet):
    for row in sourceSheet.rows:
        for cell in row:
            newCell = newSheet.cell(row=cell.row, column=cell.col_idx,
                    value= cell.value)
            if cell.has_style:
                newCell.font = copy(cell.font)
                newCell.border = copy(cell.border)
                newCell.fill = copy(cell.fill)
                newCell.number_format = copy(cell.number_format)
                newCell.protection = copy(cell.protection)
                newCell.alignment = copy(cell.alignment)

filesInput = sys.argv[1:]
theOneFile = filesInput.pop(-1)
myfriends = [ load_workbook(f) for f in filesInput ]

#try this if you are bored
#myfriends = [ load_workbook(f) for k in range(200) for f in filesInput ]

theOne = Workbook()
del theOne['Sheet'] #We want our new book to be empty. Thanks.
createNewWorkbook(myfriends)
theOne.save(theOneFile)

Tested with openpyxl 2.5.4, Python 3.4.
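
If only the cell values matter and formatting can be dropped, appending whole rows is less tedious than copying cell by cell. The following is a minimal sketch, not part of the script above: `append_values` and `skip_header` are hypothetical names, and it assumes openpyxl 2.6 or later (for `values_only`) and that each sheet after the first repeats the header in row 1.

```python
from openpyxl import Workbook


def append_values(sheets, target, skip_header=True):
    """Append the values of every source sheet to target, row by row.

    Styles, column widths and merged cells are NOT copied.
    """
    for index, sheet in enumerate(sheets):
        # Worksheet rows are 1-based; start at row 2 to drop a repeated header.
        first_row = 2 if (skip_header and index > 0) else 1
        for row in sheet.iter_rows(min_row=first_row, values_only=True):
            target.append(row)


# Small in-memory demonstration with two source workbooks.
wb_a, wb_b, wb_out = Workbook(), Workbook(), Workbook()
wb_a.active.append(["id", "name"])
wb_a.active.append([1, "foo"])
wb_b.active.append(["id", "name"])
wb_b.active.append([2, "bar"])
append_values([wb_a.active, wb_b.active], wb_out.active)
```

With real files, the sheets would come from `load_workbook(path).active` and the result would be saved with `wb_out.save(...)`.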

answered 2018-06-19T15:11:05.320
1

You can do this simply with the pandas and os libraries.

import pandas as pd
import os
#collect one dataframe per file; they are combined after the loop
frames = []
for files in os.listdir():
    #make sure you are only reading excel files
    if files.endswith('.xlsx'):
        data = pd.read_excel(files, index_col=None)
        frames.append(data)
        #move the files to other folder so that it does not process multiple times
        os.rename(files, 'path to some other folder')
#DataFrame.append was removed in pandas 2.0, so combine with pd.concat
mergedData = pd.concat(frames)

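The mergedData DataFrame will hold all the combined data, which you can export to a separate Excel or CSV file. The same code also works for CSV files: just change the extension in the if condition (and read the files with pd.read_csv instead of pd.read_excel).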
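
For the CSV case, a minimal sketch could look like the following. The function name `merge_csv_folder` and its `folder` parameter are made up for illustration; the sketch leaves the source files in place rather than moving them, and assumes all files share the same columns.

```python
import os

import pandas as pd


def merge_csv_folder(folder):
    """Combine every .csv file in folder into a single DataFrame."""
    frames = []
    # sorted() makes the row order independent of the directory listing order
    for name in sorted(os.listdir(folder)):
        if name.endswith('.csv'):
            frames.append(pd.read_csv(os.path.join(folder, name), index_col=None))
    # pd.concat raises on an empty list, so fall back to an empty frame
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

The result can then be exported with `to_csv` or `to_excel`.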

answered 2019-05-07T21:39:19.190
0

Just to add to p_barill's answer: if you have custom column widths that need to be copied, you can add the following to the bottom of copySheet:

        for col in sourceSheet.column_dimensions:
            newSheet.column_dimensions[col] = sourceSheet.column_dimensions[col]

I would have posted this as a comment on his or her answer, but my reputation isn't high enough yet.
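
An alternative sketch for the column widths copies only the `width` value onto the new sheet's own dimension objects, instead of sharing the source sheet's objects between two workbooks. The helper name `copy_column_widths` is hypothetical; whether this is needed over the two-line version above may depend on the openpyxl version.

```python
from openpyxl import Workbook


def copy_column_widths(source_sheet, new_sheet):
    """Copy explicit column widths from source_sheet to new_sheet."""
    # list() takes a snapshot so the loop never iterates a dict that is changing
    for column_letter, dimension in list(source_sheet.column_dimensions.items()):
        if dimension.width is not None:
            new_sheet.column_dimensions[column_letter].width = dimension.width


# In-memory demonstration: column B is widened on the source sheet only.
source_wb, target_wb = Workbook(), Workbook()
source_wb.active.column_dimensions["B"].width = 30
copy_column_widths(source_wb.active, target_wb.active)
```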

answered 2019-05-14T15:33:54.573