2

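I have an Excel file with multiple worksheets that I need to merge. However, the column headers differ from sheet to sheet. The data currently looks like this.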

Sheet 1
+-------------+--------------+----------+--------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 |
+-------------+--------------+----------+--------+---------+---------+
|          17 | Data         | Data     |      0 |       0 |       0 |
|          17 | Data         | Data     |      0 |       0 |       0 |
+-------------+--------------+----------+--------+---------+---------+

Sheet 2
+-------------+--------------+----------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header3 | Header2 |
+-------------+--------------+----------+---------+---------+
|          15 | Data         | Data     |       0 |       0 |
|          15 | Data         | Data     |       0 |       0 |
+-------------+--------------+----------+---------+---------+

Sheet 3
+-------------+--------------+----------+---------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header4 | Header1 | Header3 |
+-------------+--------------+----------+---------+---------+---------+
|          16 | Data         | Data     |       0 |       0 |       0 |
|          16 | Data         | Data     |       0 |       0 |       0 |
+-------------+--------------+----------+---------+---------+---------+

OUTPUT
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 | Header3 | Header4 | SheetName |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
|          17 | Data         | Data     | 0      | 0       | 0       | null    | null    | Sheet1    |
|          17 | Data         | Data     | 0      | 0       | 0       | null    | null    | Sheet1    |
|          15 | Data         | Data     | null   | null    | 0       | 0       | null    | Sheet2    |
|          15 | Data         | Data     | null   | null    | 0       | 0       | null    | Sheet2    |
|          16 | Data         | Data     | null   | 0       | null    | 0       | 0       | Sheet3    |
|          16 | Data         | Data     | null   | 0       | null    | 0       | 0       | Sheet3    |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+

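I'm fairly new to Python. I have used pandas and numpy. I have up to 60 sheets to process. Can anyone help me understand how to achieve this? If not Python, should I use another tool or method? A code example would really help me get started.

Thank you very much for your help. Thanks in advance.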

3 Answers

1

This is easy to do with R.

library(openxlsx) # to read xlsx files
library(purrr)    # for the "map" function

wb <- loadWorkbook("path/filename.xlsx")
all_sheets <- names(wb)

merged_data <- map_df(all_sheets, ~ read.xlsx(wb, sheet = .x))
Answered 2018-04-15T21:21:33.880
0
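import pandas as pd

filepath = r"filePath here"
# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets_dict = pd.read_excel(filepath, sheet_name=None)

# Loop through the sheets and tag each row with the sheet it came from
frames = []
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    #sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    frames.append(sheet)

# concat aligns columns by name; columns missing from a sheet are filled with NaN.
# (DataFrame.append was removed in pandas 2.0, so concat is used instead.)
full_table = pd.concat(frames, ignore_index=True, sort=False)

# Write to Excel; the context manager saves and closes the file
# (ExcelWriter.save() was removed in pandas 2.0).
with pd.ExcelWriter('consolidated_TB1.xlsx', engine='xlsxwriter') as writer:
    full_table.to_excel(writer, sheet_name='Sheet1')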
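
For anyone who wants to try the idea without an Excel file at hand, here is a minimal sketch that runs on small in-memory frames. The helper name `combine_sheets` is hypothetical, the `SheetName` column name is taken from the desired output in the question, and the `sheets` dict merely stands in for what `pd.read_excel(filepath, sheet_name=None)` would return. Treat it as an illustration under those assumptions, not a drop-in solution.

```python
import pandas as pd


def combine_sheets(sheets, sheet_col="SheetName"):
    """Stack a {sheet name: DataFrame} mapping into one DataFrame.

    Hypothetical helper: columns are matched by name, and a column
    missing from a given sheet ends up as NaN for that sheet's rows.
    """
    # assign() returns a copy, so the input frames are left untouched
    frames = [df.assign(**{sheet_col: name}) for name, df in sheets.items()]
    combined = pd.concat(frames, ignore_index=True, sort=False)
    # Move the sheet-name column to the end, as in the desired output
    ordered = [c for c in combined.columns if c != sheet_col] + [sheet_col]
    return combined[ordered]


# Stand-in for pd.read_excel(filepath, sheet_name=None)
sheets = {
    "Sheet1": pd.DataFrame(
        {"FISCAL_YEAR": [17, 17], "Header": [0, 0], "Header2": [0, 0]}
    ),
    "Sheet2": pd.DataFrame(
        {"FISCAL_YEAR": [15, 15], "Header3": [0, 0], "Header2": [0, 0]}
    ),
}

result = combine_sheets(sheets)
print(result)
```

With a real workbook, passing the dict returned by `pd.read_excel(filepath, sheet_name=None)` to the same helper should behave the same way, whether there are 3 sheets or 60.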
Answered 2018-04-16T09:12:56.797
0

Using a for loop and rbind in R:

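library(xlsx)  # read.xlsx with sheetIndex comes from the xlsx package
data <- NULL   # start empty so the first rbind has something to append to
for (i in file.list) {
    data <- rbind(data, read.xlsx(i, sheetIndex = 1))
}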

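rbind usage: to join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order.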

total <- rbind(dataframeA, dataframeB)
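
Because the sheets in the question do not share the same variables, they would need to be brought to a common set of columns before a strict row-bind like this can apply. Since the question asks about Python, here is a rough pandas sketch of that alignment step. The helper name `align_columns` and the two tiny frames are made up for illustration. Note that `pd.concat` already performs this alignment on its own, so the explicit version mainly shows what happens under the hood.

```python
import pandas as pd


def align_columns(frames):
    """Reindex every frame to the union of all column names.

    Hypothetical helper: columns keep first-seen order, and any column
    a frame lacks is added to it filled with NaN.
    """
    all_columns = []
    for df in frames:
        for col in df.columns:
            if col not in all_columns:
                all_columns.append(col)
    return [df.reindex(columns=all_columns) for df in frames]


# Two made-up frames whose headers only partly overlap
a = pd.DataFrame({"ACCOUNTS": ["x"], "Header1": [0]})
b = pd.DataFrame({"ACCOUNTS": ["y"], "Header3": [0]})

aligned = align_columns([a, b])
# Every frame now has identical columns, so stacking is a plain row-bind
total = pd.concat(aligned, ignore_index=True)
print(total)
```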
Answered 2018-04-15T21:53:55.530