4

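Hi all, and thanks in advance.

I have a Python script that opens a template Excel file, adds data (while preserving styles), and saves it again. Before saving the new xls file, I would like to delete the rows I did not edit. My template xls file has a footer, so I want to delete the surplus rows that sit before the footer.

This is how I load the xls template: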

self.inBook = xlrd.open_workbook(file_path, formatting_info=True)
self.outBook = xlutils.copy.copy(self.inBook)
self.outBookCopy = xlutils.copy.copy(self.inBook)

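I then write information to outBook, taking the style from outBookCopy and applying it to every row I modify in outBook.

So how do I delete rows from outBook before writing it out? Thanks, everyone!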

3 Answers

3

I achieved this using the Pandas package:

import pandas as pd

# Read from Excel
xl = pd.ExcelFile("test.xls")

# Parse the first sheet into a DataFrame
dfs = xl.parse(xl.sheet_names[0])

# Update the DataFrame as required
# (here: drop rows whose "Name" column is blank; pandas reads empty cells as NaN)
dfs = dfs[dfs['Name'].notna() & (dfs['Name'] != '')]

# Write the updated DataFrame back to the Excel file
# (cell styles are not carried over; writing .xls needs xlwt and pandas < 2.0)
dfs.to_excel("test.xls", sheet_name='Sheet1', index=False)
Answered 2016-01-06T05:53:34.700
1

xlwt does not provide a simple interface for doing this, but I've had success with a somewhat similar problem (inserting multiple copies of a row into a copied workbook) by directly changing the worksheet's rows attribute and the row numbers on the row and cell objects.

The rows attribute is a dict, indexed on row number, so iterating a row range takes a little care and you can't slice it.
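
To illustrate the point about not being able to slice, here is a minimal sketch that removes a range of rows from a plain dict keyed by row number. The `rows` dict below is only a stand-in for `worksheet.rows`, and `delete_row_range` is a hypothetical helper, not part of xlwt:

```python
def delete_row_range(rows, first, last):
    """Remove every entry whose key lies in [first, last] from a dict keyed by row number."""
    # Collect the keys first: deleting from a dict while iterating over it raises RuntimeError.
    for index in [i for i in rows if first <= i <= last]:
        del rows[index]
    return rows


# Stand-in for worksheet.rows: sparse, because rows that were never written have no entry.
rows = {0: "header", 1: "data", 4: "unused template row", 7: "footer"}
delete_row_range(rows, 2, 6)
```

Note that this only drops entries; the surviving rows keep their original numbers, which is why the row and cell objects below the gap still need renumbering.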

Given the number of rows you want to delete and the initial row number of the first row you want to keep, something like this might work:

rows_indices_to_move = range(first_kept_row, worksheet.last_used_row + 1)
max_used_row = 0
# Last surviving row if there is nothing below the deleted block to move up
new_row_number = first_kept_row - number_to_delete - 1
for row_index in rows_indices_to_move:
    new_row_number = row_index - number_to_delete
    if row_index in worksheet.rows:  # rows is a dict attribute, not a method
        row = worksheet.rows[row_index]
        row._Row__idx = new_row_number
        for cell in row._Row__cells.values():
            if cell:
                cell.rowx = new_row_number
        worksheet.rows[new_row_number] = row
        max_used_row = new_row_number
    else:
        # There's no row in the block we're trying to slide up at this index, but there might be a row already present to clear out.
        worksheet.rows.pop(new_row_number, None)
# now delete any remaining rows (rows is a dict, so remove keys one by one instead of slicing)
for stale_index in [i for i in worksheet.rows if i > new_row_number]:
    del worksheet.rows[stale_index]
# and update the internal marker for the last remaining row
if max_used_row:
    worksheet.last_used_row = max_used_row

There may well be bugs in that code: it's untested and relies on direct manipulation of the underlying data structures, but it should show the general idea. Modify the row and cell objects, and adjust the rows dictionary so that the indices are correct.

Do you have merged ranges in the rows you want to delete, or below them? If so you'll also need to run through the worksheet's merged_ranges attribute and update the rows for them. Also, if you have multiple groups of rows to delete you'll need to adjust this answer - this is specific to the case of having a block of rows to delete and shifting everything below up.
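
As a rough sketch of the merged-range adjustment, assuming each entry in `merged_ranges` is an `(r1, r2, c1, c2)` tuple of inclusive, zero-based indices (which is how I understand xlwt to store them, but check your version), something like the following could compute the updated list. `shift_merged_ranges` is a hypothetical helper, and dropping ranges that overlap the deleted block is a design choice of this sketch, not something xlwt requires:

```python
def shift_merged_ranges(merged_ranges, first_deleted_row, number_to_delete):
    """Return merged ranges adjusted for one deleted block of rows.

    Assumption: each range is an (r1, r2, c1, c2) tuple with inclusive,
    zero-based row and column indices.
    """
    last_deleted_row = first_deleted_row + number_to_delete - 1
    shifted = []
    for r1, r2, c1, c2 in merged_ranges:
        if r2 < first_deleted_row:
            # Entirely above the deleted block: keep unchanged.
            shifted.append((r1, r2, c1, c2))
        elif r1 > last_deleted_row:
            # Entirely below the deleted block: slide up.
            shifted.append((r1 - number_to_delete, r2 - number_to_delete, c1, c2))
        # Ranges that touch the deleted block are dropped in this sketch.
    return shifted


ranges = [(0, 0, 0, 3), (5, 6, 0, 1), (10, 11, 0, 3)]
# Delete rows 4..7 (four rows).
new_ranges = shift_merged_ranges(ranges, first_deleted_row=4, number_to_delete=4)
```

The result would then have to be written back to the worksheet. If `merged_ranges` hands back the underlying list, assigning to `worksheet.merged_ranges[:]` may be enough, but that is an assumption to verify.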

As a side note - I was able to write text to my worksheet and preserve the predefined style thus:

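def write_with_style(ws, row, col, value):
    # Look up any existing cell without raising KeyError for a missing row or column
    existing_row = ws.rows.get(row)
    old_cell = existing_row._Row__cells.get(col) if existing_row is not None else None
    ws.write(row, col, value)
    if old_cell is not None:
        # Restore the style index the cell had before the write
        ws.rows[row]._Row__cells[col].xf_idx = old_cell.xf_idx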

That might let you skip having two copies of your spreadsheet open at once.

Answered 2013-05-17T20:32:25.590
0

For those of us still stuck with xlrd/xlwt/xlutils, you can use the following filter:

from xlutils.filter import BaseFilter

class RowFilter(BaseFilter):
    rows_to_exclude: "Iterable[int]"
    _next_output_row: int

    def __init__(
            self,
            rows_to_exclude: "Iterable[int]",
    ):
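        # A set gives fast membership tests and tolerates one-shot iterables
        self.rows_to_exclude = set(rows_to_exclude)
        self._next_output_row = -1

    def sheet(self, rdsheet, wtsheet_name):
        # Restart the output row counter at the start of every sheet
        self._next_output_row = -1
        self.next.sheet(rdsheet, wtsheet_name)
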
    def _should_include_row(self, rdrowx):
        return rdrowx not in self.rows_to_exclude

    def row(self, rdrowx, wtrowx):
        if self._should_include_row(rdrowx):
            # Proceed with writing out the row to the output file
            self._next_output_row += 1
            self.next.row(
                rdrowx, self._next_output_row,
            )

    # After `row()` has been called, `cell()` is called for each cell of the row
    def cell(self, rdrowx, rdcolx, wtrowx, wtcolx):
        if self._should_include_row(rdrowx):
            self.next.cell(
                rdrowx, rdcolx, self._next_output_row, wtcolx,
            )
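
To see what the renumbering does, here is a small stand-alone sketch that mirrors the counter logic of the filter without touching any workbook. `output_row_map` is a hypothetical helper written only for illustration; it is not part of xlutils:

```python
def output_row_map(total_rows, rows_to_exclude):
    """Map each kept input row index to the output row index it would be written to."""
    excluded = set(rows_to_exclude)
    mapping = {}
    next_output_row = -1  # same starting value as the filter's counter
    for rdrowx in range(total_rows):
        if rdrowx not in excluded:
            next_output_row += 1
            mapping[rdrowx] = next_output_row
    return mapping


# An 8-row sheet with rows 3, 4 and 5 excluded.
mapping = output_row_map(8, [3, 4, 5])
```

Rows above the excluded block keep their positions, and everything below it slides up to close the gap, which is the behaviour wanted for a footer that follows unused template rows.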

Then use it like this, for example:

from xlrd import open_workbook
from xlutils.filter import DirectoryWriter, XLRDReader, process

process(
    # formatting_info=True keeps the styles; the second argument names the output file
    XLRDReader(open_workbook("input_filename.xls", formatting_info=True), "output_filename.xls"),
    RowFilter([3, 4, 5]),
    DirectoryWriter("output_dir"),
)
Answered 2020-12-01T17:58:19.560