My task is to change text in a docx file, so I wrote this code:

# -*- coding: utf-8 -*-

import docx
import os

def getText(from_filename, to_filename, old_value, new_value):
    doc = docx.Document(from_filename)
    for paragraph in doc.paragraphs:
        new_text = paragraph.text.replace(old_value, new_value)
        paragraph.text = new_text
    doc.save(to_filename)

if __name__ == '__main__':
    new_filename = 'result_from_python.docx'
    if os.path.exists(new_filename):  # avoid FileNotFoundError on the first run
        os.remove(new_filename)
    getText('USA.docx', new_filename, 'а', 'о')

The problem is that the bold formatting from the source document is lost in the result. I can't figure out how to fix it.

1 Answer

You can find the bold elements by iterating over the run-level objects. See my small demo, in particular the commented-out line:

import docx
import os


def getText(from_filename):
    doc = docx.Document(from_filename)
    for p in doc.paragraphs:
        for run in p.runs:
            print(run.text)
            print(run.bold)
            # run.bold = False  # This removes the style
            print('---')

        
getText('test.docx')

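There may also be styles at the paragraph level; I'm not sure whether they are visible at the run level. This is what my test.docx looks like:

[Image: screenshot of test.docx]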

Output (run.bold is None when a run does not set bold itself and inherits the setting):

Lorem
None
---

None
---
ipsum
True
---

None
---
dolor
None
---
A
True
---
badsda
True
---

True
---
dw
True
---

True
---
alw
True
---
Asdfadsf
None
---

None
---
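
Building on the run-level loop above, here is a minimal sketch of how the replacement from the question could be done run by run, so that each run keeps its own bold setting. It relies on the fact that assigning to paragraph.text replaces all runs of a paragraph with a single new run, which is where the bold goes missing. The names replace_in_runs and replace_in_docx are made up for illustration, and the demo at the bottom uses plain stand-in objects rather than real python-docx runs.

```python
from types import SimpleNamespace


def replace_in_runs(runs, old_value, new_value):
    """Replace text run by run, leaving each run's formatting untouched.

    Works on any objects with a writable ``text`` attribute, such as
    python-docx Run objects. Matches that straddle two runs are not found.
    """
    for run in runs:
        if old_value in run.text:
            run.text = run.text.replace(old_value, new_value)


def replace_in_docx(from_filename, to_filename, old_value, new_value):
    """Hypothetical wrapper: apply replace_in_runs to every body paragraph."""
    import docx  # python-docx; imported here so replace_in_runs works without it

    doc = docx.Document(from_filename)
    for paragraph in doc.paragraphs:
        replace_in_runs(paragraph.runs, old_value, new_value)
    doc.save(to_filename)


# Stand-in runs (not real python-docx objects) to show that bold survives.
demo_runs = [SimpleNamespace(text="Lorem ", bold=None),
             SimpleNamespace(text="ipsum", bold=True)]
replace_in_runs(demo_runs, "m", "n")
print([(r.text, r.bold) for r in demo_runs])
```

With a real file the call would look like replace_in_docx('USA.docx', 'result_from_python.docx', 'а', 'о'). Two limitations to keep in mind: a search string that is split across two runs will not be matched, and doc.paragraphs only covers body paragraphs, so text inside tables, headers, and footers would need separate handling.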
answered 2021-11-16T09:24:14.453