Is there a way to count the number of pages in a document with the python-docx library?
2 Answers
Not at the moment. Unlike a way to tell where the page breaks fall in the content, though, such a feature could be developed, at least if you were satisfied with whatever page count Word reported the last time it saved the document.
This statistic is saved in the app.xml properties "part" (docProps/app.xml inside the package) by Word on each save. So if you were confident the document you were inspecting had last been saved by Word (LibreOffice would probably work too, I expect), that method should be pretty reliable. If the document was generated by, say, python-docx, the statistic would be unreliable, because nothing has laid out the pages and recomputed it.
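To illustrate the idea, here is a minimal sketch that reads the saved statistic with the standard library only. It is not part of python-docx; the function name `saved_page_count` is made up for this example. It also returns the `<Application>` value from the same part, which gives a hint about which program last saved the file and therefore how much to trust the number.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the extended-properties part (docProps/app.xml).
EP_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"


def saved_page_count(docx_file):
    """Return (pages, application) as recorded in docProps/app.xml.

    `docx_file` is a path or a binary file-like object.
    `pages` is None when the part or its <Pages> element is missing,
    which can happen for files not written by Word.
    (Hypothetical helper for illustration, not a python-docx API.)
    """
    with zipfile.ZipFile(docx_file) as archive:
        if "docProps/app.xml" not in archive.namelist():
            return None, None
        root = ET.fromstring(archive.read("docProps/app.xml"))

    pages_text = root.findtext(EP_NS + "Pages")
    application = root.findtext(EP_NS + "Application")
    pages = int(pages_text) if pages_text and pages_text.strip().isdigit() else None
    return pages, application
```

Calling `saved_page_count("report.docx")` (the file name is a placeholder) returns the page count as of the last save, not a freshly computed one, so treat it as stale if the document was edited by a tool that does not update app.xml.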
If this is a feature you're interested in, feel free to add it to the GitHub issues list: https://github.com/python-openxml/python-docx/issues
I came up with this. It works for both pptx and docx files:
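import re
import zipfile

# A .docx/.pptx file is a zip archive; the document statistics live in docProps/app.xml.
with zipfile.ZipFile("myDocxOrPptxFile.docx", "r") as archive:
    app_xml = archive.read("docProps/app.xml").decode("utf-8")

# \d+ matches counts of any length (a bare \d would only match single-digit counts).
# \1 makes sure the closing tag is the same as the opening one.
match = re.search(r"<(Pages|Slides)>(\d+)</\1>", app_xml)

# Fall back to 0 when neither <Pages> nor <Slides> is present.
page_count = int(match.group(2)) if match else 0
print(page_count)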
Office formats are just zip files with XML content inside. You can read the contents of these files and parse them however you like.
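As a rough sketch of what "zip files with XML content" means in practice, the snippet below lists the XML parts of an Office file together with the tag of each part's root element. The helper name `xml_part_roots` is invented for this example, and the code assumes the standard library's `zipfile` and `xml.etree.ElementTree` are sufficient for the file at hand.

```python
import zipfile
import xml.etree.ElementTree as ET


def xml_part_roots(office_file):
    """Map each .xml part inside an Office file to its root element tag.

    `office_file` is a path or a binary file-like object.
    (Hypothetical helper for illustration only.)
    """
    roots = {}
    with zipfile.ZipFile(office_file) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                # Each part is an ordinary XML document that can be parsed as usual.
                roots[name] = ET.fromstring(archive.read(name)).tag
    return roots
```

Running this on a document saved by Word typically shows parts such as word/document.xml and docProps/app.xml, which is a quick way to find out where a given piece of information is stored before writing a parser for it.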