python - Numbering the sentences inside a <P> tag in an .xml file?

Question

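I'm a junior programmer and I've run into this probably simple problem: I want to automatically add numbers to the sentences contained in the P tags of an .xml file. A sample paragraph in the .xml file looks like this: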

<P>Sentence1. Sentence2. Sentence3.</P>

I want to convert it to:

<P><SUP>1</SUP>Sentence1.<SUP>2</SUP> Sentence2.<SUP>3</SUP> Sentence3.</P>

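However, only P tags that contain at least 2 sentences should be numbered; if a tag contains only 1 sentence, I want to leave it unchanged.

So far, this is the approach I've come up with using regular expressions: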

\.\s.*
# Reliably finds the second sentence, Insert <SUP>2</SUP> after it.
<P>[^>]*<SUP>2
# Finds the beginning of the first sentence if a second sentence exists.

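But this feels like a very awkward approach, and I don't really know how to scale it to paragraphs containing 20 or more sentences, or to .xml documents containing many paragraphs. Is there a better regular expression, or a better (Python) tool than regular expressions, to achieve this?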

score 2 · Accepted Answer

Something like this (very much untested) might work:

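import re
import xml.etree.ElementTree as ET

tree = ET.parse(XML_FILE)  # XML_FILE: path to your .xml file
root = tree.getroot()

# Tag names are case-sensitive, so match 'P' rather than 'p'.
for p in list(root.iter('P')):
    # Split after each period that is followed by whitespace and more text.
    # Assumes the <P> tag holds plain text only (no child elements).
    sentences = re.split(r'(?<=\.)(?=\s+\S)', p.text or '')
    if len(sentences) < 2:
        continue  # leave single-sentence paragraphs unchanged
    p.text = None
    for count, sentence in enumerate(sentences, start=1):
        # Real <SUP> elements; markup assigned to p.text would be escaped on write.
        sup = ET.SubElement(p, 'SUP')
        sup.text = str(count)
        sup.tail = sentence  # the text that follows </SUP>

tree.write(XML_FILE)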
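
To try the idea without touching a file, here is a minimal sketch that works on an XML string instead. The function name `number_sentences`, the `<DOC>` wrapper element, and the sample text are made up for illustration. It assumes that a sentence ends at a period followed by whitespace, so abbreviations such as "e.g. " would be split incorrectly, and it skips any P tag that already contains child elements. Splitting on an empty match needs Python 3.7 or later.

```python
import re
import xml.etree.ElementTree as ET

# Split point: right after a period that is followed by whitespace and more text.
_SENTENCE_BOUNDARY = re.compile(r'(?<=\.)(?=\s+\S)')


def number_sentences(xml_text: str) -> str:
    """Insert <SUP>n</SUP> before each sentence of every multi-sentence <P> tag."""
    root = ET.fromstring(xml_text)
    # list(...) takes a snapshot, because the loop adds children while iterating.
    for p in list(root.iter('P')):
        if len(p) > 0 or not p.text:
            continue  # assumption: skip <P> tags with child elements or no text
        sentences = _SENTENCE_BOUNDARY.split(p.text)
        if len(sentences) < 2:
            continue  # single-sentence paragraphs stay unchanged
        p.text = None
        for count, sentence in enumerate(sentences, start=1):
            sup = ET.SubElement(p, 'SUP')
            sup.text = str(count)
            sup.tail = sentence  # text that follows </SUP>, leading space included
    return ET.tostring(root, encoding='unicode')


sample = '<DOC><P>Sentence1. Sentence2. Sentence3.</P><P>Only one sentence.</P></DOC>'
result = number_sentences(sample)
print(result)
```

Building real `SUP` elements and putting each sentence into the element's `tail` keeps the output well-formed. Writing the markup into `p.text` as a string would get it escaped to `&lt;SUP&gt;` when the tree is serialized.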
