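I am converting a book from PDF to EPUB in calibre. The chapter titles are not inside heading tags, so I am trying to wrap them using a Python function with a regular expression.
Sample text: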
<p class="calibre1"><a id="p1"></a>Chapter 370: Slamming straight on</p>
<p class="softbreak"> </p>
<p class="calibre1">Hearing Yan Zhaoge’s suggestion, the Jade Sea City martial practitioners here were all stunned.</p>
<p class="calibre1"><a id="p7"></a>Chapter 372: Yan Zhaoge’s plan</p>
<p class="softbreak"> </p>
<p class="calibre1">Yan Zhaoge and Ah Hu sat on Pan-Pan’s back, black water swirling about Pan-Pan’s entire body, keeping away the seawater as he shot forward at lightning speed.</p>
I tried using a regular expression:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    pattern = r"</a>(?i)chapter [0-9]+: [\w\s]+(.*)<br>"
    list = re.findall(pattern, match.group())
    for x in list:
        x = "</a>(?i)chapter [0-9]+: [\w\s]+(.?)<br>"
        x = s.split("</a>", 1)[0] + '</a><h2>' + s.split("a>",1)[1]
        x = s.split("<br>", 1)[0] + '</h2><br>' + s.split("<br>",1)[1]
    return match.group()
and
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    pattern = r"</a>(?i)chapter [0-9]+: [\w\s]+(.*)<br>"
    s.replace(re.match(pattern, s), r'<h2>$0')
but I still do not get the expected result. What I want is...
Input
</a>Chapter 370: Slamming straight on</p>
Output
</a><h2>Chapter 370: Slamming straight on</h2></p>
The h2 tags should be added to every similar instance.
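
For reference, here is a minimal sketch of the transformation described above. It rests on a few assumptions. In calibre's function mode the search pattern normally goes in the Find field, and `replace` only has to return the replacement text for each match, so it should not need to run its own search inside the function. The names `CHAPTER_RE` and `add_headings` are hypothetical and exist only so the behaviour can be checked outside calibre. The sketch also anchors on `</p>` rather than `<br>`, since the sample text contains no `<br>`. It passes the case-insensitive flag as an argument, because Python's `re` module only accepts an inline `(?i)` at the very start of a pattern in recent versions. Finally, `[^<]*` is used for the title so that characters such as `’` are included, which `[\w\s]+` alone would not cover.

```python
import re

# Assumed pattern: "</a>" followed by "Chapter N: title", up to the closing </p>.
# Inside calibre this would go in the Find field (with case-insensitive matching).
CHAPTER_RE = re.compile(r'(</a>)(chapter\s+[0-9]+:[^<]*)(?=</p>)', re.IGNORECASE)


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Return the replacement for this one match: keep </a>, wrap the title in <h2>.
    return match.group(1) + '<h2>' + match.group(2) + '</h2>'


def add_headings(html):
    # Hypothetical helper that mimics calling replace() once per match.
    # The calibre-specific arguments are unused here, so placeholders are passed.
    return CHAPTER_RE.sub(lambda m: replace(m, 0, '', None, None, None, None), html)


print(add_headings('</a>Chapter 370: Slamming straight on</p>'))
```

Because the lookahead `(?=</p>)` does not consume the closing tag, the `</p>` stays in place after the inserted `</h2>`, and paragraphs that do not start with a chapter title are left untouched.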